How to get the column index that matches a specific value in Pandas?
Question:
I have the following dataframe.
0 1 2 3 4 5 6 7
True False False False False False False False
[1 rows x 8 columns]
As you can see, there is one True value, in the first column.
Therefore, I want to get 0, the column index of the True element in the dataframe.
In another case, True is at column index 4, so I would like to get 4, since that column holds the True value in the dataframe below.
0 1 2 3 4 5 6 7
False False False False True False False False
[1 rows x 8 columns]
I tried to google it but could not find what I want.
Assume that the columns have no designated names in this case.
Look forward to your help.
Thanks.
Answers:
IIUC, you are looking for idxmax:
>>> df
0 1 2 3 4 5 6 7
0 True False False False False False False False
>>> df.idxmax(axis=1)
0 0
dtype: object
>>> df
0 1 2 3 4 5 6 7
0 False False False False True False False False
>>> df.idxmax(axis=1)
0 4
dtype: object
Caveat: if all values are False, Pandas returns the first column label, because every value ties for the maximum and idxmax reports the first occurrence:
>>> df
0 1 2 3 4 5 6 7
0 False False False False False False False False
>>> df.idxmax(axis=1)
0 0
dtype: object
Workaround: replace False with np.nan:
>>> df.replace(False, np.nan).idxmax(axis=1)
0 NaN
dtype: float64
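On recent pandas versions, calling idxmax on a row that is entirely NaN may emit a warning or raise instead of returning NaN, so the workaround above can depend on the installed version. A minimal sketch of an alternative, assuming a boolean dataframe with integer column labels (the example frame and the helper name first_true_column are made up for illustration): run idxmax on the booleans directly, then blank out the rows that contain no True at all.

```python
import pandas as pd


def first_true_column(df: pd.DataFrame) -> pd.Series:
    """Label of the first True column per row; NaN where a row has no True."""
    first = df.idxmax(axis=1)           # first column holding the row maximum
    return first.where(df.any(axis=1))  # keep it only if the row has a True


# Hypothetical data: row 0 has True in column 4, row 1 is all False.
df = pd.DataFrame(
    [[False, False, False, False, True, False, False, False],
     [False] * 8],
    columns=range(8),
)

result = first_true_column(df)
print(result)
```

Because the all-False row becomes NaN, the result is a float Series; cast it back with a nullable integer dtype if integer labels are needed.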
If you want every column that is True:
cols_true = []
for idx, row in df.iterrows():
    # walk the column labels of each row and keep those holding True
    for col in df.columns:
        if row[col]:
            cols_true.append(col)
print(cols_true)
Use boolean indexing:
df.columns[df.iloc[0]]
output:
Index(['0'], dtype='object')
Or numpy.where:
np.where(df)[1]
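Note that numpy.where reports positions, not labels. With the default 0..7 column labels the two coincide, but with named columns they do not. A small sketch, assuming hypothetical string labels a..h, that maps the positions back to labels through df.columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string column labels instead of 0..7.
df = pd.DataFrame(
    [[False, False, False, False, True, False, True, True]],
    columns=list("abcdefgh"),
)

# np.where on a 2-D boolean array returns (row positions, column positions).
rows, cols = np.where(df.to_numpy())
labels = df.columns[cols]  # translate column positions into labels

print(cols.tolist())
print(labels.tolist())
```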
You may want to index the dataframe's index by a column itself (0 in this case), which gives the row labels where that column is True, as follows:
df.index[df[0]]
You’ll get:
Int64Index([0], dtype='int64')
df.loc[:, df.any()].columns[0]
# 4
If you have several True values, you can also get them all with columns.
Generalization
Imagine we have the following dataframe (several True values in positions 4, 6 and 7):
0 1 2 3 4 5 6 7
0 False False False False True False True True
With the formula above:
df.loc[:, df.any()].columns
# Int64Index([4, 6, 7], dtype='int64')
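One thing to keep in mind: df.any() reduces over rows, so on a frame with more than one row the formula above returns every column that is True in at least one row. A rough sketch, assuming a hypothetical two-row frame, that contrasts this with a per-row listing:

```python
import pandas as pd

# Hypothetical two-row frame: row 0 has True at 4, 6, 7; row 1 at 0.
df = pd.DataFrame(
    [[False, False, False, False, True, False, True, True],
     [True, False, False, False, False, False, False, False]],
    columns=range(8),
)

# Columns that are True in any row (what df.any() selects).
any_row = df.loc[:, df.any()].columns.tolist()

# True columns listed separately for each row label.
per_row = {
    idx: df.columns[row.to_numpy()].tolist()
    for idx, row in df.iterrows()
}

print(any_row)
print(per_row)
```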
df1.apply(lambda ss:ss.loc[ss].index.min(),axis=1).squeeze()
out:
0
or
df1.loc[:,df1.iloc[0]].columns.min()