Extract column value based on another column in Pandas
Question:
I am stuck on extracting the value of one variable conditional on another variable. For example, given the following dataframe:
A B
p1 1
p1 2
p3 3
p2 4
How can I get the value of A when B=3? Every time I extracted the value of A, I got an object, not a string.
Answers:
Try:
df[df['B']==3]['A'].item()
assuming df is your pandas.DataFrame.
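To make the difference between "an object" and "a string" concrete, here is a minimal sketch that rebuilds the example dataframe from the question. The variable names (matches, value, error_message) are only illustrative. Note that item() expects exactly one matching row and raises ValueError otherwise.

```python
import pandas as pd

# The example dataframe from the question.
df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

matches = df[df["B"] == 3]["A"]  # a one-element Series, not a string
value = matches.item()           # the plain Python string inside it
print(type(matches).__name__, type(value).__name__)

# item() only works when exactly one row matches.
try:
    df[df["B"] > 1]["A"].item()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If several rows can match, one of the first-element approaches further down may be a better fit.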
You can try query, which is less typing:
df.query('B==3')['A']
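As a sketch of how this behaves: query still returns a Series, so you need one more step to get the bare value. The helper name a_where_b_equals is hypothetical; it shows that a Python variable can be referenced inside the query string with the @ prefix.

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

def a_where_b_equals(frame, target):
    # "@target" refers to the local Python variable named target.
    matches = frame.query("B == @target")["A"]
    return matches.tolist()

literal = df.query("B == 3")["A"]  # still a Series
values = a_where_b_equals(df, 3)   # list of matching values
print(type(literal).__name__, values)
```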
Use df[df['B']==3]['A'].values[0] if you just want the item itself without the brackets.
df.loc[df['B']=='give-specific-value', 'A']
I have also used this kind of conditional filtering and extraction for my assignment.
Edited: What I described below under Previous is chained indexing and may not work in some situations. The best practice is to use loc, but the concept is the same:
df.loc[row, col]
row and col can be specified directly (e.g., 'A' or ['A', 'B']) or with a mask (e.g. df['B'] == 3). Applied to the example in the question:
df.loc[df['B'] == 3, 'A']
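A short sketch of what this loc call gives back, using the dataframe from the question. The result is a Series that keeps the original row label, so one further step is needed for the scalar. Returning None when nothing matches is just one possible convention, chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

result = df.loc[df["B"] == 3, "A"]  # Series holding the matching rows of A
print(result.index.tolist(), result.tolist())

# Take the first match, or None if the mask selected nothing.
first = result.iloc[0] if not result.empty else None

missing = df.loc[df["B"] == 99, "A"]
first_missing = missing.iloc[0] if not missing.empty else None
print(first, first_missing)
```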
Previous: I find it easier to think in these terms, borrowing from other answers. The value you want is located in a dataframe at:
df[*column*][*row*]
where column and row point to the values you want returned. For your example, column is 'A' and for row you use a mask:
df['B'] == 3
To get the first matched value from the series there are several options:
df['A'][df['B'] == 3].values[0]
df['A'][df['B'] == 3].iloc[0]
df['A'][df['B'] == 3].to_numpy()[0]
You can use squeeze instead of iloc[0]. It looks clearer if you have only one value:
df.loc[df['B'] == 3, 'A'].squeeze()
Output:
'p3'
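One caveat worth sketching: squeeze only collapses to a scalar when exactly one row matches. With several matches it hands back the Series unchanged, so the return type depends on the data. The variable names one and many are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["p1", "p1", "p3", "p2"], "B": [1, 2, 3, 4]})

one = df.loc[df["B"] == 3, "A"].squeeze()   # single match: a plain string
many = df.loc[df["B"] < 3, "A"].squeeze()   # two matches: still a Series
print(type(one).__name__, type(many).__name__)
```

If you always want the first match regardless of how many rows qualify, iloc[0] is the more predictable choice.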