Add a column to a pandas DataFrame comparing two columns
Question:
I have a dataframe (df) containing two columns:
Column 1 | Column 2 |
---|---|
Apple | Banana |
Chicken | Chicken |
Dragonfruit | Egg |
Fish | Fish |
What I want to do is create a third column that says whether the results in each column are the same. For instance:
Column 1 | Column 2 | Same |
---|---|---|
Apple | Banana | No |
Chicken | Chicken | Yes |
Dragonfruit | Egg | No |
Fish | Fish | Yes |
I’ve tried:
df['Same'] = df.apply(lambda row: row['Column A'] in row['Column B'],axis=1)
Which didn’t work.
I also tried to create a for loop but couldn’t even get close to it working.
Any help you can provide would be much appreciated!
Answers:
You can simply use np.where:
import numpy as np
df['Same'] = np.where(df['Column 1'] == df['Column 2'], 'Yes', 'No')
>>> print(df)
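For reference, here is a self-contained sketch of the np.where approach. The DataFrame construction is an assumption: it rebuilds the sample table from the question by hand, since the question does not show how df was created.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction)
df = pd.DataFrame({
    "Column 1": ["Apple", "Chicken", "Dragonfruit", "Fish"],
    "Column 2": ["Banana", "Chicken", "Egg", "Fish"],
})

# The == comparison is element-wise and yields one boolean per row;
# np.where then picks "Yes" where it is True and "No" where it is False.
df["Same"] = np.where(df["Column 1"] == df["Column 2"], "Yes", "No")

print(df)
```

Because the comparison is vectorized, no explicit loop over rows is needed.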
In pandas, the == operator between two Series returns a new boolean Series, so you can use:
df['Same'] = df['Column 1'] == df['Column 2']
df['Same'] = df['Same'].replace(True, 'Yes').replace(False, 'No')
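A minimal sketch of the same idea that keeps the intermediate boolean Series visible. It swaps in Series.map with a dict for the chained replace calls, which is an alternative and not the call used above; the sample DataFrame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction)
df = pd.DataFrame({
    "Column 1": ["Apple", "Chicken", "Dragonfruit", "Fish"],
    "Column 2": ["Banana", "Chicken", "Egg", "Fish"],
})

# Comparing two Series row by row gives a boolean Series with the same index
same = df["Column 1"] == df["Column 2"]

# Translate the booleans into the labels asked for in the question
df["Same"] = same.map({True: "Yes", False: "No"})

print(same)
print(df)
```

If True/False is good enough for later filtering, the mapping step can be skipped and the boolean column used directly, for example df[same].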
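Use isin:
df['Same'] = df['Column 1'].isin(df['Column 2']).replace({True: 'Yes', False: 'No'})
Note that isin tests whether each value in Column 1 appears anywhere in Column 2, not just in the same row, so it only matches a row-by-row comparison when no value turns up on a different row in the other column.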
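A small sketch, using hypothetical data in which the same values sit on different rows, of how isin and a row-wise == can disagree. The two-row DataFrame below is made up for illustration and is not the question's data.

```python
import pandas as pd

# Hypothetical data: both values occur in both columns, but never on the same row
df = pd.DataFrame({
    "Column 1": ["Apple", "Chicken"],
    "Column 2": ["Chicken", "Apple"],
})

labels = {True: "Yes", False: "No"}

# Row-wise equality: compares each row's Column 1 with that same row's Column 2
row_wise = (df["Column 1"] == df["Column 2"]).map(labels)

# Membership: asks whether each Column 1 value occurs anywhere in Column 2
membership = df["Column 1"].isin(df["Column 2"]).map(labels)

print(pd.DataFrame({"row_wise": row_wise, "membership": membership}))
```

On the question's sample table the two happen to agree, which is why isin appears to work there.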
You could use the .apply(lambda) expression:
df['Same'] = df.apply(lambda x: 'Yes' if x['Column 1']==x['Column 2'] else 'No', axis=1)