How to check whether a value in one column is present in any row of another column
Question:
I have an Excel sheet like the one below:
A B
1A 100
2A 1A
3A 101
5A 1A
The expected output is:
A B Bool
1A 100
2A 1A True
3A 101
5A 1A True
Here you can see that 1A, which is present in df['A'], is also present in df['B'].
I tried this:
import pandas as pd
df = pd.read_excel('test.xlsx')
df['A'].isin(df['B'])
but it's not working.
Answers:
import pandas as pd
import numpy as np
data = {
'A':['1A','2A','3A','5A'],
'B':[100,'1A',101,'1A'],
'Bool': False
}
df = pd.DataFrame(data)
With this code, you will have the following dataframe:
    A    B   Bool
0  1A  100  False
1  2A   1A  False
2  3A  101  False
3  5A   1A  False
You can fill in the third column by looping over the rows with iterrows():
list_A = np.array(df['A'])
for index, row in df.iterrows():
    if df['B'][index] in list_A:
        df['Bool'][index] = True
The output is:
    A    B   Bool
0  1A  100  False
1  2A   1A   True
2  3A  101  False
3  5A   1A   True
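As a minimal sketch, the same loop can write through df.loc in a single step instead of using chained indexing. The name values_in_a and the use of a set for the membership test are illustrative additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['1A', '2A', '3A', '5A'],
    'B': [100, '1A', 101, '1A'],
    'Bool': False,
})

# Hypothetical helper name: the values of column A, stored in a set
# so that each membership test is a hash lookup.
values_in_a = set(df['A'])

for index, row in df.iterrows():
    if row['B'] in values_in_a:
        # One label-based assignment instead of df['Bool'][index] = True.
        df.loc[index, 'Bool'] = True

print(df)
```

This only changes how the value is written; the loop still visits every row in Python.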
But this is not a good solution. Without iterrows() you can do as follows:
data = {
'A':['1A','2A','3A','5A'],
'B':[100,'1A',101,'1A']
}
df = pd.DataFrame(data)
df['bool'] = df['B'].isin(df['A'])
The output is the same:
    A    B   bool
0  1A  100  False
1  2A   1A   True
2  3A  101  False
3  5A   1A   True
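The attempt in the question differs from this in two ways: it tests the columns in the opposite direction, and it never assigns the result to a column. The following sketch, using the same sample data, shows how the two directions differ; the variable names a_in_b and b_in_a are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['1A', '2A', '3A', '5A'],
    'B': [100, '1A', 101, '1A'],
})

# Direction used in the question: is each value of A found anywhere in B?
a_in_b = df['A'].isin(df['B'])

# Direction used in the answer: is each value of B found anywhere in A?
b_in_a = df['B'].isin(df['A'])

print(a_in_b.tolist())  # [True, False, False, False]
print(b_in_a.tolist())  # [False, True, False, True]
```

Only the second direction marks rows 1 and 3, which matches the expected output in the question.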
Why is the second way better than using iterrows()?
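- SettingWithCopyWarning:
The loop above assigns with chained indexing, df['Bool'][index] = True. Changing a value in the dataframe this way while iterating with iterrows() usually produces the following warning, because pandas cannot tell whether df['Bool'] is a view or a copy:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
Assigning with df.loc[index, 'Bool'] = True avoids the warning, but the loop is still slower than isin().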
- Run time (isin() is roughly five times faster on this four-row example):
df['B'].isin(df['A']) Time: 0.002069372000050862
df.iterrows() Time: 0.010589972999696329
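The answer does not say how these timings were measured. One way to reproduce a comparison of this kind is the standard-library timeit module, as in the sketch below. The function names with_isin and with_iterrows and the repetition count are assumptions, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'A': ['1A', '2A', '3A', '5A'],
    'B': [100, '1A', 101, '1A'],
})


def with_isin(frame):
    # Vectorized membership test over the whole column.
    return frame['B'].isin(frame['A'])


def with_iterrows(frame):
    # Row-by-row membership test; builds the result without mutating the frame.
    values_in_a = set(frame['A'])
    flags = [row['B'] in values_in_a for _, row in frame.iterrows()]
    return pd.Series(flags, index=frame.index)


# Run each variant 100 times (an arbitrary choice) and report total elapsed seconds.
isin_time = timeit.timeit(lambda: with_isin(df), number=100)
iterrows_time = timeit.timeit(lambda: with_iterrows(df), number=100)

print(f"isin():     {isin_time:.6f} s")
print(f"iterrows(): {iterrows_time:.6f} s")
```

On a frame this small the gap is modest; the difference generally grows with the number of rows, since iterrows() builds a Series object for every row.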