List of Python Dataframe column values meeting criteria in another dataframe?
Question:
I have two dataframes, df1 and df2, which have a common column heading, Name. The Name values are unique within df1 and df2. df1’s Name values are a subset of those in df2; df2 has more rows — about 17,300 — than df1 — about 6,900 — but each Name value in df1 is in df2. I would like to create a list of Name values in df1 that meet certain criteria in other columns of the corresponding rows in df2.
Example:
df1:
Name
Age
Hair
0
Jim
25
black
1
Mary
58
brown
3
Sue
15
purple
df2:
Name
Country
phoneOS
0
Shari
GB
Android
1
Jim
US
Android
2
Alain
TZ
iOS
3
Sue
PE
iOS
4
Mary
US
Android
I would like a list of only those Name values in df1 that have df2 Country and phoneOS
values of US and Android. The example result would be [Jim, Mary].
I have successfully selected rows within one dataframe that meet multiple criteria in order to copy those rows to a new dataframe. In that case pandas/Python does the iteration over rows internally. I guess I could write a "manual" iteration over the rows of df1 and access df2 on each iteration. I was hoping for a more efficient solution whereby the iteration was handled internally as in the single-dataframe case. But my searches for such a solution have been fruitless.
Answers:
try:
df_1.loc[df_1.Name.isin(df_2.loc[df_2.Country.eq('US') &
df_2.phoneOS.eq('Android'), 'Name']), 'Name']
Result:
0 Jim
1 Mary
Name: Name, dtype: object
if you want the result as a list just add .to_list()
at the end
data = df1.merge(df2, on='Name')
data.loc[((data.phoneOS == 'Android') & (data.Country == "US")), 'Name'].values.tolist()
I have two dataframes, df1 and df2, which have a common column heading, Name. The Name values are unique within df1 and df2. df1’s Name values are a subset of those in df2; df2 has more rows — about 17,300 — than df1 — about 6,900 — but each Name value in df1 is in df2. I would like to create a list of Name values in df1 that meet certain criteria in other columns of the corresponding rows in df2.
Example:
df1:
Name | Age | Hair | |
---|---|---|---|
0 | Jim | 25 | black |
1 | Mary | 58 | brown |
3 | Sue | 15 | purple |
df2:
Name | Country | phoneOS | |
---|---|---|---|
0 | Shari | GB | Android |
1 | Jim | US | Android |
2 | Alain | TZ | iOS |
3 | Sue | PE | iOS |
4 | Mary | US | Android |
I would like a list of only those Name values in df1 that have df2 Country and phoneOS
values of US and Android. The example result would be [Jim, Mary].
I have successfully selected rows within one dataframe that meet multiple criteria in order to copy those rows to a new dataframe. In that case pandas/Python does the iteration over rows internally. I guess I could write a "manual" iteration over the rows of df1 and access df2 on each iteration. I was hoping for a more efficient solution whereby the iteration was handled internally as in the single-dataframe case. But my searches for such a solution have been fruitless.
try:
df_1.loc[df_1.Name.isin(df_2.loc[df_2.Country.eq('US') &
df_2.phoneOS.eq('Android'), 'Name']), 'Name']
Result:
0 Jim
1 Mary
Name: Name, dtype: object
if you want the result as a list just add .to_list()
at the end
data = df1.merge(df2, on='Name')
data.loc[((data.phoneOS == 'Android') & (data.Country == "US")), 'Name'].values.tolist()