Dynamically query pandas df for values of column with multiple conditions on other column being True
Question:
I have a dataframe looking like this:
Animal | Zoo |
---|---|
Lion | Berlin |
Lion | Munich |
Lion | Paris |
Monkey | Berlin |
Monkey | Munich |
Monkey | Rotterdam |
Bat | Berlin |
Goose | Rotterdam |
Tiger | Paris |
Tiger | Munich |
I am looking for a way to dynamically build a query which returns the unique values of animals which appear in a specific set of zoos, e.g. all animals which are located in the zoos in Berlin AND Munich.
The result should look like this:
result = ['Lion', 'Monkey']
So far I tried this:
import pandas as pd
# initialize list of lists
data = [
['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
['Bat', 'Berlin'],
['Goose', 'Rotterdam'],
['Tiger', 'Paris'], ['Tiger', 'Munich']
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])
# filter df
df_filtered = df.query(" (`Zoo` == 'Berlin' | `Zoo` == 'Munich')")
# get animals as list
result = df_filtered['Animal'].unique().tolist()
# print list of results
print(result)
This gives me the animals which appear in Berlin OR Munich:
['Lion', 'Monkey', 'Bat', 'Tiger']
Turning the OR into an AND statement leads to an empty dataframe:
df.query(" (`Zoo` == 'Berlin' & `Zoo` == 'Munich')")
Answers:
Use groupby:
result = df.groupby('Animal').filter(lambda x: set(['Berlin', 'Munich']).issubset(set(x['Zoo']))).Animal.unique().tolist()
Output:
['Lion', 'Monkey']
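Since the question asks for a dynamic query, the one-liner above can be wrapped in a function that takes the zoos as a parameter. This is only a minimal sketch: the helper name `animals_in_all_zoos` and its parameters are made up for illustration, and it assumes the columns are named `Animal` and `Zoo` as in the question.

```python
import pandas as pd


def animals_in_all_zoos(df, required_zoos):
    """Return the animals that appear in every zoo in required_zoos.

    Hypothetical helper; assumes columns named 'Animal' and 'Zoo'.
    """
    required = set(required_zoos)
    # Keep only the groups (animals) whose zoos cover the whole required set.
    matching = df.groupby("Animal").filter(
        lambda group: required.issubset(group["Zoo"])
    )
    return matching["Animal"].unique().tolist()


data = [
    ["Lion", "Berlin"], ["Lion", "Munich"], ["Lion", "Paris"],
    ["Monkey", "Berlin"], ["Monkey", "Munich"], ["Monkey", "Rotterdam"],
    ["Bat", "Berlin"],
    ["Goose", "Rotterdam"],
    ["Tiger", "Paris"], ["Tiger", "Munich"],
]
df = pd.DataFrame(data, columns=["Animal", "Zoo"])

print(animals_in_all_zoos(df, ["Berlin", "Munich"]))  # ['Lion', 'Monkey']
print(animals_in_all_zoos(df, ["Munich", "Paris"]))   # ['Lion', 'Tiger']
```

Any iterable of zoo names can be passed in, so the list can come from user input or another query.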
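To get the animals that are present in all of the required zoos, first filter the dataset down to those zoos, then keep only the animals whose number of distinct zoos equals the number of required zoos. This returns the matching rows; append .Animal.unique().tolist() to get the list of animals.
zoos = {'Berlin', 'Munich'}
df[df.Zoo.isin(zoos)].groupby('Animal').filter(lambda x: x['Zoo'].nunique() == len(zoos))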
   Animal     Zoo
0    Lion  Berlin
1    Lion  Munich
3  Monkey  Berlin
4  Monkey  Munich
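The same counting idea can also be written without a per-group lambda, which may be preferable on larger frames. The sketch below is one possible variant, not the answerer's code: the function name `animals_in_all_zoos_by_count` is hypothetical, and it again assumes the `Animal` and `Zoo` column names from the question.

```python
import pandas as pd


def animals_in_all_zoos_by_count(df, required_zoos):
    """Return animals seen in every required zoo, via a distinct-zoo count.

    Hypothetical helper; assumes columns named 'Animal' and 'Zoo'.
    """
    required = set(required_zoos)
    # Step 1: drop rows for zoos we are not interested in.
    in_required = df[df["Zoo"].isin(list(required))]
    # Step 2: count the distinct required zoos each animal appears in.
    zoo_counts = in_required.groupby("Animal")["Zoo"].nunique()
    # Step 3: an animal qualifies only if it was seen in all of them.
    return zoo_counts[zoo_counts == len(required)].index.tolist()


data = [
    ["Lion", "Berlin"], ["Lion", "Munich"], ["Lion", "Paris"],
    ["Monkey", "Berlin"], ["Monkey", "Munich"], ["Monkey", "Rotterdam"],
    ["Bat", "Berlin"],
    ["Goose", "Rotterdam"],
    ["Tiger", "Paris"], ["Tiger", "Munich"],
]
df = pd.DataFrame(data, columns=["Animal", "Zoo"])

print(animals_in_all_zoos_by_count(df, ["Berlin", "Munich"]))
```

Because `groupby` sorts its keys by default, the animals come back in alphabetical order rather than in order of first appearance.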