Dynamically query pandas df for values of column with multiple conditions on other column being True

Question:

I have a dataframe looking like this:

Animal Zoo
Lion Berlin
Lion Munich
Lion Paris
Monkey Berlin
Monkey Munich
Monkey Rotterdam
Bat Berlin
Goose Rotterdam
Tiger Paris
Tiger Munich

I am looking for a way to dynamically build a query which returns the unique values of animals which appear in a specific set of zoos, e.g. all animals which are located in the zoos in Berlin AND Munich.

The result should look like this:

result = ['Lion', 'Monkey']

So far I have tried this:

import pandas as pd

# initialize list of lists
data = [
        ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
        ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
        ['Bat', 'Berlin'],
        ['Goose', 'Rotterdam'],
        ['Tiger', 'Paris'], ['Tiger', 'Munich']
]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

# filter df
df_filtered = df.query(" (`Zoo` == 'Berlin' | `Zoo` == 'Munich')")

# get animals as list
result = df_filtered['Animal'].unique().tolist()

# print list of results
print(result)

This gives me the animals that appear in Berlin OR Munich:

['Lion', 'Monkey', 'Bat', 'Tiger']

Turning the OR into an AND statement leads to an empty dataframe:

df.query(" (`Zoo` == 'Berlin' & `Zoo` == 'Munich')")
Asked By: Maxhlnug2021


Answers:

Use groupby:

result = df.groupby('Animal').filter(lambda x: set(['Berlin', 'Munich']).issubset(set(x['Zoo']))).Animal.unique().tolist()

Output:

['Lion', 'Monkey']
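
Since the question asks for a dynamic query, the same subset idea can be wrapped in a function that accepts any collection of zoos. This is only a sketch: the function name animals_in_all_zoos is made up for illustration, and it assumes the column names Animal and Zoo from the question.

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich']
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])


def animals_in_all_zoos(df, zoos):
    """Hypothetical helper: animals that appear in every zoo in `zoos`."""
    required = set(zoos)
    # One set of zoos per animal; sort=False keeps first-appearance order
    zoos_per_animal = df.groupby('Animal', sort=False)['Zoo'].agg(set)
    # Keep the animals whose zoo set contains all required zoos
    return [animal for animal, seen in zoos_per_animal.items()
            if required <= seen]


print(animals_in_all_zoos(df, ['Berlin', 'Munich']))  # ['Lion', 'Monkey']
```

Note that an empty list of zoos returns every animal, because the empty set is a subset of any set.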

Answered By: Poder Psittacus

To get the animals that are present in both zoos, first filter the dataset down to those zoos, then keep only the groups of animals whose number of distinct zoos equals the number of required zoos (here 2):

zoos = {'Berlin', 'Munich'}
# compare against len(zoos) so the same line works for any set of zoos
df[df.Zoo.isin(zoos)].groupby('Animal').filter(lambda x: x['Zoo'].nunique() == len(zoos))

    Animal     Zoo
0    Lion   Berlin
1    Lion   Munich
3  Monkey   Berlin
4  Monkey   Munich
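
If only the list of animals from the question is needed rather than the filtered rows, a variant along the same lines counts the distinct zoos per animal and compares that count with the size of the zoo set. This is a minimal sketch; the intermediate names subset and counts are illustrative.

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich']
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

zoos = {'Berlin', 'Munich'}

# Keep only the rows for the zoos of interest
subset = df[df['Zoo'].isin(zoos)]

# Count the distinct required zoos each animal appears in
counts = subset.groupby('Animal', sort=False)['Zoo'].nunique()

# An animal qualifies only if it was seen in every required zoo
result = counts[counts == len(zoos)].index.tolist()
print(result)  # ['Lion', 'Monkey']
```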
Answered By: RomanPerekhrest