Selecting rows with logic involving multiple variables across rows

Question:

I have a data frame like this:

import pandas as pd

df = pd.DataFrame({"product": [1,2,3,4,5], "company": ["A", "B","B","A","B"], "state": ["CA", "NY", "CA", "CA", "NY"]})

  company state  product
0       A    CA        1
1       B    NY        2
2       B    CA        3
3       A    CA        4
4       B    NY        5

I would like a boolean mask that picks out just the rows belonging to states that have only one company in them. In this case that is only NY, which has only company B, so the desired mask would be [False, True, False, False, True].

Alternatively, I would like to know the set of states that have only one company in them. I guess I could do that e.g. using value_counts once I have the bool.

How do I do this?

Asked By: Smithey


Answers:

You can use groupby with nunique, applied through transform so that the per-state count of distinct companies is broadcast back onto every row, then check whether the result is equal to 1:

df['flag'] = df.groupby('state')['company'].transform('nunique').eq(1)
print(df)

# Output
   product company state   flag
0        1       A    CA  False
1        2       B    NY   True
2        3       B    CA  False
3        4       A    CA  False
4        5       B    NY   True
Answered By: Corralien
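
For the second part of the question (the set of states that have only one company), one possible approach is to skip transform and aggregate once per state instead. This is only a sketch built on the question's sample data; the variable names companies_per_state, single_company_states and mask are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "product": [1, 2, 3, 4, 5],
    "company": ["A", "B", "B", "A", "B"],
    "state": ["CA", "NY", "CA", "CA", "NY"],
})

# One value per state: how many distinct companies appear in it.
companies_per_state = df.groupby("state")["company"].nunique()

# Keep the states where that count is exactly 1.
single_company_states = set(companies_per_state[companies_per_state.eq(1)].index)
print(single_company_states)

# The row-level mask can be rebuilt from the set when it is needed.
mask = df["state"].isin(single_company_states)
print(mask.tolist())
```

On this data the set contains only NY, and the mask matches the flag column produced by the transform approach. Aggregating first is convenient when the list of states is the main goal; transform is the shorter route when only the row-level mask is needed.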