Filter on value set intersection within a group
Question:
Let’s say I have a dataframe as follows:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
B | X | Sand
B | X | Stone
B | Y | Stone
C | X | Sand
C | X | Stone
I want to find all Groups in which the Sources share no common Names. In the example above, I want Group A, because the Names of its two Sources (X and Y) have no values in common. For this example we can assume there will only be two Sources (X and Y), and not every Group has more than one Source. I am only interested in Groups that contain both Sources X and Y and have no Name intersection.
The resulting DataFrame should look like this:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
I have tried doing a groupby on Group and then supplying a function to the chained filter method, like so:
def find_no_intersection(df):
    return (
        len(df[df.Source == 'X'].Name.values) > 0 and
        len(df[df.Source == 'Y'].Name.values) > 0 and
        (
            len(
                set(df[df.Source == 'X'].Name.values) &
                set(df[df.Source == 'Y'].Name.values)
            ) == 0
        )
    )

df.groupby(['Group']).filter(find_no_intersection)
Is this the right way? Is there a better way?
Answers:
If I understand correctly, you can do that with the following.
df[~df['Group'].isin(df[df[['Source','Name']].duplicated()]['Group'])]
Here is a way using nunique()
df.loc[df.groupby('Group')['Name'].transform(lambda x: x.size == x.nunique())]
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A Y Sand
Update to answer:
(df.loc[
    df['Group'].map(
        df.groupby(['Group','Source'])['Name']
          .agg(set)
          .groupby(level=0)
          .agg(lambda x: len(set.intersection(*x))==0))
])
or:
m1 = df['Group'].map(df.groupby(['Group','Name'])['Source'].nunique().eq(1).groupby(level=0).all())
m2 = df.groupby('Group')['Source'].transform('nunique').eq(df['Source'].nunique())
df.loc[m1 & m2]
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A X Jolly
3 A Y Sand
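For anyone who wants to reproduce this, here is a self-contained sketch of the set-based idea on the sample data from the question. The helper name disjoint_groups and its required_sources parameter are hypothetical names chosen for illustration, not part of pandas, and the check that both sources are present is written out explicitly on the assumption that only X and Y matter.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group":  ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Source": ["X", "X", "X", "Y", "X", "X", "Y", "X", "X"],
    "Name":   ["Jolly", "Stone", "Jolly", "Sand", "Sand",
               "Stone", "Stone", "Sand", "Stone"],
})

def disjoint_groups(frame, required_sources=("X", "Y")):
    """Keep rows of groups that contain every required source
    and whose per-source name sets have nothing in common.
    (Hypothetical helper, for illustration only.)"""
    # One set of names per (Group, Source) pair.
    name_sets = frame.groupby(["Group", "Source"])["Name"].agg(set)

    keep = []
    for group, sets in name_sets.groupby(level="Group"):
        sources = set(sets.index.get_level_values("Source"))
        has_all_sources = set(required_sources) <= sources
        # An empty intersection means the sources share no name.
        if has_all_sources and not set.intersection(*sets):
            keep.append(group)

    return frame[frame["Group"].isin(keep)]

result = disjoint_groups(df)
print(result)  # keeps only the four Group A rows (index 0 to 3)
```

Group B is dropped because Stone appears under both X and Y, and Group C is dropped because it has no Y rows at all. The explicit loop is slower than the vectorised masks above on large frames, but it makes the two conditions easy to read and to adjust.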