Find the unique rows in a file with Python
Question:
I am trying to keep only the rows that are unique. Here, unique means a row's cells should not share any letter with another row's cells.
I have an Excel file like this with thousands of rows:
id | letters |
---|---|
1 | A,B,G |
2 | B,G |
21 | C,D |
30 | E |
35 | K,M |
40 | E,F |
The values in the letters column should not appear in any other letters cell.
The output should look like this, because the letters C, D, K and M don't appear in any other cell:
id | letters |
---|---|
21 | C,D |
35 | K,M |
Answers:
You can split the values by `,`, `explode` so each letter gets its own row, remove every letter that appears more than once with `drop_duplicates(keep=False)`, join back per group to rebuild the original comma-separated strings, and finally keep the rows whose rebuilt value matches the original:
# Split into one letter per row, drop every letter that occurs more than once
# across the whole column, then rebuild the comma-separated string per row
s = (df['letters'].str.split(',')
       .explode()
       .drop_duplicates(keep=False)
       .groupby(level=0)
       .agg(','.join))
# Keep only rows whose letters all survived the deduplication
df = df[df['letters'].eq(s)]
print(df)
id letters
2 21 C,D
4 35 K,M
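For illustration, the same idea can be sketched end to end with only the standard library, counting each letter across all rows and keeping the rows whose letters each occur exactly once overall (the sample data below is copied from the question; the `rows` dict is just a stand-in for the Excel data):

```python
from collections import Counter

# Sample data from the question: id -> comma-separated letters
rows = {1: "A,B,G", 2: "B,G", 21: "C,D", 30: "E", 35: "K,M", 40: "E,F"}

# Count how often each letter appears across all rows
counts = Counter(letter for cell in rows.values() for letter in cell.split(","))

# Keep rows whose letters all occur exactly once overall
unique_rows = {i: cell for i, cell in rows.items()
               if all(counts[letter] == 1 for letter in cell.split(","))}

print(unique_rows)  # {21: 'C,D', 35: 'K,M'}
```

This mirrors the pandas chain: `Counter` plays the role of `explode` plus `drop_duplicates(keep=False)`, and the dict comprehension plays the role of the final `eq` filter.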