If string is name of one of column, take rows including this string, if not remove row with python
Question:
I have two files. First one looks like this:
A
B
C
D
…
0.002
0.25
1.25
2.45
3.54
0.15
0.45
2.35
0.48
4.57
…
…
…
…
…
and the second one :
letter
ID
A
12
T
12
C
23
D, J
15
K
45
M, A, C
35
If one of column name of first file does not appear in the second file then remove it from second file, like this:
letter
ID
A
12
C
23
D
15
A, C
35
Thanks.
Answers:
You could use loc
to filter your df2 based on df1’s columns:
>>> df2.loc[df2.letter.isin(list(df1))]
letter ID
0 A 12
2 C 23
list(df1)
will return df1’s columns as a list which can be passed into isin
which is used on the ‘letter’ column in df2.
Given your edits, you can use the following approach:
df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
letter=df2.letter.str.split(',')
).explode('letter')
df2_exp.loc[
df2_exp.letter.str.strip().isin(list(df1)
)
].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
}).reset_index(drop=True)
prints:
ID letter
0 12 A
1 23 C
2 15 D
3 35 A, C
I have two files. First one looks like this:
A | B | C | D | … |
---|---|---|---|---|
0.002 | 0.25 | 1.25 | 2.45 | 3.54 |
0.15 | 0.45 | 2.35 | 0.48 | 4.57 |
… | … | … | … | … |
and the second one :
letter | ID |
---|---|
A | 12 |
T | 12 |
C | 23 |
D, J | 15 |
K | 45 |
M, A, C | 35 |
If one of column name of first file does not appear in the second file then remove it from second file, like this:
letter | ID |
---|---|
A | 12 |
C | 23 |
D | 15 |
A, C | 35 |
Thanks.
You could use loc
to filter your df2 based on df1’s columns:
>>> df2.loc[df2.letter.isin(list(df1))]
letter ID
0 A 12
2 C 23
list(df1)
will return df1’s columns as a list which can be passed into isin
which is used on the ‘letter’ column in df2.
Given your edits, you can use the following approach:
df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
letter=df2.letter.str.split(',')
).explode('letter')
df2_exp.loc[
df2_exp.letter.str.strip().isin(list(df1)
)
].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
}).reset_index(drop=True)
prints:
ID letter
0 12 A
1 23 C
2 15 D
3 35 A, C