If string is name of one of column, take rows including this string, if not remove row with python

Question

I have two files. First one looks like this:

A	B	C	D	…
0.002	0.25	1.25	2.45	3.54
0.15	0.45	2.35	0.48	4.57
…	…	…	…	…

and the second one :

letter	ID
A	12
T	12
C	23
D, J	15
K	45
M, A, C	35

If one of column name of first file does not appear in the second file then remove it from second file, like this:

letter	ID
A	12
C	23
D	15
A, C	35

Thanks.

Asked By: abcabc

||

Source

Answer 1

You could use loc to filter your df2 based on df1’s columns:

>>> df2.loc[df2.letter.isin(list(df1))]

  letter  ID
0      A  12
2      C  23

list(df1) will return df1’s columns as a list which can be passed into isin which is used on the ‘letter’ column in df2.

Given your edits, you can use the following approach:

df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
           letter=df2.letter.str.split(',')
           ).explode('letter')

df2_exp.loc[
    df2_exp.letter.str.strip().isin(list(df1)
                                    )
    ].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
                        }).reset_index(drop=True)

prints:

   ID letter
0  12      A
1  23      C
2  15      D
3  35   A, C

Answered By: sophocles

If string is name of one of column, take rows including this string, if not remove row with python

Question:

Answers: