If string is name of one of column, take rows including this string, if not remove row with python

Question:

I have two files. First one looks like this:

A B C D
0.002 0.25 1.25 2.45 3.54
0.15 0.45 2.35 0.48 4.57

and the second one :

letter ID
A 12
T 12
C 23
D, J 15
K 45
M, A, C 35

If one of column name of first file does not appear in the second file then remove it from second file, like this:

letter ID
A 12
C 23
D 15
A, C 35

Thanks.

Asked By: abcabc

||

Answers:

You could use loc to filter your df2 based on df1’s columns:

>>> df2.loc[df2.letter.isin(list(df1))]

  letter  ID
0      A  12
2      C  23

list(df1) will return df1’s columns as a list which can be passed into isin which is used on the ‘letter’ column in df2.


Given your edits, you can use the following approach:

df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
           letter=df2.letter.str.split(',')
           ).explode('letter')

df2_exp.loc[
    df2_exp.letter.str.strip().isin(list(df1)
                                    )
    ].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
                        }).reset_index(drop=True)

prints:

   ID letter
0  12      A
1  23      C
2  15      D
3  35   A, C
Answered By: sophocles
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.