pandas: drop columns whose names partially match values in a list or in another DataFrame's column

Question:

I have the following dataframe called dropthese.

     | partname        | x1 | x2 | x3....
0      text1_mid1
1      another1_mid2
2      yet_another

And another dataframe called df that looks like this.

     text1_mid1_suffix1 | text1_mid1_suffix2 | ... | something_else | another1_mid2_suffix1 | ....
0       .....
1       .....
2       .....
3       .....

I want to drop all the columns from df, if a part of the name is in dropthese['partname'].

So for example, since text1_mid1 is in partname, all columns that contain that partial string should be dropped like text1_mid1_suffix1 and text1_mid1_suffix2.

I have tried:

thisFilter = df.filter(dropthese.partname, regex=True)
df.drop(thisFilter, axis=1)

But I get this error: TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive. What is the proper way to filter these columns?

Asked By: anarchy


Answers:

I would use a regex with str.contains (or str.match if you want to restrict to the start of string):

import re
pattern = '|'.join(dropthese['partname'].map(re.escape))

# no capture group around the pattern: str.contains warns when the regex has match groups
out = df.loc[:, ~df.columns.str.contains(pattern)]

Output:

   something_else
0             ...
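
To make the difference between str.contains and str.match concrete, here is a minimal sketch on made-up data. The column values and the extra column old_text1_mid1 are hypothetical, added only to show a name that contains a partial name without starting with it.

```python
import re
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
dropthese = pd.DataFrame(
    {"partname": ["text1_mid1", "another1_mid2", "yet_another"]}
)
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=[
        "text1_mid1_suffix1",
        "text1_mid1_suffix2",
        "something_else",
        "another1_mid2_suffix1",
        "old_text1_mid1",  # hypothetical: partial name appears, but not at the start
    ],
)

# Escape each partial name so regex metacharacters are matched literally,
# then join everything into one alternation.
pattern = "|".join(dropthese["partname"].map(re.escape))

# str.contains: True if a partial name occurs anywhere in the column name.
mask_contains = df.columns.str.contains(pattern)
# str.match: True only if the column name starts with a partial name.
mask_match = df.columns.str.match(pattern)

out_contains = df.loc[:, ~mask_contains]
out_match = df.loc[:, ~mask_match]

print(list(out_contains.columns))
print(list(out_match.columns))
```

With str.contains only something_else survives; with str.match the hypothetical old_text1_mid1 column is kept as well, because the partial name is not at the start of its name.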

Why your command failed

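The first positional argument of filter is items, so your call set both items and regex, which filter rejects. Pass the pattern to the regex parameter of filter instead, and pass the resulting column names to drop: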

pattern = '|'.join(dropthese['partname'].map(re.escape))
thisFilter = df.filter(regex=pattern)
# drop returns a new DataFrame, so keep the result
out = df.drop(thisFilter.columns, axis=1)
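
As a rough, self-contained check of the filter and drop route, the sketch below first reproduces the TypeError from the question and then applies the corrected calls. The sample values are hypothetical; only the column names come from the question.

```python
import re
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
dropthese = pd.DataFrame(
    {"partname": ["text1_mid1", "another1_mid2", "yet_another"]}
)
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=[
        "text1_mid1_suffix1",
        "text1_mid1_suffix2",
        "something_else",
        "another1_mid2_suffix1",
    ],
)

# The original call: the Series lands in `items` while `regex` is also set,
# so pandas raises a TypeError.
try:
    df.filter(dropthese["partname"], regex=True)
    raised = False
except TypeError:
    raised = True

# Corrected version: build one escaped alternation and pass it as `regex`.
pattern = "|".join(dropthese["partname"].map(re.escape))
to_drop = df.filter(regex=pattern).columns

# drop returns a new DataFrame; df itself is left unchanged.
out = df.drop(columns=to_drop)
print(list(out.columns))
```

Here drop(columns=...) is used as an equivalent spelling of drop(..., axis=1).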
Answered By: mozway