How can I select all columns of a dataframe whose names partially match strings in a list?

Question:

Suppose I have a Dataframe like:

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})

Given a list of "filter" strings like mylist = ['oo', 'ba'], how can I select all columns in df whose names partially match any of the strings in mylist? For this example, the expected output is a dataframe with only the matching columns: {'foo': [1, 2, 3], 'bar': [4, 5, 6]}.

Asked By: DeltaIV


Answers:

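You can use df.filter with its regex parameter. Joining the strings with '|' builds a pattern that matches any column name containing at least one of them.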

import pandas as pd

# sample dataframe
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})

# sample list of strings
mylist = ['oo', 'ba']

# join the list to a single string
matches = '|'.join(mylist)

# use regex to filter the columns based on the string
df_out = df.filter(regex=matches)
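
One caveat with the approach above: the strings in mylist are interpreted as regular expressions, so a string containing characters such as '.' or '(' would not be matched literally. The following is a minimal sketch of two variants, assuming the strings are meant as plain substrings; the names pattern, df_escaped, cols and df_plain are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})
mylist = ['oo', 'ba']

# Variant 1: escape each string so regex metacharacters are matched literally
pattern = '|'.join(re.escape(s) for s in mylist)
df_escaped = df.filter(regex=pattern)

# Variant 2: no regex at all, keep columns whose name contains any of the strings
cols = [c for c in df.columns if any(s in c for s in mylist)]
df_plain = df[cols]
```

Note that the two variants differ when mylist is empty: the joined pattern becomes an empty string, which matches every column name, while the list comprehension selects no columns.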
Answered By: Tasos