Match on two items with regular expressions

Question:

I am trying to do a filter on my Pandas data frame that contains columns with names such as DE-NL, DE-FR, FR-NL, AT-DE etc.

I need to filter my data frame so I only get the columns that contains two of a certain strings.

So basically, if I just do this:

df = df.filter(regex="DE|NL|FR")

Then it finds all columns where either of the three are in. So I might also get something like AT-DE. In this case I would actually only need the following columns:

DE-NL, NL-DE, NL-FR, FR-NL, DE-FR, FR-DE

So two of the filter strings needs to be present before it’s a match.

Asked By: Denver Dang

||

Answers:

Use:

df = df.filter(regex="^(?:DE|NL|FR)-(?:DE|NL|FR)$")

To make it programmatically from a list of country codes:

l = ['DE', 'NL', 'FR']
s = '|'.join(l)

df = df.filter(regex=f"^(?:{s})-(?:{s})$")

If you want to avoid matching repeated values (e.g., ‘NL-NL’):

l = ['DE', 'NL', 'FR']
s = '|'.join(l)

df = df.filter(regex=fr"^({s})-(?!1)(?:{s})$")
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.