Extract with multiple Patterns

Question:

I'm having an issue that maybe someone can help me with. I am trying to extract two patterns from a string and place them in another column. It's extracting the first string fine, but I am missing something in getting the second one there. Here's the code.

jobseries['New Column'] = jobseries['Occupation'].str.extract('(GS-\d+)(|)(WG-\d+)').fillna('')

The first string is (GS-\d+) and the second string is (WG-\d+)

I've tried a ton of variations; none have worked.

Asked By: Jeremiah Anderson


Answers:

You can use either

jobseries['New Column'] = jobseries['Occupation'].str.extract(r'(GS-\d+|WG-\d+)').fillna('')

or a shorter

jobseries['New Column'] = jobseries['Occupation'].str.extract(r'((?:GS|WG)-\d+)').fillna('')

The points are:

  • There must be only one capturing group in the regex, since you are using Series.str.extract and assigning the result to a single column (New Column). Each capturing group produces its own output column.
  • The regex must match either one string or the other, but you can factor out the shared tail of the pattern and use ((?:GS|WG)-\d+) instead of (GS-\d+|WG-\d+): a capturing group that matches either GS or WG, then a hyphen, then one or more digits.
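
To see both points in action, here is a minimal sketch. The sample values in the Occupation column are made up for illustration; only the column names and the patterns come from the question and answer. It also passes expand=False so that str.extract returns a Series rather than a one-column DataFrame, which is an optional addition and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real 'Occupation' values are not shown in the question.
jobseries = pd.DataFrame({
    "Occupation": ["Engineer GS-0801", "Mechanic WG-5803", "Contractor"],
})

# One capturing group wrapping a non-capturing alternation:
# matches "GS" or "WG", then a hyphen, then one or more digits.
pattern = r"((?:GS|WG)-\d+)"

# expand=False returns a Series when the pattern has a single group.
jobseries["New Column"] = (
    jobseries["Occupation"].str.extract(pattern, expand=False).fillna("")
)

# For comparison: two capturing groups produce two output columns,
# which cannot be assigned to a single column directly.
two_groups = jobseries["Occupation"].str.extract(r"(GS-\d+)|(WG-\d+)")

print(jobseries)
print(two_groups.shape)
```

Rows with no match come back as NaN from str.extract, which is why the trailing fillna('') is kept.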
Answered By: Wiktor Stribiżew