Cannot seem to figure out this regex involving forward slash

Question

I am trying to capture instances in my dataframe where a string has the following format:

/random a/random b/random c/capture this/random again/random/random

When a segment is preceded by four / characters and more than two / characters appear after it, I would like that segment captured and returned in a different column. If a row does not fit this format, return None.

In this instance capture this should be captured and placed into a new column.

This is what I tried:

def extract_special_string(df, column):
    df['special_string_a'] = df[column].apply(lambda x: re.search(r'(?<=/{4})[^/]+(?=/[^/]{2,})', x).group(0) if re.search(r'(?<=/{4})[^/]+(?=/[^/]{2,})', x) else None)

extract_special_string(df, 'column')

However nothing is being captured. Can anybody help with this regex? Thanks.

Asked By: work_python

Answer 1

You can use

df['special_string_a'] = df[column].str.extract(r'^(?:[^/]*/){4}([^/]+)(?:/[^/]*){2}', expand=False)

Details:

^ – start of string
(?:[^/]*/){4} – four occurrences of any zero or more chars other than / and then a / char
([^/]+) – Capturing group 1: one or more chars other than a / char
(?:/[^/]*){2} – two occurrences of a / char and then any zero or more chars other than /.
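
To check the pattern outside pandas, here is a minimal sketch using only the re module. The helper name extract_fifth_segment and the two extra sample strings are made up for illustration. It also runs the pattern from the question against the sample: (?<=/{4}) asks for four consecutive slashes ("////") directly before the match, which is most likely why nothing was captured.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r'^(?:[^/]*/){4}([^/]+)(?:/[^/]*){2}')

# Pattern from the question. The lookbehind (?<=/{4}) requires the literal
# text "////" right before the match, not four slashes spread over the string.
ORIGINAL = re.compile(r'(?<=/{4})[^/]+(?=/[^/]{2,})')


def extract_fifth_segment(text):
    """Return the segment after the fourth '/', or None if the shape does not fit."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


samples = [
    '/random a/random b/random c/capture this/random again/random/random',
    '/a/b/c/d/e',  # only one '/' after the candidate segment
    '/a/b',        # too few slashes overall
]

results = [extract_fifth_segment(s) for s in samples]
print(results)  # ['capture this', None, None]

# The question's pattern finds nothing in the sample string.
original_hit = ORIGINAL.search(samples[0])
print(original_hit)  # None
```

Note that str.extract fills non-matching rows with NaN rather than None. If None is specifically needed, a per-row helper like the one above could be passed to df[column].apply instead, assuming every value in the column is a string.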

Answered By: Wiktor Stribiżew

Answer 2

An alternative regex approach would be to use non-greedy quantifiers.

import re
s = '/random a/random b/random c/capture this/random again/random/random'
pattern = r'/(?:.*?/){3}(.*?)(?:/.*?){2,}'
m = re.match(pattern, s)
print(m.group(1))  # 'capture this'

/(?:.*?/){3} – match the part before capture this, taking as few characters as possible between each pair of /s (the non-capturing group keeps these segments out of the result)
(.*?) – capture capture this (since this is a capturing group, we can fetch the contents from <match_object>.group(1))
(?:/.*?){2,} – same idea as the first part: two or more occurrences of a / followed by as few characters as possible
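
One thing to keep in mind with this approach: . can also match a /, so the lazy pattern is a bit less strict than a [^/] based one. A rough sketch of how it behaves on a few inputs follows; the helper name capture_lazy and the extra sample strings are made up for illustration. Note the last row, where an empty fifth segment yields an empty string rather than no match.

```python
import re

# Lazy pattern from this answer, unchanged.
LAZY = re.compile(r'/(?:.*?/){3}(.*?)(?:/.*?){2,}')


def capture_lazy(text):
    """Return group 1 of the lazy pattern, or None when there is no match."""
    m = LAZY.match(text)
    return m.group(1) if m else None


rows = [
    '/random a/random b/random c/capture this/random again/random/random',
    '/a/b/c/d/e',   # only one '/' after the fifth segment, so no match
    '/a/b/c//e/f',  # empty fifth segment
]

captured = [capture_lazy(r) for r in rows]
print(captured)  # ['capture this', None, '']
```

If empty segments should count as "not applicable", replacing (.*?) with ([^/]+) is one way to tighten it.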

Answered By: Fractalism
