Find pattern substring using Python re

Question:

I am trying to find all substrings within a multi-line string in Python 3. I want to extract every word that follows the label ‘Colour:’:

example string:

str = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""

I want to get all of the colours into a list like:

x = ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']

I want to do this using Python re

What's the fastest way of doing this: re.search, re.findall, re.finditer, or another method?

I’ve tried doing this as a list comprehension:

z = [x.group() for x in re.finditer('Colour:(.*?)Colour:', str)]

but it returns an empty list.

Any ideas?

Asked By: milo2810

Answers:

In regex, the dot . does not match a newline by default. This means your pattern can only match when both occurrences of Colour: sit on the same line, as in "Colour: BlueColour:", which never happens in your string.
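
To make that concrete, here is a small sketch of the pattern from the question, run with and without re.DOTALL. The variable name text is my own choice (standing in for the question's string). Note that even with re.DOTALL the pattern is not what you want: each match consumes the second Colour:, so every other value is skipped.

```python
import re

# `text` is a stand-in name for the string from the question.
text = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""

pattern = r'Colour:(.*?)Colour:'

# Default flags: '.' stops at each newline, so nothing can match.
no_flag = re.findall(pattern, text)

# re.DOTALL lets '.' cross newlines, but every match swallows the
# second 'Colour:', so only every other value is captured.
with_dotall = [m.strip() for m in re.findall(pattern, text, re.DOTALL)]

print(no_flag)      # []
print(with_dotall)  # ['Black', 'Black', 'Orange']
```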

To overcome this, you can just do something like :

colours = re.findall(r'Colour: (.+)', str)

Note the use of re.findall to avoid using the list comprehension.
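
As a self-contained sketch (the variable name text is an assumption of mine, used instead of str), this is what the call returns on the example string. If you prefer re.finditer, ask for group(1) rather than group(), since group() gives the whole match including the Colour: prefix.

```python
import re

# `text` is a stand-in name for the string from the question.
text = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""

# With exactly one capture group, findall returns the group contents.
colours = re.findall(r'Colour: (.+)', text)

# Equivalent finditer version: group(1) is the captured colour only.
colours_iter = [m.group(1) for m in re.finditer(r'Colour: (.+)', text)]

print(colours)
# ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
```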

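Furthermore, if the format won’t change, regex is not mandatory: you can split each line on whitespace and take the second part. Blank lines, such as the one right after the opening triple quote, have to be skipped, otherwise the index lookup raises an IndexError:

colours = [line.split()[1] for line in str.splitlines() if line.strip()]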
Answered By: charon25

You can strip out the newlines and spaces, split on a delimiter of your choice (in your case Colour:), and filter out the empty strings that are left over:

list(filter(None, str.replace("\n", "").replace(" ", "").split("Colour:")))

Result:

['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
Regarding speed:

Regex matching generally takes more time than working with plain string methods directly.
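
If speed matters, it is probably best to measure rather than guess. Below is a rough timeit sketch. The sample data, the function names with_regex and with_split, and the repeat count are all made up for illustration, and which approach wins may depend on your input and Python version.

```python
import re
import timeit

# Made-up sample: 2000 lines in the same format as the question.
text = "\n" + "Colour: Black\nColour: Green\n" * 1000

def with_regex():
    return re.findall(r'Colour: (.+)', text)

def with_split():
    # Skip blank lines so split()[1] never hits an empty list.
    return [line.split()[1] for line in text.splitlines() if line.strip()]

# Both approaches should agree before their timings are compared.
assert with_regex() == with_split()

regex_time = timeit.timeit(with_regex, number=200)
split_time = timeit.timeit(with_split, number=200)

print(f"re.findall: {regex_time:.4f} s")
print(f"str.split:  {split_time:.4f} s")
```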

Answered By: Roxy

Perhaps you just need a simple one-liner:

x = re.findall("Colour: (.*)",str)

This worked for your example.

(P.S. Please don’t use built-in names like str for variable names.)
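
A quick illustration of why, using a hypothetical function shadowed of my own: once the name str is bound to a string, the built-in type is hidden in that scope and calling it fails.

```python
def shadowed():
    # The local name `str` now hides the built-in type in this function.
    str = "Colour: Black"
    try:
        return str(42)  # tries to call a string object
    except TypeError as exc:
        return f"TypeError: {exc}"

result = shadowed()
print(result)
```

Picking a name like text or data avoids the problem entirely.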

Answered By: RufusVS