I am trying to find all substrings within a multi string in python 3, I want to find all words in between the word ‘Colour:’:
str = """ Colour: Black Colour: Green Colour: Black Colour: Red Colour: Orange Colour: Blue Colour: Green """
I want to get all of the colours into a list like:
x = ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
I want to do this using Python re
Whats the fastest way of doing this with re.search , re.findall, re.finditer or even another method.
I’ve tried doing this as a list comprehension:
z = [x.group() for x in re.finditer('Colour:(.*?)Colour:', str)]
but it returns an empty list ?
In regex, the dot
. does not match new line by default. This mean your program is trying to find something like "Color: blueColor".
To overcome this, you can just do something like :
colours = re.findall(r'Colour: (.+)', str)
Note the use of
re.findall to avoid using the list comprehension.
Furthermore, if the format won’t change, regex is not mandatory and you can just split each line on spaces and get the second part :
colours = [line.split() for line in str.splitlines()]
The lists containing the trailing spaces can be removed and split based on the user-defined variable. In your case, the
list(filter(None, str.replace("n", "").replace(" ", "").split("Colour:")))
['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
Regex patterns are subjected to taking more time than dealing with strings directly.
Perhaps you just need a simple one-liner:
x = re.findall("Colour: (.*)",str)
This worked for your example.
(P.S. please don’t use builtin symbols like
str for variable names.)