Python's regex star quantifier not working as expected
Question:
I’m trying to use regular expressions to select only groups of words within quotation marks.
Example.
Input:
this is 'a sentence' with less 'than twenty words'
Output:
['a sentence', 'than twenty words']
The regex I’m using is:
'\'[\w]+[ ]+[[\w]+[ ]+]*[\w]+\''
But it’s only returning 'than twenty words'. In fact, it only returns the strings with two spaces.
Answers:
import re
sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
regex = re.compile(r"(?<=')\w+(?:\s+\w+)+(?=')")
regex.findall(sentence)
# ['a sentence', 'than twenty words']
We want to match strings that start and end with quotes without including the quotes in the result, so we use a positive lookbehind assertion (?<=') before the words and a lookahead assertion (?=') after them.
Inside the quotes, we want at least one word, followed by at least one group of whitespace and a word. This must not be a capturing group, otherwise findall would return only that group, so we make it non-capturing by using (?:...).
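To make the point about findall and groups concrete, here is a small sketch comparing the two variants on the same sentence. The variable names capturing and non_capturing are only illustrative and do not appear in the answer above.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# With a capturing group, findall returns only the group's content,
# and for a repeated group that is its last repetition.
capturing = re.findall(r"(?<=')\w+(\s+\w+)+(?=')", sentence)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?<=')\w+(?:\s+\w+)+(?=')", sentence)

print(capturing)      # [' sentence', ' words']
print(non_capturing)  # ['a sentence', 'than twenty words']
```

Note that 'lonely' is skipped in both cases because the pattern requires at least two words between the quotes.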
This will deliver the strings between quotation marks, including words and spaces.
import re
st = "this is 'a sentence' with less 'than twenty words'"
re.findall(r"'([\w\s]+)'", st)
# ['a sentence', 'than twenty words']
Late answer, but you can use:
import re
string = "this is 'a sentence' with less 'than twenty words'"
result = re.findall(r"'(.*?)'", string)
print(result)
# ['a sentence', 'than twenty words']
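The ? after .* is what makes this last pattern work: it turns the star into a lazy quantifier that stops at the first closing quote. As a rough illustration (the names greedy and lazy are just for this sketch), compare it with the greedy form:

```python
import re

text = "this is 'a sentence' with less 'than twenty words'"

# Greedy: .* runs to the last quote in the string, so both quoted
# parts and everything between them end up in a single match.
greedy = re.findall(r"'(.*)'", text)

# Lazy: .*? stops at the first quote that closes the match.
lazy = re.findall(r"'(.*?)'", text)

print(greedy)  # ["a sentence' with less 'than twenty words"]
print(lazy)    # ['a sentence', 'than twenty words']
```

Unlike the lookaround answer, the lazy pattern also returns single quoted words and any punctuation inside the quotes, so which one fits depends on whether "groups of words" should be enforced.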