Python's regex star quantifier not working as expected

Question:

I’m trying to use regular expressions to select only groups of words within quotation marks.

Example.

Input:

this is 'a sentence' with less 'than twenty words'

Output:

['a sentence', 'than twenty words']

The regex I’m using is:

'\'[\w]+[ ]+[[\w]+[ ]+]*[\w]+\''

But it only returns 'than twenty words'. In fact, it only matches quoted strings that contain exactly two spaces.
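The star is not really the problem here: square brackets define a character class, not a group, so `[[\w]+[ ]+]*` never repeats "word followed by spaces". The sketch below assumes the backslashes missing from the question were `\w`, writes the same pattern as a raw string, and compares it with a version that groups with `(?:...)` instead:

```python
import re
import warnings

text = "this is 'a sentence' with less 'than twenty words'"

# The question's pattern as a raw string (assumes the lost backslashes were \w).
# "[[\w]" is ONE character class ('[' or a word character), and the "]*" after
# "[ ]+" is a literal ']' repeated zero or more times. There are no groups,
# so the star never repeats "word + spaces".
with warnings.catch_warnings():
    # Python 3.7+ emits a FutureWarning ("Possible nested set") for "[[".
    warnings.simplefilter("ignore", FutureWarning)
    original = re.compile(r"'[\w]+[ ]+[[\w]+[ ]+]*[\w]+'")

print(original.findall(text))                    # ["'than twenty words'"]
print(original.findall("'one two three four'"))  # []

# Grouping needs parentheses; (?:...) groups without capturing,
# so findall still returns the whole match (quotes included).
fixed = re.compile(r"'\w+ +(?:\w+ +)*\w+'")
print(fixed.findall(text))  # ["'a sentence'", "'than twenty words'"]
```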

Asked By: Claudia


Answers:

Try this:

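import re
input_string = "this is 'a sentence' with less 'than twenty words'"
re.findall(r"'(\s*\w+\s+\w[\s\w]*)'", input_string)
# ['a sentence', 'than twenty words']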

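This answer does not explain its pattern, so here is the same regex laid out with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern and lets each token carry a note. The comments describe what each token matches; the test sentence with a single quoted word is borrowed from the next answer:

```python
import re

# The same pattern as in this answer, laid out with re.VERBOSE so each
# token can carry a comment (whitespace and # comments are ignored).
pattern = re.compile(r"""
    '           # opening quote, outside the group, so not returned
    (           # group 1: findall returns only this part
      \s*       # optional leading whitespace
      \w+       # first word
      \s+       # at least one whitespace character
      \w        # first character of the next word
      [\s\w]*   # any further words and whitespace
    )
    '           # closing quote
""", re.VERBOSE)

text = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
print(pattern.findall(text))  # ['a sentence', 'than twenty words']
```

Because `\s+\w` requires at least a second word, the single quoted word 'lonely' is not returned.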

Answered By: Ahsanul Haque

import re
sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
regex = re.compile(r"(?<=')\w+(?:\s+\w+)+(?=')")
print(regex.findall(sentence))
# ['a sentence', 'than twenty words']

We want to match the text between the quotes without including the quotes themselves in the match, so we use a positive lookbehind assertion (?<=') before it and a positive lookahead assertion (?=') after it.
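A short comparison on the same sentence shows what the lookarounds change: matching the quotes directly puts them into each result, while the lookarounds only check that a quote is there.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# Consuming the quotes makes them part of each match.
with_quotes = re.findall(r"'\w+(?:\s+\w+)+'", sentence)
print(with_quotes)  # ["'a sentence'", "'than twenty words'"]

# Lookbehind/lookahead assert that a quote is present without consuming it.
without_quotes = re.findall(r"(?<=')\w+(?:\s+\w+)+(?=')", sentence)
print(without_quotes)  # ['a sentence', 'than twenty words']
```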

Inside the quotes, we want at least one word, followed by at least one group of whitespace plus another word. That group must not be a capturing group: if it were, findall would return only what the group matched instead of the whole match, so we make it non-capturing with (?:...).
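A quick check of the capturing-group pitfall on the same sentence: with a capturing group, `findall` returns the group's text, and a repeated group keeps only its last repetition. If a capturing group is really needed, `re.finditer` with `match.group(0)` still yields the whole match.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
capturing_pattern = r"(?<=')\w+(\s+\w+)+(?=')"

# findall returns group 1, i.e. only the last " word" the group matched.
print(re.findall(capturing_pattern, sentence))  # [' sentence', ' words']

# group(0) is always the entire match, whatever groups the pattern has.
whole = [m.group(0) for m in re.finditer(capturing_pattern, sentence)]
print(whole)  # ['a sentence', 'than twenty words']
```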

Answered By: Thierry Lathuille

This returns the text between quotation marks, made up of words and spaces.

import re
st = "this is 'a sentence' with less 'than twenty words'"
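# Capture one or more word characters or whitespace characters between quotes.
re.findall(r"'([\w\s]+)'", st)
# ['a sentence', 'than twenty words']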
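Inside a character class, `|` is an ordinary character rather than alternation, so a class written as `[\w|\s]` also accepts a literal pipe. A small check, using a made-up input that contains a pipe inside the quotes:

```python
import re

# Hypothetical input with a "|" inside the quoted text.
text = "keep 'a|b c' here"

# "|" inside [...] is literal, so this class also matches the pipe.
with_pipe = re.findall(r"'([\w|\s]+)'", text)
print(with_pipe)  # ['a|b c']

# Without it, only word characters and whitespace are allowed,
# so the quoted text containing "|" is not matched at all.
without_pipe = re.findall(r"'([\w\s]+)'", text)
print(without_pipe)  # []
```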
Answered By: Saeed Ghareh Daghi

Late answer, but you can use:

import re
string = "this is 'a sentence' with less 'than twenty words'"
result = re.findall("'(.*?)'", string)
print(result)
# ['a sentence', 'than twenty words']
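The `?` after `.*` is what makes this work: it turns the quantifier lazy, so each match stops at the nearest closing quote. A comparison on a sentence that also contains a single quoted word shows both the greedy behaviour and the fact that this pattern returns single words too:

```python
import re

string = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# Greedy .* runs to the LAST quote in the string.
greedy = re.findall("'(.*)'", string)
print(greedy)
# ["a sentence' with less 'than twenty words' and a 'lonely"]

# Lazy .*? stops at the first closing quote; single quoted words
# such as 'lonely' are returned as well.
lazy = re.findall("'(.*?)'", string)
print(lazy)
# ['a sentence', 'than twenty words', 'lonely']
```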


Answered By: Pedro Lobito