How to iterate over a list of words?

Question:

I have a regular expression that finds a specific word and then filters some text.

But now I want to iterate over a list of words and filter the text for each of them.

So this is my regular expression:

def regex_fruit_cost(subst):
    return r"(?<=" + subst + r").*?(?P<number>[0-9,.]*)\n"

and this is my list of words:

fruit_words = ['Appels', 'Ananas', 'Peen Waspeen',
               'Tomaten Cherry', 'Sinaasappels',
               'Watermeloenen', 'Rettich', 'Peren', 'Peen', 'Mandarijnen', 'Meloenen', 'Grapefruit']

and the text for filtering the words from it:

text t.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"

Combine it:

regex = r"(\d*(?:\.\d+)*)\s*\W+(" + '|'.join(re.escape(word)
                                             for word in fruit_words) + ')'
fruit_cost_found = re.findall(regex_fruit_cost('|'.join(re.escape(word) for word in fruit_words)), text)
print(*fruit_cost_found, sep='\n')

But then I get this ‘strange error’:

raise error("look-behind requires fixed-width pattern")
re.error: look-behind requires fixed-width pattern

Question: what do I have to change so that it can iterate over the list of words?

In the text you will find words such as Watermeloenen and Appels. The regular expression works for a single word, for example Watermeloenen, and then it returns this list:

123,20
2.772,00
46,20
577,50
69,30

But now I want to have all the numbers for all the fruit words.

I tried it like this:

single_fruit = [fruit for fruit in fruit_words]
fruit_cost_found = re.findall(regex_fruit_cost(single_fruit), text)

This raises the following error:

return r"(?<=" + subst + r").*?(?P<number>[0-9,.]*)\n"
TypeError: can only concatenate str (not "list") to str
Asked By: mightycode Newton

Answers:

single_fruit in your code is a list, but regex_fruit_cost concatenates its argument into a pattern string, so it needs a single string. Iterate over the list and call re.findall once per word:

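import re

fruit_cost_found = []

# Run the single-word regex once per fruit and keep any matches.
for fruit in single_fruit:
    m = re.findall(regex_fruit_cost(fruit), text)
    if m:
        fruit_cost_found.append(m)

# Flatten the list of per-fruit result lists into one list.
fruit_cost_found = [item for sublist in fruit_cost_found for item in sublist]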

['3.488,16', '137,50', '500,00', '1.000,00', '2.000,00', '1.000,00', '381,25', '123,20', '2.772,00', '46,20', '577,50', '69,30']
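
As a possible alternative to looping, the "look-behind requires fixed-width pattern" error can be avoided by not using a look-behind at all. Python's re module only accepts look-behinds whose alternatives all have the same length, which is why joining words of different lengths with | fails. An ordinary group has no such restriction. The sketch below is only an illustration: build_fruit_cost_regex and sample_text are made-up names and data, not taken from the question, and it assumes each fruit line ends with the amount followed by a newline.

```python
import re

fruit_words = ['Appels', 'Peen Waspeen', 'Peen', 'Watermeloenen']

# Hypothetical sample text; the real input is the invoice text from the question.
sample_text = (
    "Watermeloenen 16 stuks 123,20\n"
    "Appels 10 kg 3.488,16\n"
    "Peen Waspeen 5 kg 46,20\n"
)


def build_fruit_cost_regex(words):
    """Build one pattern that matches any of the words plus the trailing amount."""
    # Longest words first, so 'Peen Waspeen' is tried before 'Peen'.
    ordered = sorted(words, key=len, reverse=True)
    alternation = '|'.join(re.escape(word) for word in ordered)
    # A normal named group replaces the look-behind, so the alternatives
    # may have different lengths.
    return re.compile(r"(?P<fruit>" + alternation + r").*?(?P<number>[0-9,.]*)\n")


pattern = build_fruit_cost_regex(fruit_words)
fruit_costs = [(m.group('fruit'), m.group('number'))
               for m in pattern.finditer(sample_text)]
print(fruit_costs)
```

Because the whole text is scanned once, a line such as "Peen Waspeen ..." is reported once only, whereas looping over the words would also match it for 'Peen'. Capturing the word as well shows which amount belongs to which fruit.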
Answered By: Anoushiravan R