How to iterate over a list of words?
Question:
I have a regular expression that finds a specific word and then filters some text after it.
Now I want to iterate over a list of words and filter the text for each of them.
This is my regular expression:
def regex_fruit_cost(subst):
    return r"(?<=" + subst + r").*?(?P<number>[0-9,.]*)\n"
And this is my list of words:
fruit_words = ['Appels', 'Ananas', 'Peen Waspeen',
               'Tomaten Cherry', 'Sinaasappels',
               'Watermeloenen', 'Rettich', 'Peren', 'Peen', 'Mandarijnen', 'Meloenen', 'Grapefruit']
And this is the text I want to filter:
text t.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"
Combining them:
regex = r"(\d*(?:.\d+)*)\s*\W+(" + '|'.join(re.escape(word)
                                            for word in fruit_words) + ')'
fruit_cost_found = re.findall(regex_fruit_cost('|'.join(re.escape(word) for word in fruit_words)), text)
print(*fruit_cost_found, sep='\n')
But then I get this ‘strange error’:
raise error("look-behind requires fixed-width pattern")
re.error: look-behind requires fixed-width pattern
Question: what do I have to change so that it can iterate over the list of words?
The text contains, for example, Watermeloenen and Appels. The regular expression works for a single word. For Watermeloenen, for example, it returns this list:
123,20
2.772,00
46,20
577,50
69,30
But now I want the numbers for all the fruit words.
I tried it like this:
single_fruit = [fruit for fruit in fruit_words]
fruit_cost_found = re.findall(regex_fruit_cost(single_fruit), text)
Error:
    return r"(?<=" + subst + r").*?(?P<number>[0-9,.]*)\n"
TypeError: can only concatenate str (not "list") to str
Answers:
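On the first error: Python's re module only accepts a look-behind whose pattern has a fixed width, and an alternation of words with different lengths does not. The sketch below reproduces that and shows one possible way around it, which is to put the alternation in an ordinary non-capturing group instead of a look-behind. The names words and sample are hypothetical stand-ins, since the full text is not shown in the question.

```python
import re

# Hypothetical stand-ins for the question's word list and text.
words = ["Appels", "Peen", "Watermeloenen"]
sample = "Appels 12 kg 123,20\nPeen 3 kg 46,20\n"

alternation = "|".join(re.escape(w) for w in words)

# Alternatives of different lengths inside a look-behind are rejected
# by Python's re module at compile time.
try:
    re.compile(r"(?<=" + alternation + r").*?(?P<number>[0-9,.]*)\n")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# The same alternation inside a non-capturing group compiles without error.
pattern = re.compile(r"(?:" + alternation + r").*?(?P<number>[0-9,.]*)\n")

# With a single capturing group, findall returns that group's text per match.
numbers = pattern.findall(sample)
print(lookbehind_ok, numbers)
```

Because the word is consumed by the group rather than looked behind, there is no width restriction, and the named group still captures only the number at the end of the line.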
single_fruit in your code is a list, but regex_fruit_cost concatenates its argument with strings, so it needs a single string. Iterate over the elements and call re.findall once per word:
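fruit_cost_found = []
for fruit in single_fruit:
    # Build the pattern for one word at a time, so the look-behind stays fixed-width.
    m = re.findall(regex_fruit_cost(fruit), text)
    if m:
        fruit_cost_found.append(m)
# Flatten the list of per-word result lists into one list of numbers.
fruit_cost_found = [item for sublist in fruit_cost_found for item in sublist]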
['3.488,16', '137,50', '500,00', '1.000,00', '2.000,00', '1.000,00', '381,25', '123,20', '2.772,00', '46,20', '577,50', '69,30']
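One thing to watch with a per-word loop: a short word such as Peen also matches inside Peen Waspeen, so the same line can be reported twice. A possible alternative, sketched here under assumptions, is a single pass with all words in one alternation, longest first, and a second named group that records which word matched. The sample text and the group name fruit are made up for illustration, and the number class uses + instead of * so an empty number never matches.

```python
import re

fruit_words = ["Peen Waspeen", "Peen", "Appels", "Watermeloenen"]

# Longest words first, so "Peen Waspeen" is tried before its prefix "Peen".
ordered = sorted(fruit_words, key=len, reverse=True)

pattern = re.compile(
    r"(?P<fruit>" + "|".join(re.escape(w) for w in ordered) + r")"
    r".*?(?P<number>[0-9,.]+)\n"
)

# Hypothetical sample text; the real text is not shown in full in the question.
sample = (
    "Appels 10 kg 123,20\n"
    "Peen Waspeen 5 kg 1.000,00\n"
    "Watermeloenen 2.772,00\n"
)

# One (word, number) pair per matching line, in the order they appear.
costs = [(m.group("fruit"), m.group("number")) for m in pattern.finditer(sample)]
for fruit, number in costs:
    print(fruit, number)
```

Each match consumes its whole line, so a line is counted once even when several words from the list occur on it.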