Can re.findall() return a list of only the matched strings?


I am extracting a list of item names from some HTML (see below):

[<p class="a-text-left a-size-base">Klasisch 1 Säulen Grün S</p>, <p class="a-text-left a-size-base">Klasisch 3 Säulen Braun L</p>, <p class="a-text-left a-size-base">Klasisch 3 Säulen Braun M</p>, <p class="a-text-left a-size-base">Klasisch 3 Säulen Grün M</p>, <p class="a-text-left a-size-base">Mit Hängematte Grün L</p>, <p class="a-text-left a-size-base">Weinachten 3 Säulen Grün L</p>]

Then I used a regular expression to extract the names:

var_text = re.findall(r'>.+?<', str(varyasyonlar_text))

The output is again a list, but each item still includes the characters ">" and "<", which I don't want, and there are extra ">, <" entries:

['>Klasisch 1 Säulen Grün S<', '>, <', '>Klasisch 3 Säulen Braun L<', '>, <', '>Klasisch 3 Säulen Braun M<', '>, <', '>Klasisch 3 Säulen Grün M<', '>, <', '>Mit Hängematte Grün L<', '>, <', '>Weinachten 3 Säulen Grün L<']

I want a clean list containing only the captured names. How can I modify my regex to get that?

Thank you so much

Asked By: Enes Kasikci



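Use parentheses to capture only the part you want: when the pattern contains a capture group, re.findall() returns the group's text instead of the whole match. To skip the ">, <" separators between the tags, require that the first captured character is not a comma.

> : match a literal ">"
( ) : capture group, the only part findall returns
[^,] : first character, anything except ","
.*? : non-greedy match of the remaining characters
< : end of pattern

Note that [^,] has to sit inside the parentheses. Placed outside them, it would consume the first letter of each name and drop it from the result.

re.findall(r'>([^,].*?)<', str(varyasyonlar_text))
['Klasisch 1 Säulen Grün S',
 'Klasisch 3 Säulen Braun L',
 'Klasisch 3 Säulen Braun M',
 'Klasisch 3 Säulen Grün M',
 'Mit Hängematte Grün L',
 'Weinachten 3 Säulen Grün L']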
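
To make the effect of the capture group easier to see, here is a minimal, self-contained sketch. The `html` string is an assumed, shortened stand-in for str(varyasyonlar_text), and the character class `[^<,]+` is one possible variant rather than the only way to write the pattern. It assumes the names themselves never contain a comma.

```python
import re

# Assumed sample: a shortened version of str(varyasyonlar_text) from the question.
html = (
    '[<p class="a-text-left a-size-base">Klasisch 1 Säulen Grün S</p>, '
    '<p class="a-text-left a-size-base">Mit Hängematte Grün L</p>]'
)

# No capture group: findall returns each whole match, delimiters included,
# and also picks up the ">, <" separator between the two tags.
whole = re.findall(r'>.+?<', html)

# One capture group: findall returns only the text inside the parentheses.
# [^<,]+ cannot match "<" or ",", so the ">, <" separator never matches.
names = re.findall(r'>([^<,]+)<', html)

print(whole)
print(names)
```

If the names could contain commas, the class would need adjusting, for example by excluding only "<" and filtering the separator entries afterwards.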
Answered By: Naveed