How can I exclude a character when capturing regex group in Python?

Question

The current code is:

"...GO-Results-?T?o?p?-Price-TXT-Regular">(?P<cena>.*?) €</div>'

The group (?P<cena>.*?) captures the price of an item. The price is written as, for example, 5.400 €, so the group captures 5.400. I want it to capture 5400 instead. I have searched all over the internet for a way to do that, but I can’t find anything useful. Does anyone know how to do it?

I need this because I am using the data in Jupyter, which then turns 5.400 into 5.4. In my case, 5.400 means five thousand four hundred, not five point four, so this is unacceptable. If this can be fixed in Jupyter (Pandas) instead, I am also interested in that kind of solution.

Asked By: Matthew


Answer 1

A capture group always matches one contiguous piece of the input, so it cannot skip the dot by itself. You can post-process the captured text instead:

match.group('cena').replace('.', '')

A more general version that removes several characters at once:

match.group('cena').translate(str.maketrans(dict.fromkeys('., ', '')))

Or you can use a regex again to strip every non-digit character:

re.sub(r'\D', '', match.group('cena'))
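
To see the three options side by side, here is a minimal sketch. The HTML snippet and the shortened pattern are made up for illustration (the real pattern in the question is truncated), so treat them as assumptions rather than the asker's actual data. Converting the cleaned string with int() keeps it from being read as the float 5.4 later on.

```python
import re

# Hypothetical HTML snippet; the class name is shortened for illustration.
html = '<div class="Price-TXT-Regular">5.400 €</div>'

# Simplified stand-in for the (truncated) pattern in the question.
pattern = re.compile(r'Price-TXT-Regular">(?P<cena>.*?) €</div>')

match = pattern.search(html)
raw = match.group('cena')  # the text exactly as captured, dot included

# 1. Remove a single known character.
by_replace = raw.replace('.', '')

# 2. Remove several characters (dot, comma, space) in one pass.
by_translate = raw.translate(str.maketrans(dict.fromkeys('., ', '')))

# 3. Remove everything that is not a digit.
by_regex = re.sub(r'\D', '', raw)

# Convert to an integer so the value is no longer ambiguous.
price = int(by_regex)
```

All three variants give the same string for this input. The \D version is the most forgiving if the site ever adds other separators, while replace('.', '') is the most explicit about what is being dropped.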

Answered By: Mad Physicist
