Get words between specific words in a Python string

Question

I’m trying to extract the text between two specific substrings of a string.

Following the answers to the question "Find string between two substrings", I managed to extract the text like this:

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))

But with the string below, it fails.

import re

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


result = re.search('<span class="discount-rate">(.*)</span>', s)
print(result.group(1))

I’m trying to extract ‘4%’. The other cases work, but I don’t understand why this one fails. Can anyone help?

Asked By: anfwkdrn

Answer 1

Try this (mind the whitespace and newlines: \s* consumes them on both sides of the value, so "." never has to cross a newline):

import re
s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


result = re.search(r'<span class="discount-rate">\s*(.*)\s*</span>', s)
print(result.group(1))
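Since both delimiters here are fixed strings, a regex is not strictly required. A minimal sketch using plain str.find, assuming a hypothetical helper named text_between (not part of any library); it sidesteps the newline issue because it never uses ".":

```python
def text_between(s, start, end):
    """Return the stripped text between the first `start` and the next `end`.

    Hypothetical helper for illustration; returns None if either delimiter
    is missing.
    """
    i = s.find(start)
    if i == -1:
        return None
    i += len(start)            # skip past the opening delimiter
    j = s.find(end, i)         # search for the closing delimiter after it
    if j == -1:
        return None
    return s[i:j].strip()      # drop surrounding newlines and indentation


s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

print(text_between(s, '<span class="discount-rate">', '</span>'))       # 4%
print(text_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))   # iwantthis
```

This only works when the delimiters are literal text; for patterns or messier HTML, the regex or parser approaches are more flexible.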

Answered By: Meh

Answer 2

Use the re.DOTALL flag so that "." also matches newlines:

result = re.search('<span class="discount-rate">(.*)</span>', s, re.DOTALL)

Documentation: https://docs.python.org/3/library/re.html
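A sketch of the DOTALL approach applied to the question's string, with two adjustments that are not part of the original answer: the captured group still contains the surrounding newlines and indentation, so it is stripped, and the non-greedy ".*?" stops at the first "</span>" instead of running to the last one in the string:

```python
import re

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

# re.DOTALL lets "." match "\n"; ".*?" keeps the match as short as possible.
result = re.search(r'<span class="discount-rate">(.*?)</span>', s, re.DOTALL)
if result:
    # The raw group includes newlines and indentation around "4%".
    print(result.group(1).strip())  # 4%

# Greedy vs non-greedy when more than one </span> follows:
two = '<span class="discount-rate">4%</span><span>x</span>'
greedy = re.search(r'<span class="discount-rate">(.*)</span>', two, re.DOTALL).group(1)
lazy = re.search(r'<span class="discount-rate">(.*?)</span>', two, re.DOTALL).group(1)
print(greedy)  # 4%</span><span>x
print(lazy)    # 4%
```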

Answered By: Daniel

Answer 3

Your string contains newline characters, and by default "." in a regex matches any character except a newline, so your pattern finds no match.

Daniel’s solution works.
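To illustrate: without the flag, re.search returns None, which is why calling .group(1) on the result fails with an AttributeError. A short sketch on a reduced version of the question's string; the inline (?s) flag at the start of a pattern is the standard in-pattern equivalent of re.DOTALL:

```python
import re

s = 'before\n<span class="discount-rate">\n4%\n</span>'
pattern = r'<span class="discount-rate">(.*)</span>'

# Without DOTALL, "." cannot cross the "\n" after the opening tag,
# so there is no match and re.search returns None.
print(re.search(pattern, s))  # None

# Prefixing the pattern with (?s) turns on DOTALL for the whole pattern.
m = re.search('(?s)' + pattern, s)
# Guard against None before calling .group().
print(m.group(1).strip() if m else 'no match')  # 4%
```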

Answered By: Jean Carlo

Answer 4

This is HTML, i.e. structured data rather than just a string, so a parsing library like Beautiful Soup simplifies such tasks:

from bs4 import BeautifulSoup

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly to avoid a warning
value = soup.find(class_='discount-rate').get_text(strip=True)
print(value)

# Output:
4%
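If installing bs4 is not an option, the standard library's html.parser module can do the same job with more code. A rough sketch, assuming a hypothetical ClassTextParser class written only for this example, which collects the text of the first element whose class attribute contains a given name:

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text inside the first element with a given CSS class.

    Hypothetical helper; counts nested tags of the same name and assumes
    well-formed markup (every opened tag is closed).
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.tag = None      # tag name of the matched element
        self.depth = 0       # > 0 while inside the matched element
        self.done = False    # stop after the first match
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.target_class in classes:
            self.tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def text(self):
        return ''.join(self.parts).strip()


s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

parser = ClassTextParser('discount-rate')
parser.feed(s)
parser.close()
print(parser.text())  # 4%
```

Beautiful Soup remains the simpler choice when it is available; this version mainly shows what the parser is doing under the hood.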

Answered By: BeRT2me
