Python regex to extract words (ending with a specific non-duplicated letter) in a sentence

Question

I have strings from which I want to extract the parts of the form "xxm" (a number followed by a single "m").

I tried the following:

import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

for s in ss:
    found = re.findall(r' [0-9]+m', s)
    print(found)

The wanted results are '20m' and '10m' respectively, but it outputs:

[' 36m', ' 20m']
[' 55m', ' 10m']

I tried changing the pattern to the one below, but that does not solve it:

r' [0-9]+m$'

How can I extract the parts that end with exactly one 'm' (not 'mm')?

Asked By: Mark K

Answer 1

Here is a possible solution (using \b as a word boundary):

found = re.findall(r'\b[0-9]+m\b', s)

Output:

['20m']
['10m']
['360m']
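A minimal, self-contained sketch of the same idea may make the effect of \b easier to see. The helper name extract_metres is hypothetical and used only for illustration; the pattern and the sample strings are the ones from the question and the answer.

```python
import re

# Hypothetical helper name, for illustration only.
def extract_metres(text):
    # \b matches at a position between a word character and a non-word
    # character (or at the start/end of the string). In "36mm" the first
    # "m" is followed by another word character, so the trailing \b fails
    # there and "36m" is not returned.
    return re.findall(r"\b[0-9]+m\b", text)

ss = [
    'The stick is 36mm wide 20m long white',
    'Another is 55mm wide 10m long black',
    'Last one the length is 360m',
]

for s in ss:
    print(extract_metres(s))
```

Because \b is zero-width, the matches carry no leading space, unlike the results of the pattern r' [0-9]+m' in the question.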

Answered By: Riccardo Bucco

Answer 2

You can use the word boundary anchor \b:

import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

for s in ss:
    found = re.findall(r"\b[0-9]+m\b", s)
    print(found)

Output:

# ['20m']
# ['10m']
# ['360m']

If you want to include only numbers of at most two digits (so 360m is excluded in this case), you can set the number of repetitions to allow with {min,max}. In your case:

for s in ss:
    found = re.findall(r"\b[0-9]{1,2}m\b", s)
    print(found)

Output:

# ['20m']
# ['10m']
# []
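If the digit limit needs to vary, one possible sketch is to build the {min,max} quantifier from a parameter. The function name extract_short_metres and the max_digits parameter are assumptions introduced here for illustration; they are not part of the re module.

```python
import re

# Hypothetical helper; max_digits is an illustrative parameter name.
def extract_short_metres(text, max_digits=2):
    # Doubled braces in an f-string produce literal braces, so with
    # max_digits=2 the pattern becomes \b[0-9]{1,2}m\b.
    pattern = rf"\b[0-9]{{1,{max_digits}}}m\b"
    return re.findall(pattern, text)

print(extract_short_metres('The stick is 36mm wide 20m long white'))
print(extract_short_metres('Last one the length is 360m'))
print(extract_short_metres('Last one the length is 360m', max_digits=3))
```

The leading \b is what keeps "360m" from partially matching as "60m": there is no word boundary between the "3" and the "6".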

Answered By: atteggiani
