Python regex to extract words (ending with a specific non-duplicated letter) in a sentence

Question:

I have strings from which I want to extract the parts of the form "xxm" (digits followed by a single "m").

I tried below:

import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

for s in ss:
    found = re.findall(r' [0-9]+m', s)
    print(found)

The wanted results are '20m' and '10m' for the first two strings, but it outputs:

[' 36m', ' 20m']
[' 55m', ' 10m']
[' 360m']

I tried to change it to below, but it’s not a solution:

r' [0-9]+m$'

How can I extract only the parts ending with a single 'm' (not 'mm')?

Asked By: Mark K


Answers:

Here is a possible solution (using \b as a word boundary):

found = re.findall(r'\b[0-9]+m\b', s)

Output:

['20m']
['10m']
['360m']
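
A short sketch of why the pattern has to be a raw string, since the backslash is easy to lose. The variable names raw_hits and plain_hits are only illustrative. In a raw string the regex engine receives \b and treats it as a word boundary; in a normal string Python first converts \b to a backspace character, so the pattern no longer matches anything here:

```python
import re

s = 'The stick is 36mm wide 20m long white'

# Raw string: the regex engine sees \b and treats it as a word boundary.
# '36mm' is rejected because the first 'm' is followed by another word
# character, so there is no boundary right after it.
raw_hits = re.findall(r'\b[0-9]+m\b', s)
print(raw_hits)    # ['20m']

# Normal string: Python turns '\b' into a backspace character (\x08)
# before the regex engine sees it, so nothing in s can match.
plain_hits = re.findall('\b[0-9]+m\b', s)
print(plain_hits)  # []
```
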
Answered By: Riccardo Bucco

You can use the word boundary anchor \b:

import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

for s in ss:
    found = re.findall(r"\b[0-9]+m\b", s)
    print(found)

Output:

# ['20m']
# ['10m']
# ['360m']

If you want to include only numbers with at most two digits (so that 360m is excluded in this case), you can set the allowed number of repetitions with {min,max}. In your case:

for s in ss:
    found = re.findall(r"\b[0-9]{1,2}m\b", s)
    print(found)

Output:

# ['20m']
# ['10m']
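
If the numeric value is needed rather than the text, one possible variation is to put the digits in a capturing group, in which case re.findall returns only the captured part. This is a minimal sketch; extract_values is a hypothetical helper name, not something from the question:

```python
import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

def extract_values(s):
    """Return the integers in front of a single 'm' (hypothetical helper)."""
    # With exactly one capturing group, findall returns the group contents.
    return [int(n) for n in re.findall(r'\b([0-9]+)m\b', s)]

values = [extract_values(s) for s in ss]
print(values)  # [[20], [10], [360]]
```
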
Answered By: atteggiani