Python regex to extract words ending with a specific, non-repeated letter in a sentence
Question:
I have strings from which I want to extract the "xxm" part.
I tried the following:
import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']
for s in ss:
    found = re.findall(r' [0-9]+m', s)
    print(found)
The desired results are '20m' and '10m' respectively, but it outputs:
[' 36m', ' 20m']
[' 55m', ' 10m']
[' 360m']
I also tried changing the pattern to the following, but that doesn't work either:
r' [0-9]+m$'
How can I extract only the parts that end with a single 'm' (not 'mm')?
Answers:
Here is a possible solution (using \b as a word boundary):
found = re.findall(r'\b[0-9]+m\b', s)
Output:
['20m']
['10m']
['360m']
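To see what the \b anchors add, here is a small comparison of the same digits-plus-m pattern with and without them (a sketch that reuses the sample strings from the question). Without a trailing boundary the match is satisfied by the first "m", so "36mm" still yields "36m"; with \b on both sides the "m" must be followed by a non-word character or the end of the string, which rules out "mm".

```python
import re

# Sample strings from the question.
ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']

# No boundaries: one "m" is enough, so "36mm" still produces "36m".
no_boundary = [re.findall(r'[0-9]+m', s) for s in ss]

# \b on both sides: the "m" must be followed by a non-word character
# (or the end of the string), so tokens like "36mm" are skipped.
with_boundary = [re.findall(r'\b[0-9]+m\b', s) for s in ss]

print(no_boundary)    # [['36m', '20m'], ['55m', '10m'], ['360m']]
print(with_boundary)  # [['20m'], ['10m'], ['360m']]
```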
You can use the word boundary anchor \b:
import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m']
for s in ss:
    found = re.findall(r"\b[0-9]+m\b", s)
    print(found)
Output:
# ['20m']
# ['10m']
# ['360m']
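The r prefix on the pattern matters here. In an ordinary Python string literal, \b is the backspace character (\x08), not a word boundary, so the re module never sees the anchor. A minimal sketch of the difference:

```python
import re

s = 'The stick is 36mm wide 20m long white'

# Non-raw literal: "\b" becomes the backspace character \x08, so the
# pattern looks for literal backspaces and finds nothing.
plain = re.findall("\b[0-9]+m\b", s)

# Raw literal: the backslash survives, and re interprets \b as a word boundary.
raw = re.findall(r"\b[0-9]+m\b", s)

print(plain)  # []
print(raw)    # ['20m']
```

Writing the anchor as "\\b" in a normal string also works, but raw strings are the usual convention for regex patterns.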
If you want to match only numbers with at most two digits (so that 360m is excluded in this case), you can limit the number of repetitions with the {min,max} quantifier. In your case:
for s in ss:
    found = re.findall(r"\b[0-9]{1,2}m\b", s)
    print(found)
Output:
# ['20m']
# ['10m']
# []
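Note that {1,2} means "one or two digits", so a single-digit length such as 5m would also match. If you need exactly two digits, {2} is stricter. The sketch below adds a hypothetical fourth string (not part of the question) to show the difference.

```python
import re

ss = ['The stick is 36mm wide 20m long white',
      'Another is 55mm wide 10m long black',
      'Last one the length is 360m',
      'A short one is 5m']  # hypothetical extra string, for illustration only

# {1,2}: one or two digits, so "5m" is accepted.
up_to_two = [re.findall(r"\b[0-9]{1,2}m\b", s) for s in ss]

# {2}: exactly two digits, so "5m" is rejected.
exactly_two = [re.findall(r"\b[0-9]{2}m\b", s) for s in ss]

print(up_to_two)    # [['20m'], ['10m'], [], ['5m']]
print(exactly_two)  # [['20m'], ['10m'], [], []]
```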