Python – How to Remove Words That Start With a Number and Contain a Period
Question:
What is the best way to remove words in a string that start with numbers and contain periods in Python?
this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989'
If I use a regex:
re.sub(r'[0-9]*\.\w*', '', this_string)
The result will be:
'lorum3 ipsum  bar foo  v more text 46  here and even more text here v'
I expect the word v7.8.989 not to be removed, since it starts with a letter.
It would also be great if removing the words did not leave extra spaces behind. My regex above still leaves them.
Answers:
You can use this regex to match the strings you want to remove:
(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)
It matches:
(?:^|\s) : beginning of string or whitespace
[0-9]+ : at least one digit
\. : a period
[0-9.]* : some number of digits and periods
(?=\s|$) : a lookahead to assert end of string or whitespace
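The breakdown above can also be written directly into the pattern with `re.VERBOSE`, which lets the regex carry its own comments. This is only a sketch; the list of sample tokens is chosen here for illustration.

```python
import re

# Same pattern as above, spelled out piece by piece.
pattern = re.compile(r'''
    (?:^|\s)     # beginning of string or a whitespace character
    [0-9]+       # at least one digit
    \.           # a period
    [0-9.]*      # any further digits and periods
    (?=\s|$)     # lookahead: whitespace or end of string must follow
''', re.VERBOSE)

# Check each sample token on its own.
for token in ['15.2.3.9.7', '1.', '46', 'v7.8.989', '1.2.3c']:
    print(token, bool(pattern.fullmatch(token)))
```

Only `15.2.3.9.7` and `1.` should report `True`: `46` has no period, `v7.8.989` starts with a letter, and `1.2.3c` ends in a letter.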
You can then replace any matches with the empty string. In Python:
import re
this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989 and also 1.2.3c as well'
result = re.sub(r'(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)', '', this_string)
Output:
lorum3 ipsum bar foo v more text 46 here and even more text here v7.8.989 and also 1.2.3c as well
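One edge case is worth a sketch: when the matched token is the first word of the string, there is no leading whitespace for the pattern to consume, so the space after the token is left behind. The helper below (the name `remove_numeric_tokens` is made up for this example) wraps the substitution and strips the result. Note that `strip()` also removes any leading or trailing whitespace the input already had.

```python
import re

# Tokens made only of digits and periods, starting with a digit and
# containing at least one period, plus the whitespace in front of them.
NUMERIC_TOKEN = re.compile(r'(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)')

def remove_numeric_tokens(text):
    """Remove tokens such as '15.2.3.9.7' or '1.' (hypothetical helper)."""
    # A match at the start of the string has no leading whitespace to
    # consume, so the space after it survives; strip() removes it.
    return NUMERIC_TOKEN.sub('', text).strip()

this_string = ('lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. '
               'here and even more text here v7.8.989')
print(remove_numeric_tokens(this_string))
print(remove_numeric_tokens('1.5 starts the line'))
```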
If you don’t want to use regex, you can also do it with simple string operations. Joining the kept words with a single space avoids a trailing space:
res = ' '.join(e for e in this_string.split() if not (e[0] in '0123456789' and '.' in e))
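The two approaches are not quite equivalent. The regex only removes tokens made entirely of digits and periods, while the split-based check removes any word that starts with a digit and contains a period. A small side-by-side sketch shows the difference; the function names and the sample string are invented for this comparison.

```python
import re

NUMERIC_TOKEN = re.compile(r'(?:^|\s)[0-9]+\.[0-9.]*(?=\s|$)')

def remove_with_regex(text):
    # Removes only tokens made entirely of digits and periods.
    return NUMERIC_TOKEN.sub('', text).strip()

def remove_with_split(text):
    # Removes any whitespace-separated word that starts with an ASCII digit
    # and contains a period, whatever else the word contains.
    digits = '0123456789'
    return ' '.join(w for w in text.split()
                    if not (w[0] in digits and '.' in w))

sample = 'also 1.2.3c and 4.5 here'
print(remove_with_regex(sample))  # also 1.2.3c and here
print(remove_with_split(sample))  # also and here
```

Which behaviour is right depends on whether a word like `1.2.3c` counts as one that should go. Read literally, the question ("start with numbers and contain periods") matches the split-based result. The split version also collapses runs of whitespace into single spaces.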
You can try this regex:
(^|\s)\d[^\s]*\.+[^\s]*
This also matches strings like ‘7.a.0.1’, which contain letters as well.
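As a quick check of what this looser pattern removes, here is a minimal sketch. The sample string is invented for illustration, and the variable name `LOOSE` is not from the answer.

```python
import re

# Any token that starts with a digit and contains at least one period,
# regardless of which other non-whitespace characters it contains.
LOOSE = re.compile(r'(^|\s)\d[^\s]*\.+[^\s]*')

text = 'keep v7.8.989 drop 7.a.0.1 drop 1.2.3c keep 46'
print(LOOSE.sub('', text))  # keep v7.8.989 drop drop keep 46
```

Tokens starting with a letter (`v7.8.989`) and numbers without a period (`46`) are left alone, while `7.a.0.1` and `1.2.3c` are removed together with their leading space.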
If you can make use of a lookbehind, you can match the numbers and replace them with an empty string:
(?<!\S)\d+\.[\d.]*(?!\S)
Explanation
(?<!\S) : assert a whitespace boundary to the left
\d+\.[\d.]* : match 1+ digits, then a dot followed by optional digits or dots
(?!\S) : assert a whitespace boundary to the right
If you want to match an optional leading whitespace char as well:
\s?(?<!\S)\d+\.[\d.]*(?!\S)
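The difference between the two lookbehind variants shows up in the spacing of the result. The sketch below runs both on a short sample string chosen for illustration; `STRICT` and `WITH_SPACE` are names made up here.

```python
import re

# Matches only the numeric token itself.
STRICT = re.compile(r'(?<!\S)\d+\.[\d.]*(?!\S)')
# Also consumes one whitespace character in front of the token, if present.
WITH_SPACE = re.compile(r'\s?(?<!\S)\d+\.[\d.]*(?!\S)')

text = 'foo 1. bar 15.2.3 v7.8.989'
print(repr(STRICT.sub('', text)))      # 'foo  bar  v7.8.989'
print(repr(WITH_SPACE.sub('', text)))  # 'foo bar v7.8.989'
```

The first variant leaves a double space where each token was removed; the second removes the preceding space along with the token, which is what the question asks for.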