Python re.split() to split by spaces, commas, and periods, but not in cases like 1,000 or 1.50
Question:
I want to use Python's re.split() to split a string into individual words by spaces, commas and periods. But I don’t want "1,200" to be split into ["1", "200"], or "1.2" to be split into ["1", "2"].
Example
l = "one two 3.4 5,6 seven.eight nine,ten"
The result should be ["one", "two", "3.4", "5,6", "seven", "eight", "nine", "ten"]
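The obvious first attempt is a plain character class that treats every space, comma and period as a separator. This minimal sketch shows why that falls short: it breaks the numbers apart. The variable names `naive` and `wanted` are illustrative only.

```python
import re

l = "one two 3.4 5,6 seven.eight nine,ten"

# Naive approach: every space, comma and period is a separator.
naive = re.split(r"[\s,.]", l)
print(naive)
# ['one', 'two', '3', '4', '5', '6', 'seven', 'eight', 'nine', 'ten']

# The result the question asks for.
wanted = ["one", "two", "3.4", "5,6", "seven", "eight", "nine", "ten"]
print(naive == wanted)  # False: "3.4" and "5,6" were split
```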
Answers:
Use a negative lookahead and a negative lookbehind:
>>> import re
>>> s = "one two 3.4 5,6 seven.eight nine,ten"
>>> re.split(r'\s|(?<!\d)[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']
In other words, you always split on \s (whitespace), and split on a comma or period only when it is not preceded by a digit (the lookbehind (?<!\d)) and not followed by one (the lookahead (?!\d)).
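To make the lookbehind/lookahead pair concrete, the sketch below restates the comma/period part of the pattern as a plain Python check and confirms that both pick the same positions in the example string. The helper `is_split_point` is hypothetical, and `str.isdecimal()` is used as a stand-in for `\d`, which is exact for the ASCII input here.

```python
import re

s = "one two 3.4 5,6 seven.eight nine,ten"
punct = re.compile(r"(?<!\d)[,.](?!\d)")

def is_split_point(text, i):
    """Hypothetical helper: the lookbehind/lookahead rule in plain Python."""
    if text[i] not in ",.":
        return False
    digit_before = i > 0 and text[i - 1].isdecimal()
    digit_after = i + 1 < len(text) and text[i + 1].isdecimal()
    # Split only when there is no digit on either side.
    return not digit_before and not digit_after

regex_points = [m.start() for m in punct.finditer(s)]
manual_points = [i for i in range(len(s)) if is_split_point(s, i)]
print(regex_points)   # [21, 32]: the "." in seven.eight and the "," in nine,ten
print(manual_points)  # [21, 32]
```

The "." in "3.4" (index 9) and the "," in "5,6" (index 13) are skipped by both, because each has a digit immediately before it.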
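EDIT: The pattern above leaves a comma or period alone if a digit sits on either side of it, so input like "ten,1.2,a,5" is not split at all. As per @verdesmarald's comment, you may want to use the following instead, which splits unless there is a digit on both sides: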
>>> import re
>>> s = "one two 3.4 5,6 seven.eight nine,ten,1.2,a,5"
>>> re.split(r'\s|(?<!\d)[,.]|[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten', '1.2', 'a', '5']
This will split "1.2,a,5" into ["1.2", "a", "5"].
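Running both patterns on the same input shows the difference side by side. This is a small sketch; the names `first` and `revised` are just labels for the two patterns in this answer.

```python
import re

s = "one two 3.4 5,6 seven.eight nine,ten,1.2,a,5"

# First pattern: split on , or . only when NEITHER neighbour is a digit.
first = re.split(r"\s|(?<!\d)[,.](?!\d)", s)
# Revised pattern: split on , or . unless BOTH neighbours are digits.
revised = re.split(r"\s|(?<!\d)[,.]|[,.](?!\d)", s)

print(first)
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten,1.2,a,5']
print(revised)
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten', '1.2', 'a', '5']
```

With the first pattern, every comma in "ten,1.2,a,5" has a digit on at least one side, so none of them is a split point.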
So you want to split on spaces, and on commas and periods that have no digit on either side. This should work:
r" |(?<![0-9])[.,](?![0-9])"
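A quick usage sketch of this pattern with re.split(). Note that " " matches exactly one literal space, so runs of spaces leave empty strings behind and tabs are not separators at all. The `\s+` variant at the end is an assumption added for illustration, not part of this answer.

```python
import re

# Pattern from this answer: one literal space, or a comma/period
# with no ASCII digit on either side.
pattern = r" |(?<![0-9])[.,](?![0-9])"

s = "one two 3.4 5,6 seven.eight nine,ten"
print(re.split(pattern, s))
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

# Two spaces produce an empty string; the tab is not split on.
messy = "one  two\tthree"
print(re.split(pattern, messy))
# ['one', '', 'two\tthree']

# Hypothetical variant: \s+ treats any run of whitespace as one separator.
variant = r"\s+|(?<![0-9])[.,](?![0-9])"
print(re.split(variant, messy))
# ['one', 'two', 'three']
```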