Split a string by a delimiter in Python
Question:
How to split this string, where __ is the delimiter:
MATCHES__STRING
to get an output of ['MATCHES', 'STRING']?
For splitting specifically on whitespace, see How do I split a string into a list of words?.
To extract everything before the first delimiter, see Splitting on first occurrence.
To extract everything before the last delimiter, see partition string in python and get value of last segment after colon.
Answers:
You can use the str.split method: string.split('__')
>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']
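A few edge cases of str.split are worth knowing when the delimiter is longer than one character. The following is only a small sketch; the sample strings other than MATCHES__STRING are made up for illustration.

```python
# Basic split on a two-character delimiter.
parts = "MATCHES__STRING".split("__")        # ['MATCHES', 'STRING']

# Two delimiters in a row produce an empty string between them.
gaps = "a____b".split("__")                  # ['a', '', 'b']

# maxsplit limits the number of splits, counted from the left...
first_only = "a__b__c".split("__", 1)        # ['a', 'b__c']

# ...while rsplit counts from the right.
last_only = "a__b__c".rsplit("__", 1)        # ['a__b', 'c']

print(parts, gaps, first_only, last_only)
```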
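You may be interested in the csv module, which is designed for comma-separated files but can easily be configured to use a custom delimiter. Note that csv only accepts a single-character delimiter, so a two-character delimiter such as __ is rejected with a TypeError; the example below uses | instead.
import csv
# The delimiter must be exactly one character long.
# Other dialect options (quotechar, escapechar, ...) can be passed here as well.
csv.register_dialect("myDialect", delimiter="|")
lines = ["MATCHES|STRING"]
for row in csv.reader(lines, dialect="myDialect"):
    print(row)  # ['MATCHES', 'STRING']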
When the string contains two or more elements (three in the example below), you can unpack the result of split into comma-separated variables:
date, time, event_name = ev.get_text(separator='@').split("@")
After this line of code, the three variables hold the three parts of the variable ev.
So, if the variable ev contains this string and we use @ as the separator:
Sa., 23. März@19:00@Klavier + Orchester: SPEZIAL
Then, after the split operation:
date will have the value Sa., 23. März
time will have the value 19:00
event_name will have the value Klavier + Orchester: SPEZIAL
If ev is already a plain string (as it was for this commenter on Python 3.8), you don't need the get_text method at all; you can just use ev.split("@"). In fact, calling get_text on a str raises an AttributeError, because str has no such method.
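Unpacking into a fixed number of variables fails if the string yields a different number of parts. A minimal sketch of one way to guard against that, assuming ev is a plain str; the second and third sample strings are invented for illustration:

```python
ev = "Sa., 23. März@19:00@Klavier + Orchester: SPEZIAL"

# maxsplit=2 guarantees at most three parts, even if the last field contains "@".
date, time, event_name = ev.split("@", 2)

ev_extra = "Sa., 23. März@19:00@Klavier @ Orchester"
date2, time2, name2 = ev_extra.split("@", 2)   # name2 keeps its "@"

# Too few parts still raises ValueError, so handle it where input is untrusted.
error = None
try:
    a, b, c = "only@two".split("@", 2)
except ValueError as exc:
    error = str(exc)

print(date, time, event_name, sep=" | ")
print(name2)
print(error)
```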
So if you have a string variable, for example:
filename = 'file/foo/bar/fox'
You can just split that into different variables with commas, as suggested in the comment above, but with a correction. The delimiter passed to split must be the one that actually appears in the string, here /:
W, X, Y, Z = filename.split('/')
W = 'file'
X = 'foo'
Y = 'bar'
Z = 'fox'
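When the number of segments is not known in advance, starred unpacking avoids having to name one variable per part. This is just a sketch using the same sample path; the variable names first, middle and last are illustrative.

```python
filename = "file/foo/bar/fox"

# The starred name collects however many segments lie between the first and last.
first, *middle, last = filename.split("/")

print(first)   # file
print(middle)  # ['foo', 'bar']
print(last)    # fox
```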
Besides split and rsplit, there is partition/rpartition. It splits the string only once (at the first or last occurrence of the separator, respectively), but given the way the question was asked, it may apply as well.
Example:
>>> "MATCHES__STRING".partition("__")
('MATCHES', '__', 'STRING')
>>> "MATCHES__STRING".partition("__")[::2]
('MATCHES', 'STRING')
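One difference from split is that partition always returns a 3-tuple, even when the separator is absent, which makes unpacking safe. A short sketch of that behaviour; the inputs other than MATCHES__STRING are made up:

```python
# Separator present: (head, separator, tail).
head, sep, tail = "MATCHES__STRING".partition("__")

# Separator absent: partition puts the whole string first...
missing = "MATCHES".partition("__")        # ('MATCHES', '', '')

# ...while rpartition puts it last.
rmissing = "MATCHES".rpartition("__")      # ('', '', 'MATCHES')

# rpartition splits at the last occurrence of the separator.
last = "a__b__c".rpartition("__")          # ('a__b', '__', 'c')

print(head, sep, tail, missing, rmissing, last)
```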
partition is also a bit faster than split("_", 1):
$ python -m timeit "'validate_field_name'.split('_', 1)[-1]"
2000000 loops, best of 5: 136 nsec per loop
$ python -m timeit "'validate_field_name'.partition('_')[-1]"
2000000 loops, best of 5: 108 nsec per loop
Timeit lines are based on this answer
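The same comparison can be run from inside Python with the timeit module. This is a rough sketch; the loop count of 100_000 is an arbitrary choice, and the absolute numbers will vary by machine and Python version.

```python
import timeit

s = "validate_field_name"

# Both expressions return everything after the first underscore.
split_result = s.split("_", 1)[-1]
partition_result = s.partition("_")[-1]

# Time each expression over the same number of iterations.
split_time = timeit.timeit(lambda: s.split("_", 1)[-1], number=100_000)
partition_time = timeit.timeit(lambda: s.partition("_")[-1], number=100_000)

print(f"split:     {split_time:.4f} s")
print(f"partition: {partition_time:.4f} s")
```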