What is the difference between shlex.split() and re.split()?
Question:
I recently used shlex.split() to split a command string into the argument list for subprocess.Popen(). I recall that, a long time ago, I also used re.split() to split a string on a specific delimiter. Can someone point out the essential difference between them? In which scenario is each function best suited?
Answers:
shlex.split() is designed to work like the shell’s split mechanism. This means doing things like respecting quotes, so a quoted phrase stays a single token.
>>> shlex.split("this is 'my string' that --has=arguments -or=something")
['this', 'is', 'my string', 'that', '--has=arguments', '-or=something']
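Since the question mentions subprocess, here is a minimal sketch of that use case. The command string is made up for illustration, and the first token is swapped for sys.executable only so the example does not depend on what "python" resolves to on PATH.

```python
import shlex
import subprocess
import sys

# The quoted -c payload contains spaces but must reach Python as ONE argument.
command_line = 'python -c "print(1 + 2)"'
args = shlex.split(command_line)
# args is now ['python', '-c', 'print(1 + 2)']

# Illustrative tweak: run the current interpreter instead of relying on PATH.
args[0] = sys.executable

# Passing a list (no shell=True) hands each token to the program unchanged.
result = subprocess.run(args, capture_output=True, text=True, check=True)
output = result.stdout.strip()
print(output)
```

A plain str.split() or re.split() on whitespace would have broken the -c payload into several pieces, and the child process would have received the wrong arguments.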
re.split() will just split on whatever pattern you define.
>>> re.split(r'\s', "this is 'my string' that --has=arguments -or=something")
['this', 'is', "'my", "string'", 'that', '--has=arguments', '-or=something']
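Where re.split() fits best is splitting on delimiters that a fixed string cannot describe. A small sketch, with a made-up input record, of splitting on commas, semicolons, or runs of whitespace, followed by the same quote-blind behaviour shown above:

```python
import re
import shlex

# Hypothetical record mixing several delimiters.
record = "alpha, beta;gamma  delta"

# Split on a comma or semicolon (with optional spaces around it),
# or otherwise on any run of whitespace.
fields = re.split(r"\s*[,;]\s*|\s+", record)
print(fields)

# re.split() has no notion of quoting; shlex.split() does.
quoted = "say 'hello world'"
by_regex = re.split(r"\s+", quoted)
by_shlex = shlex.split(quoted)
print(by_regex)
print(by_shlex)
```

So as a rule of thumb: reach for re.split() when the delimiter is a pattern, and for shlex.split() when the string is a shell-style command line.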
Trying to define your own regex to work like shlex.split is needlessly complicated, if it’s even possible.
To really see the differences between the two, you can always Use the Source, Luke:
>>> re.__file__
'/usr/lib/python3.5/re.py'
>>> shlex.__file__
'/usr/lib/python3.5/shlex.py'
Open these files in your favorite editor and start poking around; you’ll find that they operate quite differently.
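If you would rather not hunt for the paths by hand, the standard inspect module can locate and print the source for you. A quick sketch (the paths printed will differ per installation, and on newer Python versions re is a package rather than a single re.py file):

```python
import inspect
import re
import shlex

# Locate the source files of both modules on this installation.
re_path = inspect.getsourcefile(re)
shlex_path = inspect.getsourcefile(shlex)
print(re_path)
print(shlex_path)

# shlex.split() is short: it builds a shlex.shlex lexer object and
# collects the tokens that the lexer yields.
split_source = inspect.getsource(shlex.split)
print(split_source)
```

Reading shlex.py shows a character-by-character tokenizer that tracks quoting state, whereas re.split() compiles your pattern and hands it to the regex engine.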
I don’t have enough reputation to comment.
Wayne Werner mentioned above:
Trying to define your own regex to work like shlex.split is needlessly complicated, if it’s even possible.
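Needlessly complicated? Yes. Possible? Also yes, with something like this (by setting the named group to ' '). It’s possible because state can be encoded in a regex by using recursive expressions. Note that Python’s built-in re module does not support recursive patterns, so this needs an engine that does, such as the third-party regex module.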
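To give a feel for the "needlessly complicated" part, here is a rough sketch of a regex tokenizer. This is not the recursive pattern referred to above; it is a deliberately simplified, non-recursive approximation, and simple_split is a hypothetical helper name. It handles whole quoted tokens only and ignores escapes and quotes embedded inside a word.

```python
import re
import shlex

# A token is a single-quoted run, a double-quoted run, or a run of non-spaces.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(\S+)""")

def simple_split(text):
    """Hypothetical helper: approximate shlex.split() for simple input."""
    # Exactly one group takes part in each match; lastindex tells us which.
    return [m.group(m.lastindex) for m in TOKEN.finditer(text)]

easy = "this is 'my string' that --has=arguments -or=something"
print(simple_split(easy))
print(shlex.split(easy))

# Quotes in the middle of a word: the approximation falls apart,
# while shlex.split() keeps the value together as one token.
tricky = "--name='my string'"
print(simple_split(tricky))
print(shlex.split(tricky))
```

The two agree on the easy input and disagree on the tricky one, and covering every such case is where the regex grows out of hand. For real command lines, shlex.split() is the practical choice.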