Split by comma if the comma is not between brackets, while allowing characters outside the brackets within the same comma split
Question:
I have this Python script that uses a regular expression.
I want to split the string s on commas while ignoring any commas that exist within the brackets.
import re
s = """aa,bb,(cc,dd),m(ee,ff)"""
splits = re.split(r'\s*(\([^)]*\)|[^,]+)', s, re.M|re.S)
print('\n'.join(splits))
Actual output:
aa
,
bb
,
(cc,dd)
,
m(ee
,
ff)
Desired output:
aa
bb
(cc,dd)
m(ee,ff)
So I can’t make it handle having text outside the brackets.
Was hoping someone could help me out.
Answers:
Consider using findall instead: repeat a group that either matches a ( followed by non-) characters, followed by a ), or matches non-, characters:
import re
s = """aa,bb,m(cc,dd)"""
matches = re.findall(r'(?:\([^(]+\)|[^,])+', s, re.M|re.S)
print('\n'.join(matches))
If speed is an issue, you can make it a bit more efficient by putting ( in the other negative character set, and alternating it first:
(?:[^(,]+|\([^(]+\))+
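As a quick check, both findall patterns can be run against the full string from the question. This is only a sketch; the names simple and faster are made up for illustration and do not come from the answer.

```python
import re

s = "aa,bb,(cc,dd),m(ee,ff)"

# A parenthesised group, or any single non-comma character, repeated.
simple = re.compile(r'(?:\([^(]+\)|[^,])+')
# Variant that tries runs of characters that are neither '(' nor ',' first.
faster = re.compile(r'(?:[^(,]+|\([^(]+\))+')

print(simple.findall(s))  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
print(faster.findall(s))  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
```

Because the groups are non-capturing, findall returns the whole match for each field rather than a sub-group.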
You may use this regex with a lookahead for split:
>>> import re
>>> s = """aa,bb,(cc,dd),m(ee,ff)"""
>>> print(re.split(r',(?![^()]*\))', s))
['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
RegEx Details:
, : Match a comma
(?![^()]*\)) : A negative lookahead assertion that makes sure we don’t match a comma inside (...) by asserting that there is no ) ahead after 0 or more non-bracket characters.
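To see which commas the lookahead accepts, the pattern can be compiled once and the match positions listed. This is a minimal sketch; OUTSIDE_COMMA and split_outside_parens are hypothetical names chosen for this example.

```python
import re

# A comma that is NOT followed by ')' before any other bracket appears.
OUTSIDE_COMMA = re.compile(r',(?![^()]*\))')

def split_outside_parens(text):
    """Split text on commas that lie outside a single level of (...)."""
    return OUTSIDE_COMMA.split(text)

s = "aa,bb,(cc,dd),m(ee,ff)"

# Indices of the commas that act as separators; the commas at
# indices 9 and 18 sit inside brackets and are skipped.
print([m.start() for m in OUTSIDE_COMMA.finditer(s)])  # [2, 5, 13]
print(split_outside_parens(s))  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
```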
Try: r',([^,()][(][^()][)][^,])|([^,]+)'
Tested on regex101: https://regex101.com/r/pJxRwQ/1
I needed to do something similar, but I also had nested brackets.
The proposed regex expressions do NOT handle nesting.
I couldn’t find a regex solution, but here is a Python function that achieves the same thing:
def comma_split(text: str) -> list[str]:
    flag = 0  # current bracket nesting depth
    buffer = ""
    result = []
    for char_ in text:
        if char_ == "[":
            flag += 1
        elif char_ == "]":
            flag -= 1
        elif char_ == "," and flag == 0:
            # top-level comma: close the current field
            result.append(buffer)
            buffer = ""
            continue
        buffer += char_
    # keep the last field if it is non-empty
    if buffer:
        result.append(buffer)
    return result
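The same depth-counting idea can be adapted to the round brackets used in the question. The sketch below is one possible variant, not the answerer's code: split_top_level and its sep and pairs parameters are hypothetical. It assumes brackets are balanced, it does not check that an opener and closer are of the same kind, and unlike comma_split it keeps a trailing empty field.

```python
import re

def split_top_level(text, sep=",", pairs=(("(", ")"), ("[", "]"))):
    """Split text on sep wherever the bracket nesting depth is zero."""
    openers = {o for o, _ in pairs}
    closers = {c for _, c in pairs}
    depth = 0
    start = 0
    parts = []
    for i, ch in enumerate(text):
        if ch in openers:
            depth += 1
        elif ch in closers:
            depth = max(depth - 1, 0)  # tolerate a stray closer
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])  # last field, possibly empty
    return parts

nested = "f(a,b),g(c,(d,e))"
print(split_top_level("aa,bb,(cc,dd),m(ee,ff)"))  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
print(split_top_level(nested))                    # ['f(a,b)', 'g(c,(d,e))']

# For comparison, the lookahead regex splits inside the nested group:
print(re.split(r',(?![^()]*\))', nested))         # ['f(a,b)', 'g(c', '(d,e))']
```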