Regexp pattern to remove spaces next to brackets and replace any spaces between words/characters inside the brackets with a single comma
Question:
I have strings in a similar format:
hello this is an example [ a b c ]
hello this is another example [ cat bird dog elephant ]
which I want to transform to:
hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
But I don’t understand how to create a regexp pattern that removes any spaces next to the brackets and replaces each run of spaces between words/characters inside the brackets with a single comma.
How would one create such a pattern?
My current attempt is a chain of regexp replacements.
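import re

s = "hello this is an example [ a b c ]"
m = re.sub(r'\[\s+', '[', s)                # drop spaces after [
m = re.sub(r'\s+\]', ']', m)                # drop spaces before ]
m = re.sub(r'\s+', ' ', m)                  # collapse whitespace runs to one space
m = re.sub(r'\s(?=[^\[\]]*\])', ',', m)     # spaces followed by ] (no [ in between) become commas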
Does anyone have a suggestion for making this more efficient or cleaner?
Answers:
I didn’t manage to do it with a fancy pattern, but how about this little workaround?
Write a pattern that captures everything between the brackets, then handle that string separately: split it on whitespace, filter out the empty elements (produced by the leading and trailing whitespace), and join the rest back together into one string separated by commas.
Pass that modified string to re.sub as the replacement for everything between the brackets.
import re

s1 = "hello this is an example [ a b c ]"
s2 = "hello this is another example [ cat bird dog elephant ]"
pattern = r"(?<=\[)(.*)(?=\])"

for s in (s1, s2):
    inner = re.search(pattern, s).group(1)
    # split on whitespace, drop the empty strings from leading/trailing spaces, join with commas
    joined = ','.join(filter(None, re.split(r"\s+", inner)))
    print(re.sub(pattern, joined, s))
Output:
hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
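The split/filter/join step can likely be shortened: called with no arguments, Python's str.split() splits on runs of whitespace and already drops the empty strings that leading and trailing spaces would otherwise produce, so filter(None, ...) becomes unnecessary. A minimal sketch comparing the two (both helper names are hypothetical):

```python
import re

def join_with_regex_split(inner: str) -> str:
    # approach above: split on whitespace runs, drop empty strings, join with commas
    return ','.join(filter(None, re.split(r"\s+", inner)))

def join_with_str_split(inner: str) -> str:
    # str.split() with no separator discards leading/trailing whitespace by itself
    return ','.join(inner.split())

for inner in (" a b c ", " cat bird dog elephant "):
    print(join_with_regex_split(inner), join_with_str_split(inner))
```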
As a first step, you can extract the text between the square brackets; the code should then be more readable.
import re

foo = 'hello this is another example [ cat bird dog elephant ]'
# get everything between [ and ]
reg_get_between_square_brackets = re.compile(r'\[(.*)\]')
str_to_replace = reg_get_between_square_brackets.findall(foo)[0]
# replace spaces with commas; strip() removes the leading/trailing whitespace first
new_string = re.sub(r'\s+', ',', str_to_replace.strip())
print(foo.replace(str_to_replace, new_string))
Outputs:
hello this is another example [cat,bird,dog,elephant]
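One caveat: the greedy (.*) runs from the first [ to the last ] in the string, and findall(...)[0] only uses one match, so a line with two bracketed groups would not be handled group by group. A minimal sketch, assuming each [...] group should be processed independently, swaps in the negated character class [^\]]* and passes a replacement function to re.sub (normalize_brackets is a hypothetical name):

```python
import re

# [^\]]* cannot cross a closing bracket, so each [...] group is matched separately
BRACKETED = re.compile(r'\[([^\]]*)\]')

def normalize_brackets(text: str) -> str:
    # strip the group, turn inner whitespace runs into single commas, re-wrap in brackets
    return BRACKETED.sub(
        lambda m: '[' + re.sub(r'\s+', ',', m.group(1).strip()) + ']', text)

print(normalize_brackets('hello this is another example [ cat bird dog elephant ]'))
print(normalize_brackets('two groups [ a b ] and [ c  d ]'))
```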
Below is my solution, with some comments added.
For the second part (replacing the spaces between the square brackets with commas), I would rather use split() and join(); the regex solution is likely slower.
import re

str1 = 'hello this is an example [ a b c ]'
str2 = 'hello this is another example [ cat bird dog elephant ]'

# remove the SPACES near square brackets
str1 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str1)
print(str1)
# replace the SPACES inside the square brackets (one per pass) until no replacement is made
old_str1 = ''
while old_str1 != str1:
    old_str1 = str1
    str1 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str1, count=0)
print(str1)

str2 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str2)
print(str2)
old_str2 = ''
while old_str2 != str2:
    old_str2 = str2
    str2 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str2, count=0)
print(str2)
Output:
hello this is an example [a b c]
hello this is an example [a,b,c]
hello this is another example [cat bird dog elephant]
hello this is another example [cat,bird,dog,elephant]
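The loop above detects convergence by comparing the string with its previous value. re.subn returns the new string together with the number of substitutions made, so a minimal sketch of the same fixed-point loop can stop as soon as no replacement happens (commas_inside_brackets is a hypothetical helper name; the patterns are the ones from this answer):

```python
import re

def commas_inside_brackets(s: str) -> str:
    # remove the spaces next to the square brackets
    s = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', s)
    # replace one whitespace run per pass until re.subn reports zero substitutions
    while True:
        s, n = re.subn(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', s)
        if n == 0:
            return s

for s in ('hello this is an example [ a b c ]',
          'hello this is another example [ cat bird dog elephant ]'):
    print(commas_inside_brackets(s))
```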
You can use a negated character class with a single capture group; then, in group 1, replace each run of one or more whitespace characters with a single comma and wrap the result in square brackets.
\[\s*([^][]*?)\s*\]
The pattern matches:
- \[ Match [
- \s* Match optional leading whitespace chars
- ( Capture group 1
- [^][]*? Optionally repeat chars other than [ and ], as few as possible
- ) Close group 1
- \s* Match optional trailing whitespace chars
- \] Match ] literally
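To see what capture group 1 holds for each input, a quick check with re.findall (which returns the group-1 values when the pattern has exactly one group) could look like this:

```python
import re

pattern = r"\[\s*([^][]*?)\s*\]"

# the whitespace next to the brackets is excluded from group 1; inner spaces are kept
print(re.findall(pattern, "hello this is an example [ a b c ]"))
print(re.findall(pattern, "hello this is another example [ cat bird dog elephant ]"))
```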
See a regex demo with the capture group value and a Python demo.
import re

strings = [
    "hello this is an example [ a b c ]",
    "hello this is another example [ cat bird dog elephant ]",
]
pattern = r"\[\s*([^][]*?)\s*\]"
for s in strings:
    # collapse each whitespace run in group 1 to a comma, then re-wrap in brackets
    print(re.sub(pattern, lambda m: "[{0}]".format(re.sub(r"\s+", ',', m.group(1))), s))
Output
hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
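An earlier answer suggests preferring split() and join() over a regex for the inner step. A minimal sketch of that variant keeps the same outer pattern but builds the replacement with str.split(), which splits on any whitespace run; no timing claim is made here (bracket_to_commas is a hypothetical name):

```python
import re

pattern = re.compile(r"\[\s*([^][]*?)\s*\]")

def bracket_to_commas(m):
    # str.split() with no argument splits on whitespace runs, so no inner regex is needed
    return "[" + ",".join(m.group(1).split()) + "]"

strings = [
    "hello this is an example [ a b c ]",
    "hello this is another example [ cat bird dog elephant ]",
]
for s in strings:
    print(pattern.sub(bracket_to_commas, s))
```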