Regexp pattern to remove spaces next to brackets and replace any spaces between words/characters inside the brackets with a single comma

Question:

I have strings in a similar format:

hello this is an example [ a b c ]

hello this is another example [ cat bird dog elephant ]

Which I want to transform to

hello this is an example [a,b,c]

hello this is another example [cat,bird,dog,elephant]

But I don’t understand how to create a regexp pattern that removes any spaces next to the brackets and replaces any number of spaces between words/characters inside the brackets with a single ,.

How would one create such a pattern?

My current attempt is a chain of regexp replacements.

m = re.sub(r'\[\s+', '[', s)
m = re.sub(r'\s+\]', ']', m)
m = re.sub(r'\s+', ' ', m)
m = re.sub(r'\s(?=[^\[\]]*])', ",", m)

But does anyone have any suggestion on how to make it more efficient or more clean?

Asked By: Kspr

Answers:

I didn’t manage to do it with a fancy pattern, but how about this little workaround?
Write a pattern that captures everything between the brackets, then deal with that string separately: split it on whitespace, filter out the empty elements (caused by leading and trailing whitespace) and join the rest back together into one string separated by commas.
Then pass that modified string to re.sub as the replacement for everything between the brackets.

import re

s1 = "hello this is an example [ a    b c ]"
s2 = "hello this is another example [ cat    bird dog elephant   ]"

# everything between the first [ and the last ]
pattern = r"(?<=\[)(.*)(?=\])"

print(
    re.sub(
        pattern,
        ','.join(filter(None, re.split(r"\s+", re.search(pattern, s1).group(1)))),
        s1)
)

print(
    re.sub(
        pattern,
        ','.join(filter(None, re.split(r"\s+", re.search(pattern, s2).group(1)))),
        s2)
)

Output:

hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
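
The find, split and join steps described above can also be sketched as a small helper that runs the pattern only once and rebuilds the string by slicing. This is a minimal sketch, not part of the answer itself: the name join_bracket_items is hypothetical, and it assumes a single pair of brackets per string, since the greedy .* spans from the first [ to the last ].

```python
import re

def join_bracket_items(text):
    """Hypothetical helper: comma-join the items inside the brackets."""
    match = re.search(r"\[(.*)\]", text)
    if match is None:
        return text  # no brackets, nothing to do
    # split() without arguments splits on any run of whitespace and drops
    # the empty strings that leading/trailing spaces would otherwise produce.
    items = match.group(1).split()
    # Rebuild the string around the span of group 1 only.
    return text[:match.start(1)] + ",".join(items) + text[match.end(1):]

print(join_bracket_items("hello this is an example [ a    b c ]"))
# hello this is an example [a,b,c]
```

Because str.split() already discards empty strings, the filter(None, ...) step is not needed in this variant.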
Answered By: Rabinzel

As a first step, you can extract the text between the square brackets. The code should be more readable that way…

import re

foo = 'hello this is another example [ cat    bird dog elephant   ]'

# get everything between [ and ]
reg_get_between_square_brackets = re.compile(r'\[(.*)\]')
str_to_replace = reg_get_between_square_brackets.findall(foo)[0]

# replace spaces with comma
new_string = re.sub(r'\s+', ',', str_to_replace.strip())  # strip to remove leading/trailing whitespace
print(foo.replace(str_to_replace, new_string))

Outputs:

hello this is another example [cat,bird,dog,elephant]
Answered By: RobertG

Below is my solution, with some comments added.

For the second part (replacing the spaces between the square brackets with commas), I would rather go for split() and join(); I expect the regex solution to be slower.

import re

str1 = 'hello this is an example [ a    b c ]'

str2 = 'hello this is another example [ cat    bird dog elephant   ]'

# remove the SPACES near square brackets
str1 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str1)
print(str1)
# replace the SPACES inside the square brackets until no replacement
old_str1 = ''
while old_str1 != str1:
    old_str1 = str1
    str1 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str1, count=0)
print(str1)


str2 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str2)
print(str2)
old_str2 = ''
while old_str2 != str2:
    old_str2 = str2
    str2 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str2, count=0)
print(str2)

Output:

hello this is an example [a    b c]
hello this is an example [a,b,c]
hello this is another example [cat    bird dog elephant]
hello this is another example [cat,bird,dog,elephant]
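
The split() and join() alternative mentioned at the top of this answer could look roughly like the sketch below. The helper name commas_in_brackets and the lazy pattern \[(.*?)\] are assumptions made for illustration; they do not come from the answer.

```python
import re

def commas_in_brackets(text):
    """Hypothetical helper: comma-join the whitespace-separated items in each [...]."""
    def fix(match):
        # split() with no arguments splits on runs of whitespace and ignores
        # leading/trailing whitespace, so one call handles both parts.
        return "[" + ",".join(match.group(1).split()) + "]"
    # The lazy .*? stops at the first closing bracket.
    return re.sub(r"\[(.*?)\]", fix, text)

print(commas_in_brackets("hello this is an example [ a    b c ]"))
# hello this is an example [a,b,c]
```

This needs a single pass over the string and no loop, because the callback rewrites the whole bracket content at once.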
Answered By: Marcel Preda

You can use a negated character class with a single capture group, and then replace 1 or more spaces with a single comma in group 1 and wrap the result in between square brackets.

\[\s*([^][]*?)\s*]

The pattern matches:

  • \[ Match [
  • \s* Match optional leading whitespace chars
  • ( Capture group 1
    • [^][]*? Optionally repeat chars other than [ and ], as few as possible
  • ) Close group 1
  • \s* Match optional trailing whitespace chars
  • ] Match ] literally
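
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch for illustration; the name BRACKETS is an assumption.

```python
import re

BRACKETS = re.compile(
    r"""
    \[          # literal opening bracket
    \s*         # optional leading whitespace
    (           # start of capture group 1
      [^][]*?   # any chars other than [ and ], as few as possible
    )           # end of capture group 1
    \s*         # optional trailing whitespace
    ]           # literal closing bracket
    """,
    re.VERBOSE,
)

m = BRACKETS.search("hello this is an example [ a    b c ]")
print(repr(m.group(1)))
# 'a    b c'
```

Group 1 holds the bracket content without the surrounding whitespace; the spaces between the items are still there and are replaced in a second step.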

See a regex demo with the capture group value and a Python demo.

import re

strings = [
    "hello this is an example [ a    b c ]",
    "hello this is another example [ cat    bird dog elephant   ]"
]

pattern = r"\[\s*([^][]*?)\s*]"
for s in strings:
    print(re.sub(pattern, lambda m: "[{0}]".format(re.sub(r"\s+", ',', m.group(1))), s))

Output

hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
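
Because the negated character class cannot cross a bracket, the same pattern should also handle several bracket groups in one string and leave the spacing outside the brackets alone. A small sketch of that, with a made-up sample string and a hypothetical normalize helper:

```python
import re

pattern = re.compile(r"\[\s*([^][]*?)\s*]")

def normalize(text):
    # Rewrite each bracket group separately; text outside brackets is untouched.
    return pattern.sub(
        lambda m: "[" + re.sub(r"\s+", ",", m.group(1)) + "]",
        text,
    )

sample = "first [ a  b ]   and   second [ x y   z ]"
print(normalize(sample))
# first [a,b]   and   second [x,y,z]
```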
Answered By: The fourth bird