Remove All Commas Between Quotes

Question:

I’m trying to remove all commas that are inside quotes (") with Python:

'please,remove all the commas between quotes,"like in here, here, here!"'
                                                          ^     ^

I tried this, but it only removes the first comma inside the quotes:

re.sub(r'(".*?),(.*?")',r'\1\2','please,remove all the commas between quotes,"like in here, here, here!"')

Output:

'please,remove all the commas between quotes,"like in here here, here!"'

How can I make it remove all the commas inside the quotes?

Asked By: carloabelli


Answers:

Assuming you don’t have unbalanced or escaped quotes, you can use this regex based on negative lookahead:

>>> import re
>>> s = r'foo,bar,"foobar, barfoo, foobarfoobar"'
>>> re.sub(r'(?!(([^"]*"){2})*[^"]*$),', '', s)
'foo,bar,"foobar barfoo foobarfoobar"'

This regex matches a comma only when it is inside double quotes: the negative lookahead asserts that the comma is NOT followed by an even number of quotes.

Note about the lookahead (?!...):

  • ([^"]*"){2} finds a pair of quotes
  • (([^"]*"){2})* finds 0 or more pairs of quotes
  • [^"]*$ makes sure there are no more quotes after the last matched quote
  • So (?!...) asserts that there is not an even number of quotes ahead, which means only commas inside a quoted string are matched.
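
As an illustration of the breakdown above, here is a minimal sketch of the same pattern written with re.VERBOSE so that each part carries its own comment. The names COMMA_IN_QUOTES and strip_commas_in_quotes are made up for this example, the inner groups are written as non-capturing, and it still assumes balanced, unescaped quotes.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (whitespace and # comments are ignored).
COMMA_IN_QUOTES = re.compile(r'''
    (?!                     # fail when the rest of the string, from here, looks like:
        (?:(?:[^"]*"){2})*  #   zero or more pairs of quotes ...
        [^"]*$              #   ... then quote-free text up to the end
    )
    ,                       # otherwise match the comma itself
''', re.VERBOSE)

def strip_commas_in_quotes(text):
    """Remove every comma that is followed by an odd number of quotes."""
    return COMMA_IN_QUOTES.sub('', text)

print(strip_commas_in_quotes(
    'please,remove all the commas between quotes,"like in here, here, here!"'))
# please,remove all the commas between quotes,"like in here here here!"
```

Because the lookahead rescans the rest of the string for every comma, this approach may become slow on very long inputs.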
Answered By: anubhava

You can pass a function as the repl argument instead of a replacement string. Just get the entire quoted string and do a simple string replace on the commas.

>>> s = 'foo,bar,"foobar, barfoo, foobarfoobar"'
>>> re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', ''), s)
'foo,bar,"foobar barfoo foobarfoobar"'
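
The same idea can be wrapped in a small reusable function. This is only a sketch: the names QUOTED and remove_commas_in_quotes are invented for the example, and it assumes quotes are balanced and never escaped.

```python
import re

# One quoted section: an opening quote, any non-quote characters, a closing quote.
QUOTED = re.compile(r'"[^"]*"')

def remove_commas_in_quotes(text):
    # re.sub calls the function once per match and substitutes its return value,
    # so only the text inside each quoted section is touched.
    return QUOTED.sub(lambda m: m.group(0).replace(',', ''), text)

print(remove_commas_in_quotes('a,"b,c",d,"e,f"'))  # a,"bc",d,"ef"
```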
Answered By: Brendan Abel

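What about doing it without regex?

input_str = '...'

# After splitting on '"', the odd-indexed pieces are the quoted sections
# (assuming the quotes are balanced).
first_slice = input_str.split('"')

second_slice = []
for i, slc in enumerate(first_slice):
    # Drop commas only inside quoted sections; leave everything else alone.
    second_slice.append(slc.replace(',', '') if i % 2 else slc)

# Rejoin with '"' so the quote characters are kept.
result = '"'.join(second_slice)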
Answered By: Dan

Here is another option I came up with if you don’t want to use regex.

input_str = 'please,remove all the commas between quotes,"like in here, here, here!"'

def noCommas(string):
    quotes = False  # True while we are inside a quoted section
    output = ''
    for char in string:
        if char == '"':
            quotes = not quotes  # toggle on every quote, opening or closing
        # Keep everything except commas that sit inside quotes.
        if char != ',' or not quotes:
            output += char
    return output

print(noCommas(input_str))
Answered By: albydarned

The character-by-character loop in the answer above is very slow if you want to process something like a 5 MB CSV file.

The split-based version below is reasonably fast. In this example the delimiter is ; and each ; inside quotes is replaced with , rather than removed:

#!/bin/python3

data = 'hoko foko; moko soko; "aaa mo; bia"; "ee mo"; "eka koka"; "koni; masa"; "co co"; ehe mo; "bi; ko"; ko ma\n "ka ku"; "ki; ko"\n "ko;ma"; "ki ma"\n"ehe;";koko'

first_split=data.split('"')
split01=[]
split02=[]
for slc in first_split[0::2]:
    split01.append(slc)
for slc in first_split[1::2]:
    slc_new=",".join(slc.split(";"))
    split02.append(slc_new)

resultlist = [item for sublist in zip(split01, split02) for item in sublist]
if len(split01) > len (split02):
   resultlist.append(split01[-1])
if len(split01) < len (split02):
   resultlist.append(split02[-1])
   
result='"'.join(resultlist)
print(data)
print(split01)
print(split02)
print(result)

Results in:

hoko foko; moko soko; "aaa mo; bia"; "ee mo"; "eka koka"; "koni; masa"; "co co"; ehe mo; "bi; ko"; ko ma
 "ka ku"; "ki; ko"
 "ko;ma"; "ki ma"
"ehe;";koko
['hoko foko; moko soko; ', '; ', '; ', '; ', '; ', '; ehe mo; ', '; ko ma\n ', '; ', '\n ', '; ', '\n', ';koko']
['aaa mo, bia', 'ee mo', 'eka koka', 'koni, masa', 'co co', 'bi, ko', 'ka ku', 'ki, ko', 'ko,ma', 'ki ma', 'ehe,']
hoko foko; moko soko; "aaa mo, bia"; "ee mo"; "eka koka"; "koni, masa"; "co co"; ehe mo; "bi, ko"; ko ma
 "ka ku"; "ki, ko"
 "ko,ma"; "ki ma"
"ehe,";koko
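
One rough way to check the speed claim on your own machine is a small timeit comparison. The sketch below is an assumption-laden illustration, not a benchmark from the answer: strip_loop, strip_split and the sample data are all made up here, and both functions remove commas inside quotes rather than replacing semicolons.

```python
import timeit

def strip_loop(text):
    """Character-by-character scan that toggles a flag on every quote."""
    inside = False
    out = []
    for ch in text:
        if ch == '"':
            inside = not inside
        if ch != ',' or not inside:
            out.append(ch)
    return ''.join(out)

def strip_split(text):
    """Split on quotes; the odd-indexed parts are the quoted sections."""
    parts = text.split('"')
    parts[1::2] = [p.replace(',', '') for p in parts[1::2]]
    return '"'.join(parts)

# Hypothetical test data: 10,000 short CSV-like lines.
sample = 'a,b,"c, d, e",f,"g,h"\n' * 10000

# Both approaches should agree before their timings are compared.
assert strip_loop(sample) == strip_split(sample)

for fn in (strip_loop, strip_split):
    seconds = timeit.timeit(lambda: fn(sample), number=5)
    print(f'{fn.__name__}: {seconds:.3f}s for 5 runs')
```

The actual timings depend on the machine and the Python version, so none are quoted here.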
Answered By: amirzolal