Removing all text within double quotes

Question:

I am preprocessing some text in Python and would like to remove all text that appears inside double quotes. I am unsure how to do that and would appreciate your help. A minimal reproducible example is below for your reference. Thank you in advance.

x='The frog said "All this needs to get removed" something'

In other words, I want to get 'The frog said something' by removing the double-quoted text from x above, and I am not sure how to do that. Thanks once again.

Asked By: Dave


Answers:

Use regex substitution:

import re

x='The frog said "All this needs to get removed" something'
res = re.sub(r'\s*"[^"]+"\s*', ' ', x)
print(res)

The frog said something

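  • \s* – match optional whitespace characters before the opening quote
  • " – match the " char as is
  • [^"]+ – match one or more characters that are not " (ensured via the ^ sign)
  • " and \s* – match the closing quote and any whitespace after it; the whole match is replaced by a single space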
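A slightly generalized version of the same regex, wrapped in a function, might look like the sketch below. The name remove_quoted is hypothetical. It swaps [^"]+ for [^"]* so that empty quotes ("") are removed too, and calls .strip() to drop the stray space left when a quoted span sits at the very start or end of the string. It assumes the quotes are balanced and not nested.

```python
import re

# Optional whitespace, a double-quoted span (possibly empty), optional whitespace.
QUOTED = re.compile(r'\s*"[^"]*"\s*')


def remove_quoted(text: str) -> str:
    """Replace every double-quoted span (and the whitespace around it) with one space."""
    return QUOTED.sub(' ', text).strip()


print(remove_quoted('The frog said "All this needs to get removed" something'))
# The frog said something
print(remove_quoted('"Hi" said the frog, and then "bye" again'))
# said the frog, and then again
```

Precompiling the pattern is optional; re.sub caches compiled patterns internally, but a module-level constant makes the intent explicit when the function is called many times.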
Answered By: RomanPerekhrest

If you prefer to use indexing and slicing:

s = 'The frog said "All this needs to get removed" something'

# Get the indices of both quotes
idx = [i for i, ch in enumerate(s) if ch == '"']
# [14, 44]

# Drop the quoted span plus the space before the opening quote
s[:idx[0] - 1] + s[idx[1] + 1:]
# 'The frog said something'
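The slice above handles exactly one pair of quotes. A sketch that generalizes the same index-based idea to any number of pairs, assuming the quotes are balanced and not nested (the helper name remove_quoted_by_index is hypothetical):

```python
def remove_quoted_by_index(s: str) -> str:
    # Positions of every double quote, e.g. [14, 44] for the question's string
    idx = [i for i, ch in enumerate(s) if ch == '"']
    if len(idx) % 2:
        raise ValueError('Double quotes are not balanced')

    pieces = []
    start = 0
    # Walk the quote positions two at a time: (opening, closing)
    for open_i, close_i in zip(idx[::2], idx[1::2]):
        pieces.append(s[start:open_i])  # text before this quoted span
        start = close_i + 1             # resume right after the closing quote
    pieces.append(s[start:])            # text after the last quoted span

    # Collapse the leftover double spaces (this also collapses tabs/newlines)
    return ' '.join(''.join(pieces).split())


print(remove_quoted_by_index('The frog said "All this needs to get removed" something'))
# The frog said something
```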
Answered By: Talha Tayyab

A quick fix, assuming the double quotes are balanced (their count is even) and that the string contains no other double spaces:

x = 'The frog said "All this needs to get removed" something'

x_new = ''.join(x.split('"')[::2]).replace('  ', ' ')

If needed, these assumptions can be checked beforehand with str.count:

if x.count('"') % 2 != 0:
    raise ValueError('Double quotes are not balanced')

if x.count('  ') > 0:
    raise ValueError('Double spaces are present')
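One way to fold the balance check into the split approach, and to drop the double-space precondition altogether, is to normalize whitespace with ' '.join(....split()) instead of .replace('  ', ' '). A minimal sketch; the function name remove_quoted_split is hypothetical, and note that it also collapses tabs and newlines into single spaces:

```python
def remove_quoted_split(x: str) -> str:
    if x.count('"') % 2 != 0:
        raise ValueError('Double quotes are not balanced')
    # Splitting on '"' puts text outside quotes at even indices, inside at odd ones
    outside = x.split('"')[::2]
    # Join the outside pieces and normalize all runs of whitespace to one space
    return ' '.join(''.join(outside).split())


print(remove_quoted_split('The frog said "All this needs to get removed" something'))
# The frog said something
```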
Answered By: cards