Python Regex to find a string in double quotes within a string

Question:

I’m looking for Python code that uses a regex to do something like this:

Input: Regex should return "String 1" or "String 2" or "String3"

Output: String 1,String 2,String3

I tried r'"*"'

Asked By: nomi

Answers:

Here’s all you need to do:

import re

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')

result:

'String 1,String 2,String3'

As pointed out by Li-aung Yip:

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version .+? gives String 1, String 2, String3.

In addition, if you want to accept empty strings, change .+? to .*?. Star * means zero or more while plus + means at least one.
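The three behaviours described above can be checked side by side. This is only a minimal sketch: the sample text (with an extra empty quoted string at the end) and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text, extended with an empty quoted string at the end.
text = 'Regex should return "String 1" or "String 2" or "String3" or ""'

# Greedy: .+ runs from the first quote to the last quote it can still match.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy with +: at least one character is required between the quotes.
lazy_plus = re.findall(r'"(.+?)"', text)

# Non-greedy with *: empty quoted strings are matched as well.
lazy_star = re.findall(r'"(.*?)"', text)

print(greedy)
print(lazy_plus)
print(lazy_star)
```

The greedy call returns a single long match, while the two non-greedy calls differ only in whether the trailing empty string is reported.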

Answered By: Johan Lundberg

Here is how to fetch double-quoted strings from a multiline string:

import re

s = """ 
"my name is daniel"  "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869" 
@4343453 "pincode 642002""@mango,@apple,@berry" 
"""
print(re.findall(r'"(.*?)"', s))

Answered By: Daniel Muthupandi

import re
r = r"'(\\'|[^'])*(?<!\\)'|\"(\\\"|[^\"])*(?<!\\)\""

texts = [r'"aerrrt"',
         r'"a\"e' + "'" + 'rrt"',
         r'"a""""arrtt"""""',
         r'"aerrrt',
         r'"a"errt' + "'",
         r"'aerrrt'",
         r"'a\'e" + '"' + "rrt'",
         r"'a''''arrtt'''''",
         r"'aerrrt",
         r"'a'errt" + '"',
         "''", '""', ""]

for text in texts:
    print(text, "-->", re.fullmatch(r, text))

results:

"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
 --> None

Answered By: PatriceC

The highly up-voted answer doesn’t account for the possibility that the double-quoted string might contain one or more double-quote characters (properly escaped, of course). To handle this situation, the regex needs to accumulate characters one by one, with a negative lookahead assertion stating that the current character is not an unescaped double quote, that is, a double quote not preceded by a backslash (which requires a negative lookbehind assertion):

"(?:(?:(?!(?<!\\)").)*)"

See Regex Demo
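To make the lookahead/lookbehind combination easier to follow, here is a sketch of the same idea written with re.VERBOSE so that each part carries a comment. The redundant outer group is dropped, and the names QUOTED and sample are made up for this illustration.

```python
import re

# Equivalent pattern, spread out with re.VERBOSE so each part can be commented.
QUOTED = re.compile(r'''
    "               # opening double quote
    (?:             # repeat, one character at a time:
        (?!         #   negative lookahead: stop if what comes next is
            (?<!\\) #     (not preceded by a backslash)
            "       #     a double quote, i.e. an unescaped quote
        )
        .           #   otherwise consume the character
    )*
    "               # closing double quote
''', re.VERBOSE)

# Hypothetical sample containing one plain and one escaped-quote string.
sample = r'say "plain" and "with \"escaped\" quotes"'

# No capturing groups, so findall returns the whole matches, quotes included.
print(QUOTED.findall(sample))
```

Because the pattern has no capturing group, the surrounding quotes stay in the results, which is why the answer's code passes each match through ast.literal_eval.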

import re
import ast


def doit(text):
    # No capturing group, so each match keeps its surrounding double quotes.
    matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
    for match in matches:
        print(match, '=>', ast.literal_eval(match))


doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')

Prints:

"String 1" => String 1
"String 2" => String 2
"String3" => String3
"\"double quoted string\"" => "double quoted string"

Answered By: Booboo

From https://stackoverflow.com/a/69891301/1531728

My solution is:

import re

my_strings = [
    'SetVariables "a" "b" "c" ',
    'd2efw   f "first" +&%#$%"second",vwrfhir, d2e   u"third" dwedew',
    '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due"        "tre"fef    fre f',
    '       "uno""dos"      "tres"',
    '"unu""doua""trei"',
    '      "um"                    "dois"           "tres"                  ',
]
for current_test_string in my_strings:
    # Collect every non-empty double-quoted substring of this test string.
    my_substrings = re.findall(r'"(.+?)"', current_test_string)
    print(" my_substrings are:", my_substrings, "=")

Alternate regular expressions to use are:

  • re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
  • re.findall('"(.*?)"', current_test_string) [Shelvington2020]
  • re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"["]', current_test_string) [Muthupandi2019]
  • re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
  • re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
  • re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
  • re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
  • re.findall('"([^"]*)"', current_test_string) [jspcal2014]
  • re.findall("'(.*?)'", current_test_string) [akhilmd2016]
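Several of these patterns agree on simple single-line input but differ on empty or multi-line quoted strings. The following is a small sketch of those differences; the sample strings and variable names are invented for the comparison.

```python
import re

# Made-up sample: one empty quoted string and one that spans two lines.
sample = 'a "" b "two\nlines" c "end"'

# . does not match a newline by default, so the two-line string is skipped
# and the pairing of the remaining quotes shifts.
dot_lazy = re.findall(r'"(.*?)"', sample)

# A negated character class matches newlines as well.
negated = re.findall(r'"([^"]*)"', sample)

# re.DOTALL lets . match newlines too.
dot_all = re.findall(r'"(.*?)"', sample, flags=re.DOTALL)

# .+? needs at least one character, so an empty "" pairs with a later quote.
plus_lazy = re.findall(r'"(.+?)"', 'a "" b "x"')

print(dot_lazy)   # ['', ' c ']
print(negated)    # ['', 'two\nlines', 'end']
print(dot_all)    # ['', 'two\nlines', 'end']
print(plus_lazy)  # ['" b ']
```

If the input may contain empty or multi-line quoted strings, the negated-class form "([^"]*)" is probably the least surprising of the simple patterns.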

The current_test_string.split("\"") approach works if the strings follow a pattern in which the substrings of interest are enclosed in quotation marks. Because it uses the double quotation mark as a delimiter to tokenize the string, it also returns the pieces that are not enclosed in double quotation marks as if they were valid extractions.
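A short sketch of the split approach shows both the extra pieces and one way to filter them out. It assumes balanced, unescaped double quotes; the variable names pieces and quoted are made up here.

```python
# Sample string with balanced, unescaped double quotes (an assumption).
current_test_string = 'SetVariables "a" "b" "c" '

# Splitting on the quote character returns quoted and unquoted pieces alike.
pieces = current_test_string.split('"')
print(pieces)

# With balanced quotes, the quoted substrings sit at the odd indices.
quoted = pieces[1::2]
print(quoted)
```

The slice only gives the right answer while every quote has a partner; a stray or escaped quote shifts the indices.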

Answered By: Giovanni

For me the only regex that ever worked right for all the cases of quoted strings with possibly escaped quotes inside of them was:

regex=r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""

This will not fail even if the quoted string ends with an escaped backslash.
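As a minimal sketch of this idea (not necessarily the exact pattern above), the variant below treats any backslash-escaped character as a unit and uses a negative lookahead, (?!\1), to exclude the opening quote character, since Python's re module does not interpret \1 inside a character class as a backreference. The pattern, the second capturing group, and the sample strings are illustrative assumptions.

```python
import re

# Illustrative variant: group 1 remembers which quote character opened the
# string, group 2 captures the content between the quotes.
QUOTED = re.compile(r'''(['"])((?:\\.|(?!\1).)*)\1''', re.DOTALL)

samples = [
    r'"plain"',
    r"'single'",
    r'"with \"escaped\" quotes"',
    r'"ends with an escaped backslash\\"',
]

for s in samples:
    m = QUOTED.search(s)
    print(s, '-->', m.group(2))
```

Each sample matches in full, including the last one, where the closing quote follows an escaped backslash.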

Answered By: Frank