Python Regex to find a string in double quotes within a string
Question:
I’m looking for Python code using regex that can do something like this:
Input: Regex should return "String 1" or "String 2" or "String3"
Output: String 1,String 2,String3
I tried r'"*"'
Answers:
Here’s all you need to do:
import re
def doit(text):
    # Capture everything between each pair of double quotes (non-greedy).
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)
doit('Regex should return "String 1" or "String 2" or "String3" ')
result:
'String 1,String 2,String3'
As pointed out by Li-aung Yip:
To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3; the non-greedy version, .+?, gives String 1, String 2, String3.
In addition, if you want to accept empty strings, change .+ to .*. The star * means zero or more, while the plus + means at least one.
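The difference between the greedy and non-greedy forms, and between + and *, can be checked directly. The following is a minimal sketch using only the standard re module; the sample string is the one from the question with an empty pair of quotes appended purely for illustration.

```python
import re

# Question's sample text, plus an empty "" at the end (added for illustration).
text = 'Regex should return "String 1" or "String 2" or "String3" and ""'

# Greedy: .+ runs from the first quote to the last quote it can reach.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy: .+? stops at the nearest closing quote, but needs at least one character.
lazy_plus = re.findall(r'"(.+?)"', text)

# .*? also accepts the empty pair of quotes.
lazy_star = re.findall(r'"(.*?)"', text)

print(greedy)     # ['String 1" or "String 2" or "String3" and "']
print(lazy_plus)  # ['String 1', 'String 2', 'String3']
print(lazy_star)  # ['String 1', 'String 2', 'String3', '']
```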
Just try to fetch double quoted strings from the multiline string:
import re
s = """
"my name is daniel" "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869"
@4343453 "pincode 642002""@mango,@apple,@berry"
"""
print(re.findall(r'"(.*?)"', s))
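One detail worth checking with multiline input: by default . does not match a newline, so a quoted string that itself spans two lines is skipped unless re.DOTALL is passed. A small sketch, with a made-up sample string:

```python
import re

# Hypothetical sample: the second quoted string contains a line break.
s = 'first "one line" then "two\nlines" end'

# Default: . stops at the newline, so only the single-line string is found.
default = re.findall(r'"(.*?)"', s)

# With re.DOTALL, . also matches the newline.
dotall = re.findall(r'"(.*?)"', s, flags=re.DOTALL)

print(default)  # ['one line']
print(dotall)   # ['one line', 'two\nlines']
```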
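import re
# Matches a complete single-quoted or double-quoted string; \' and \" are accepted inside it.
r=r"'(\\'|[^'])*(?!<\\)'|\"(\\\"|[^\"])*(?!<\\)\""
texts=[r'"aerrrt"',
       r'"a\"e'+"'"+'rrt"',
       r'"a""""arrtt"""""',
       r'"aerrrt',
       r'"a"errt'+"'",
       r"'aerrrt'",
       r"'a\'e"+'"'+"rrt'",
       r"'a''''arrtt'''''",
       r"'aerrrt",
       r"'a'errt"+'"',
       "''",'""',""]
for text in texts:
    # fullmatch succeeds only if the whole text is one quoted string
    print(text,"-->",re.fullmatch(r,text))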
results:
"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
 --> None
The highly up-voted answer doesn’t account for the possibility that the double-quoted string might contain one or more double-quote characters (properly escaped, of course). To handle this situation, the regex needs to accumulate characters one by one with a negative lookahead assertion stating that the current character is not a double-quote character that is not preceded by a backslash (which requires a negative lookbehind assertion):
"(?:(?:(?!(?<!\\)").)*)"
import re
import ast
def doit(text):
    # No capturing group, so findall returns each match with its surrounding quotes.
    matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
    for match in matches:
        # literal_eval interprets the match as a Python string literal and unescapes it.
        print(match, '=>', ast.literal_eval(match))
doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')
Prints:
"String 1" => String 1
"String 2" => String 2
"String3" => String3
"\"double quoted string\"" => "double quoted string"
From https://stackoverflow.com/a/69891301/1531728
My solution is:
import re
my_strings = ['SetVariables "a" "b" "c" ', 'd2efw f "first" +&%#$%"second",vwrfhir, d2e u"third" dwedew', '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due" "tre"fef fre f', ' "uno""dos" "tres"', '"unu""doua""trei"', ' "um" "dois" "tres" ']
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'"(.+?)"', current_test_string):
        my_substrings.append(values)
        #print("values are:",values,"=")
    print(" my_substrings are:",my_substrings,"=")
    # Reset the list before processing the next test string.
    my_substrings = []
Alternate regular expressions to use are:
- re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
- re.findall('"(.*?)"', current_test_string) [Shelvington2020]
- re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
- re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
- re.findall(r'"["]', current_test_string) [Muthupandi2019]
- re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
- re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
- re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
- re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
- re.findall('"([^"]*)"', current_test_string) [jspcal2014]
- re.findall("'(.*?)'", current_test_string) # Extracts single-quoted substrings instead. [akhilmd2016]
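Whether the quotes remain in the results depends on whether the pattern has a capturing group: re.findall returns the whole match when there is no group and only the group when there is exactly one. A short sketch of that difference, and of one possible way to remove the quotes afterwards (the slicing step is an illustration, not taken from the cited answers):

```python
import re

current_test_string = ' "uno""dos" "tres"'

# No capturing group: findall returns the whole match, quotes included.
with_quotes = re.findall(r'"[^"]*"', current_test_string)

# One capturing group: findall returns only the group, quotes excluded.
without_quotes = re.findall(r'"([^"]*)"', current_test_string)

# One way to drop the quotes afterwards: slice off the first and last character.
stripped = [m[1:-1] for m in with_quotes]

print(with_quotes)     # ['"uno"', '"dos"', '"tres"']
print(without_quotes)  # ['uno', 'dos', 'tres']
print(stripped)        # ['uno', 'dos', 'tres']
```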
The current_test_string.split("\"") approach works if the substrings are consistently embedded within quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so it also returns the pieces that are not embedded within double quotation marks as if they were valid extractions.
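To make the behaviour of split concrete, here is a minimal sketch on the first test string. Taking every second token with [1::2] is an assumption added here for illustration; it only gives the quoted substrings when the quotes in the input are balanced.

```python
current_test_string = 'SetVariables "a" "b" "c" '

# Splitting on the double quote returns text outside AND inside the quotes.
tokens = current_test_string.split('"')

# With balanced quotes, the odd-indexed tokens are the ones inside quotes.
quoted = tokens[1::2]

print(tokens)  # ['SetVariables ', 'a', ' ', 'b', ' ', 'c', ' ']
print(quoted)  # ['a', 'b', 'c']
```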
References:
- [Avinash2021] Arvind Kumar Avinash, Answer to “Extract text between quotation using regex python”, Stack Overflow, Stack Exchange, Inc., New York, NY, October 12, 2021. Available online at: https://stackoverflow.com/a/69543129/1531728; last accessed on November 8, 2021.
- [user17405772021] user1740577, Answer to “Extract text between quotation using regex python”, Stack Overflow, Stack Exchange, Inc., New York, NY, October 12, 2021. Available online at: https://stackoverflow.com/a/69543030/1531728; last accessed on November 8, 2021.
- [Shelvington2020] Iain Shelvington, Answer to “Extracting only words out of a mixed string in Python [duplicate]”, Stack Overflow, Stack Exchange, Inc., New York, NY, January 5, 2020. Available online at: https://stackoverflow.com/a/59598630/1531728; last accessed on November 6, 2021.
- [Lundberg2012] Johan Lundberg, Answer to “Python Regex to find a string in double quotes within a string”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 1, 2012. Available online at: https://stackoverflow.com/a/9519934/1531728; last accessed on November 6, 2021.
- [Muthupandi2019] Daniel Muthupandi and trotta, Answer to “Python Regex to find a string in double quotes within a string”, Stack Overflow, Stack Exchange, Inc., New York, NY, August 3, 2019. Available online at: https://stackoverflow.com/a/57337020/1531728; last accessed on November 6, 2021.
- [Booboo2020] Booboo, Answer to “Python Regex to find a string in double quotes within a string”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/63707053/1531728; last accessed on November 6, 2021.
- [Pieters2014] Martijn Pieters, Answer to “Extract a string between double quotes”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/22735466/1531728; last accessed on November 6, 2021.
- [Hassan2014] Sabuj Hassan, Answer to “Extract a string between double quotes”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 29, 2014. Available online at: https://stackoverflow.com/a/22735480/1531728; last accessed on November 6, 2021.
- [Martelli2013] Alex Martelli and Sumit Singh, Answer to “Extract string from between quotations”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 14, 2014. Available online at: https://stackoverflow.com/a/2076357/1531728; last accessed on November 6, 2021.
- [jspcal2014] jspcal, Answer to “Extract string from between quotations”, Stack Overflow, Stack Exchange, Inc., New York, NY, March 14, 2014. Available online at: https://stackoverflow.com/a/2076356/1531728; last accessed on November 6, 2021.
- [akhilmd2016] akhilmd, Answer to “Stripping string in python between quotes”, Stack Overflow, Stack Exchange, Inc., New York, NY, July 2, 2016. Available online at: https://stackoverflow.com/a/38161072/1531728; last accessed on November 5, 2021.
For me, the only regex that ever worked right for all the cases of quoted strings with possibly escaped quotes inside of them was:
regex=r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""
This will not fail even if the quoted string ends with an escaped backslash.
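A quick way to try this out is sketched below. It assumes the pattern is written with its backslashes intact, as in the regex variable here, and the sample text is made up for illustration. Because the pattern contains a capturing group for the opening quote, re.findall would return only that quote character, so the sketch uses re.finditer and group(0) to get the full quoted strings.

```python
import re

# Assumed form of the pattern, with backslashes intact.
regex = r"""(['"])(?:\\\\|\\\1|[^\1])*?\1"""

# Hypothetical sample: a string ending in an escaped backslash, a single-quoted
# string with an escaped apostrophe, and a string with escaped double quotes.
text = r'''a "x\\" b 'it\'s' c "say \"hi\""'''

# group(0) is the whole quoted string, including its quotes.
matches = [m.group(0) for m in re.finditer(regex, text)]

# findall returns only group 1, i.e. the opening quote of each match.
openers = re.findall(regex, text)

for m in matches:
    print(m)
```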