Python regex search and identify the first occurrence
Question:
I have a SQL string and I need to identify the first occurrence of the database name and table name in it.
import re
sql = 'select col1, col2, "base" as db_name, "employee" as table_name from base.employee where id is not NULL union select col1, col2, "base" as db_name, "employee" as table_name from base.employee where ts is not NULL'
result_dbname = re.search(r',?"(.*)as db_name', sql)
db_name = result_dbname.group(1).replace('"', "")
print(db_name)
Expected result
base
Actual Result
base as db_name, employee as table_name from base.employee where id is not NULL union select col1, col2, base
I would like to capture only the first occurrence
Answers:
It’s not super pretty, but you could use:
re.findall(r',?"\w+" as db_name', sql)[0].split('"')[1]
Splitting the first match on the double quote gives three strings:
>>> '"base" as db_name'.split('"')
['', 'base', ' as db_name']
So you take the element at index 1, which in this case is base.
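The split can probably be avoided by letting a capture group return the text between the quotes. A minimal sketch, assuming the name consists only of word characters (letters, digits, underscore), as the \w+ in the answer above implies:

```python
import re

sql = ('select col1, col2, "base" as db_name, "employee" as table_name '
       'from base.employee where id is not NULL '
       'union select col1, col2, "base" as db_name, "employee" as table_name '
       'from base.employee where ts is not NULL')

# re.search scans left to right and returns the first match it finds;
# group(1) is the text between the double quotes.
m = re.search(r'"(\w+)" as db_name', sql)
db_name = m.group(1) if m else None
print(db_name)  # base
```

Because \w+ cannot match a double quote or a space, the match cannot run on into the second SELECT.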
You can try using match groups:
m = re.match(r'.*(".*") as db_name, (".*") as table_name.*', sql)
m.groups()
# ('"base"', '"employee"')
Then you can strip the quotation marks.
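One caveat worth checking: because the leading .* is greedy, this pattern appears to lock onto the last occurrence rather than the first. With the SQL from the question both occurrences hold the same values, so the difference does not show. The sketch below swaps hypothetical names ("other", "dept") into the second SELECT to make it visible, and strips the quotation marks afterwards:

```python
import re

# Hypothetical SQL: the second SELECT uses different names on purpose.
sql = ('select col1, "base" as db_name, "employee" as table_name from base.employee '
       'union select col1, "other" as db_name, "dept" as table_name from other.dept')

m = re.match(r'.*(".*") as db_name, (".*") as table_name.*', sql)

# Strip the surrounding double quotes from each captured group.
result = tuple(g.strip('"') for g in m.groups())
print(result)  # ('other', 'dept'), i.e. the last occurrence
```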
If you want the first occurrence, start the pattern with a non-greedy quantifier.
Then use two capture groups, each with a negated character class so that a match cannot run past a double quote, and capture only what lies between the quotes.
^.*?"([^"]*)" as db_name, "([^"]*)" as table_name\b
import re
pattern = r'^.*?"([^"]*)" as db_name, "([^"]*)" as table_name\b'
s = 'select col1, col2, "base" as db_name, "employee" as table_name from base.employee where id is not NULL union select col1, col2, "base" as db_name, "employee" as table_name from base.employee where ts is not NULL'
m = re.match(pattern, s)
if m:
    print(m.groups())
Output
('base', 'employee')
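The same idea can be wrapped in a small helper. This is only a sketch: the function name first_db_and_table is made up for illustration, and it uses re.search in place of the ^.*? prefix, since re.search already returns the leftmost match. The second SELECT uses hypothetical names so that it is clear which occurrence was captured:

```python
import re

# [^"]* cannot cross a double quote, so each group stays inside one quoted name.
PATTERN = re.compile(r'"([^"]*)" as db_name, "([^"]*)" as table_name\b')

def first_db_and_table(sql):
    """Return (db_name, table_name) from the first occurrence, or None."""
    m = PATTERN.search(sql)
    return m.groups() if m else None

# Hypothetical SQL with different names in the second SELECT.
sql = ('select col1, "base" as db_name, "employee" as table_name from base.employee '
       'union select col1, "other" as db_name, "dept" as table_name from other.dept')

print(first_db_and_table(sql))  # ('base', 'employee')
```

Returning None when nothing matches keeps the caller from hitting an AttributeError on a missing match object.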