Excluding certain string using regex in python
Question:
I would like to apply regex to the below code such that I remove any string that appears between a comma and the word ‘AS’.
Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10
Expected output:
Select customer_name, customer_type, volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10
I tried the following, but it did not give the desired output:
result = re.sub(r",\s*COUNT\(\*\)\s*AS\s*\w+", "", text)
Answers:
I would use:
import re

text = "Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10"
# Replace ", <expression> AS " with ", " so that only the alias remains
result = re.sub(r',\s*\S+\s+AS\b\s*', ', ', text)
print(result)
This prints:
Select customer_name, customer_type, volume
FROM table
GROUP BY customer_name, customer_type
ORDER BY volume DESC
LIMIT 10
The regex pattern used here says to match:
,
a comma
\s*
optional whitespace
\S+
a non-whitespace term
\s+
one or more whitespace characters
AS
literal "AS"
\b
word boundary
\s*
more optional whitespace
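To see what the breakdown above implies in practice, here is a minimal sketch that wraps the same pattern in a helper. The function name strip_alias_expr and the two sample queries are made up for illustration. Because \S+ matches a single whitespace-free term, an expression that contains spaces, such as SUM(x + y), is left untouched.

```python
import re

# Same pattern as in the answer: comma, optional whitespace, one
# whitespace-free term, whitespace, the word AS, optional whitespace.
PATTERN = re.compile(r",\s*\S+\s+AS\b\s*")


def strip_alias_expr(sql: str) -> str:
    """Replace ', <term> AS ' with ', ' so only the alias is kept.

    Hypothetical helper, not part of the original answer.
    """
    return PATTERN.sub(", ", sql)


# An expression without spaces is removed, leaving the alias.
print(strip_alias_expr("Select a, COUNT(*) AS volume FROM t"))

# An expression containing spaces does not match \S+, so the query
# comes back unchanged.
print(strip_alias_expr("Select a, SUM(x + y) AS total FROM t"))
```

If the expressions in your queries can contain spaces, the capture-group pattern in the next answer is likely the better fit.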
You could use a capture group and use the group in the replacement.
(,\s*)[^,]*\sAS\b\s*
Explanation
(,\s*)
Capture group 1, match a comma and optional whitespace chars
[^,]*
Match zero or more chars other than a comma
\sAS\b\s*
Match a whitespace char, then AS and a word boundary, followed by optional whitespace chars
import re

pattern = r"(,\s*)[^,]*\sAS\b\s*"
# "\\n" keeps the literal backslash-n sequences shown in the question
s = "Select customer_name, customer_type, COUNT(*) AS volume\\nFROM table\\nGROUP BY customer_name, customer_type\\nORDER BY volume DESC\\nLIMIT 10"
# Keep the comma and its whitespace (group 1), drop everything up to and including AS
print(re.sub(pattern, r"\1", s))
Output
Select customer_name, customer_type, volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10
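As a rough sketch of how this second pattern behaves on other inputs, the snippet below wraps it in a helper. The name keep_alias_only, the sample queries, and the use of re.IGNORECASE (to also accept a lowercase "as") are assumptions added for illustration and are not part of the answer. It also shows one limitation: since [^,]* cannot cross a comma, an expression that itself contains a comma is only partly removed.

```python
import re

# Capture-group pattern from the answer, compiled case-insensitively
# (the flag is an addition for illustration).
ALIAS_EXPR = re.compile(r"(,\s*)[^,]*\sAS\b\s*", flags=re.IGNORECASE)


def keep_alias_only(sql: str) -> str:
    """Drop the text between a comma and AS, keeping the comma and alias.

    Hypothetical helper, not part of the original answer.
    """
    return ALIAS_EXPR.sub(r"\1", sql)


# Spaces inside the expression are fine, and "as" may be lowercase.
print(keep_alias_only("Select a, SUM(x + y) as total FROM t"))
# Select a, total FROM t

# Limitation: a comma inside the expression splits the match, so only
# the part after the inner comma is removed and the SQL is left broken.
print(keep_alias_only("Select a, COALESCE(x, 0) AS y FROM t"))
# Select a, COALESCE(x, y FROM t
```

For queries with nested function calls or commas inside expressions, a regex is probably not enough and a SQL parser would be the safer route.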