Excluding a certain string using regex in Python

Question

I would like to apply a regex to the query string below so that any text appearing between a comma and the word ‘AS’ is removed.

Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10

Expected output:

Select customer_name, customer_type, volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10

I tried the following, but it did not give the desired output:

result = re.sub(r",\s*COUNT\(\*\)\s*AS\s*\w+", "", text)

Asked By: Kevin Nash

Answer 1

I would use:

import re

text = "Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10"
# Replace ", <expression> AS " with ", " so that only the alias remains
result = re.sub(r',\s*\S+\s+AS\b\s*', ', ', text)
print(result)

This prints:

Select customer_name, customer_type, volume
FROM table
GROUP BY customer_name, customer_type
ORDER BY volume DESC
LIMIT 10

The regex pattern used here says to match:

, a comma
\s* optional whitespace
\S+ a run of non-whitespace characters
\s+ one or more whitespace characters
AS literal "AS"
\b word boundary
\s* more optional whitespace
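
The breakdown above can also be written into the pattern itself with Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. This is only a minimal sketch: the name ALIAS_PREFIX and the shortened sample query are chosen here for illustration and are not part of the answer.

```python
import re

# ALIAS_PREFIX is a hypothetical name; the layout mirrors the list above.
ALIAS_PREFIX = re.compile(
    r"""
    ,       # a comma
    \s*     # optional whitespace
    \S+     # a run of non-whitespace characters, such as COUNT(*)
    \s+     # one or more whitespace characters
    AS      # the literal word AS
    \b      # word boundary, so that ASC is not treated as AS
    \s*     # more optional whitespace
    """,
    re.VERBOSE,
)

# Shortened sample query (an assumption made for this sketch)
text = "Select customer_name, customer_type, COUNT(*) AS volume\nFROM table"
result = ALIAS_PREFIX.sub(", ", text)
print(result)
```

Because \S+ cannot cross whitespace, this pattern only matches when the expression before AS contains no spaces.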

Answered By: Tim Biegeleisen

Answer 2

You could use a capture group and use the group in the replacement.

(,\s*)[^,]*\sAS\b\s*

Explanation

(,\s*) Capture group 1, match a comma and optional whitespace chars
[^,]* Match zero or more chars other than a comma
\sAS\b\s* Match a whitespace char, then AS, a word boundary, and optional whitespace chars
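
To see what group 1 holds and what the whole match covers, here is a small sketch. The sample query, with an expression containing spaces and a table called orders, is made up for illustration. It uses \g<1>, which Python's re module accepts in a replacement as an equivalent of \1.

```python
import re

pattern = re.compile(r"(,\s*)[^,]*\sAS\b\s*")

# Hypothetical query: the expression before AS contains spaces
text = "Select customer_name,   SUM(a + b) AS total FROM orders"

m = pattern.search(text)
print(repr(m.group(1)))  # the comma and the whitespace after it
print(repr(m.group(0)))  # the full span that gets replaced

# Put group 1 back, dropping the expression and the AS keyword
result = pattern.sub(r"\g<1>", text)
print(result)
```

Since [^,]* accepts spaces, this pattern also handles expressions such as SUM(a + b), but it stops at the first comma, so an expression that itself contains a comma would not be matched.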

import re

pattern = r"(,\s*)[^,]*\sAS\b\s*"
s = ("Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10\n")

# \1 puts back the comma and whitespace captured by group 1
print(re.sub(pattern, r"\1", s))

Output

Select customer_name, customer_type, volume
FROM table
GROUP BY customer_name, customer_type
ORDER BY volume DESC
LIMIT 10

Answered By: The fourth bird
