Excluding a certain string using regex in Python

Question:

I would like to apply a regex to the string below so that any text between a comma and the word ‘AS’ is removed.

Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10

Expected output:

Select customer_name, customer_type, volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10

I tried the following, but it did not give the desired output:

result = re.sub(r",\s*COUNT\(\*\)\s*AS\s*\w+", "", text)
Asked By: Kevin Nash

Answers:

I would use:

import re

text = "Select customer_name, customer_type, COUNT(*) AS volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10"
# Replace ", <term> AS " with ", " so only the alias remains.
result = re.sub(r',\s*\S+\s+AS\b\s*', ', ', text)
print(result)

This prints:

Select customer_name, customer_type, volume
FROM table
GROUP BY customer_name, customer_type
ORDER BY volume DESC
LIMIT 10

The regex pattern used here says to match:

  • , a comma
  • \s* optional whitespace
  • \S+ a non-whitespace term
  • \s+ one or more whitespace characters
  • AS literal "AS"
  • \b word boundary
  • \s* more optional whitespace
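
The breakdown above can be sketched with re.VERBOSE, which lets each piece of the pattern carry its own comment. The names ALIAS_PREFIX and strip_alias_source are hypothetical, chosen only for this illustration, and the second query is an assumed example that is not part of the question.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece is commented.
# ALIAS_PREFIX and strip_alias_source are illustrative names (assumptions).
ALIAS_PREFIX = re.compile(
    r"""
    ,       # a comma
    \s*     # optional whitespace
    \S+     # a non-whitespace term, e.g. COUNT(*)
    \s+     # one or more whitespace characters
    AS\b    # literal "AS" followed by a word boundary
    \s*     # more optional whitespace
    """,
    re.VERBOSE,
)


def strip_alias_source(sql):
    """Replace ', <term> AS ' with ', ' so only the alias is kept."""
    return ALIAS_PREFIX.sub(", ", sql)


query = "Select customer_name, customer_type, COUNT(*) AS volume\nFROM table"
print(strip_alias_source(query))

# Assumed extra case: the expression contains spaces, so \S+ cannot span it
# and the query comes back unchanged.
spaced = "SELECT a, SUM(x + y) AS total"
print(strip_alias_source(spaced))
```

Because \S+ only matches a run without whitespace, this pattern suits expressions such as COUNT(*) but leaves something like SUM(x + y) AS total untouched.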
Answered By: Tim Biegeleisen

You could use a capture group and use the group in the replacement.

(,\s*)[^,]*\sAS\b\s*

Explanation

  • (,\s*) Capture group 1, match a comma and optional whitespace chars
  • [^,]* Match zero or more chars other than a comma
  • \sAS\b\s* Match a whitespace char, then AS followed by a word boundary and optional whitespace chars


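import re

# The comma and the whitespace after it are captured in group 1 so they can be kept.
pattern = r"(,\s*)[^,]*\sAS\b\s*"
# Here "\\n" is a literal backslash followed by "n", so the text stays on one line.
s = ("Select customer_name, customer_type, COUNT(*) AS volume\\nFROM table\\nGROUP BY customer_name, customer_type\\nORDER BY volume DESC\\nLIMIT 10\n")

# Replace each match with group 1 only, dropping the expression and "AS".
print(re.sub(pattern, r"\1", s))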

Output

Select customer_name, customer_type, volume\nFROM table\nGROUP BY customer_name, customer_type\nORDER BY volume DESC\nLIMIT 10
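
One thing to watch with [^,]* is that it is greedy: it runs to the last " AS" that appears before the next comma. The sketch below compares the capture-group pattern with a lazy variant. The lazy pattern and both sample queries are assumptions added for illustration and are not taken from the question.

```python
import re

# Capture-group pattern from above: greedy [^,]* extends to the LAST " AS"
# that can be reached without crossing a comma.
GREEDY = re.compile(r"(,\s*)[^,]*\sAS\b\s*")
# Hypothetical variant (an assumption): lazy [^,]*? stops at the FIRST " AS".
LAZY = re.compile(r"(,\s*)[^,]*?\sAS\b\s*")

# Assumed sample queries for illustration.
spaced = "SELECT a, SUM(x + y) AS total FROM t"
aliased_table = "SELECT a, COUNT(*) AS n FROM t AS u"

# Expressions containing spaces are handled, since [^,]* allows whitespace.
print(GREEDY.sub(r"\1", spaced))

# With a table alias later in the query, the greedy form removes too much.
print(GREEDY.sub(r"\1", aliased_table))

# The lazy form stops at the column alias and leaves the table alias alone.
print(LAZY.sub(r"\1", aliased_table))
```

Neither form understands SQL: an expression that itself contains a comma, such as COALESCE(a, 0) AS x, would be matched starting from the inner comma, so a real SQL parser is the safer choice for arbitrary queries.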
Answered By: The fourth bird