Python regex replace for wildcard

Question

I am trying to apply a regex in Python with the following code.

import re

Country_name = "usa_t1_usq_t1_[0-9]*.csv"
new_result = re.sub(r'(?:_[[0-9-]+].*[a-zA-Z])+', '', Country_name)

# Display the Content
print(new_result)

The problem is that it works for the input above, but not for input without the [0-9] pattern (the third input in the examples below).
For example:

input – usa_t1_usq_t1_[0-9]*.csv Expected output – usa_t1_usq_t1

input – usa_t1_usq_t1_[0-9]*.gzip.csv Expected output – usa_t1_usq_t1

input – usa_t1_usq_t1.gzip.csv Expected output – usa_t1_usq_t1

Can someone help me write a proper regex for the above scenarios? I am new to regex.

Asked By: BigD


Answer 1

IIUC,

import re

inputs = ['usa_t1_usq_t1_[0-9]*.csv', 'usa_t1_usq_t1_[0-9]*.gzip.csv', 'usa_t1_usq_t1.gzip.csv']
for Country_name in inputs:
    # Escape [, ], * and . so they match literally instead of acting as regex syntax
    result = re.sub(r'(_\[0-9\]\*)?(\.[a-zA-Z]+)+', '', Country_name)
    print(result)
# usa_t1_usq_t1
# usa_t1_usq_t1
# usa_t1_usq_t1

(_\[0-9\]\*) matches the literal string _[0-9]* in Country_name, and the ? after it means it appears zero or one time.

(\.[a-zA-Z]+) matches a suffix starting with a literal ., and the + after the group means it may appear more than once.
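
As a hedged sketch of the same idea, the literal wildcard token can be escaped with re.escape instead of by hand, and the pattern can be anchored with $ so that only a trailing suffix is removed. The names WILDCARD, PATTERN and strip_suffix are hypothetical and not part of the original answer; the $ anchor is an added assumption about the inputs.

```python
import re

# Hypothetical names for illustration only.
WILDCARD = "_[0-9]*"

# re.escape makes every regex metacharacter in WILDCARD match literally.
# The trailing $ (an assumption) restricts the match to the end of the name.
PATTERN = re.compile(r"(?:" + re.escape(WILDCARD) + r")?(?:\.[a-zA-Z]+)+$")


def strip_suffix(name):
    """Remove an optional literal '_[0-9]*' plus the trailing extensions."""
    return PATTERN.sub("", name)


for name in ["usa_t1_usq_t1_[0-9]*.csv",
             "usa_t1_usq_t1_[0-9]*.gzip.csv",
             "usa_t1_usq_t1.gzip.csv"]:
    print(strip_suffix(name))
```

Building the pattern from the plain token keeps the intent readable and avoids forgetting a backslash.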

Answered By: ILS

Answer 2

Instead of using re.sub to match what you want to remove, you can also match the pattern and capture what you want in group 1.

^(\w+)(?:_\[0-9\]\*)?\.[a-z]

Explanation

^ Start of string
(\w+) Capture 1+ word chars in group 1
(?:_\[0-9\]\*)? Optionally match the literal string _[0-9]*
\.[a-z] Match a literal . and a char a-z

import re

strings = ['usa_t1_usq_t1_[0-9]*.csv', 'usa_t1_usq_t1_[0-9]*.gzip.csv', 'usa_t1_usq_t1.gzip.csv']
pattern = re.compile(r"^(\w+)(?:_\[0-9\]\*)?\.[a-z]", re.IGNORECASE)
for Country_name in strings:
    m = pattern.match(Country_name)
    if m:
        print(m.group(1))

Output

usa_t1_usq_t1
usa_t1_usq_t1
usa_t1_usq_t1
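
As an optional variation, the token-by-token explanation can live next to the pattern itself by using re.VERBOSE and a named group. This is only a sketch: the helper base_name and the group name base are hypothetical, and returning None for non-matching input is an assumption about the desired behavior.

```python
import re

# Hypothetical helper for illustration; not part of the original answer.
PATTERN = re.compile(r"""
    ^(?P<base>\w+)       # base name: one or more word characters
    (?:_\[0-9\]\*)?      # optional literal _[0-9]* token
    \.[a-z]              # a literal dot and a letter start the extension
""", re.VERBOSE | re.IGNORECASE)


def base_name(name):
    """Return the captured base name, or None if the pattern does not match."""
    m = PATTERN.match(name)
    return m.group("base") if m else None


for name in ["usa_t1_usq_t1_[0-9]*.csv",
             "usa_t1_usq_t1_[0-9]*.gzip.csv",
             "usa_t1_usq_t1.gzip.csv"]:
    print(base_name(name))
```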

Answered By: The fourth bird
