Python regex replace for wildcard
Question:
I am trying to apply a regex in Python using the following code.
    import re

    Country_name = "usa_t1_usq_t1_[0-9]*.csv"
    new_result = re.sub(r'(?:_\[[0-9-]+\].*[a-zA-Z])+', '', Country_name)
    # Display the content
    print(new_result)
The problem is that it works for the input above, but not for an input without the [0-9] pattern (the third input in the examples below).
For example:
input – usa_t1_usq_t1_[0-9]*.csv Expected output – usa_t1_usq_t1
input – usa_t1_usq_t1_[0-9]*.gzip.csv Expected output – usa_t1_usq_t1
input – usa_t1_usq_t1.gzip.csv Expected output – usa_t1_usq_t1
Can someone help me write a proper regex for the above scenario? I am new to the regex world.
Answers:
IIUC,
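    import re

    inputs = ['usa_t1_usq_t1_[0-9]*.csv', 'usa_t1_usq_t1_[0-9]*.gzip.csv', 'usa_t1_usq_t1.gzip.csv']
    for Country_name in inputs:
        # Remove an optional literal "_[0-9]*" followed by one or more ".ext" parts
        result = re.sub(r'(_\[0-9\]\*)?(\.[a-zA-Z]+)+', '', Country_name)
        print(result)
    # usa_t1_usq_t1
    # usa_t1_usq_t1
    # usa_t1_usq_t1

(_\[0-9\]\*) matches the plain string _[0-9]* in Country_name, and the ? after it means it appears zero or one times. The brackets and the asterisk are escaped with backslashes so that they are matched literally rather than treated as regex syntax.
(\.[a-zA-Z]+) matches a suffix starting with a literal . and the + after it means the suffix may appear more than once.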
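As a quick check of what the ? contributes, the sketch below compares the pattern with a variant in which the _[0-9]* group is required. The names with_optional and without_optional are made up for illustration, and the escaped pattern is assumed to be the one this answer intends.

```python
import re

# Pattern from the answer: the literal "_[0-9]*" token is optional.
with_optional = re.compile(r'(_\[0-9\]\*)?(\.[a-zA-Z]+)+')
# Illustrative variant (not from the answer): the token is required.
without_optional = re.compile(r'(_\[0-9\]\*)(\.[a-zA-Z]+)+')

name = 'usa_t1_usq_t1.gzip.csv'  # the third input, which has no "_[0-9]*" token

print(with_optional.sub('', name))     # usa_t1_usq_t1
print(without_optional.sub('', name))  # usa_t1_usq_t1.gzip.csv (nothing matched)
```

Without the ?, the third input is left untouched because the required token never occurs in it, which is the same symptom described in the question.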
Instead of using re.sub to match what you want to remove, you can also match the pattern and capture what you want in group 1.
    ^(\w+)(?:_\[0-9\]\*)?\.[a-z]

Explanation
    ^                 Start of string
    (\w+)             Capture 1+ word chars in group 1
    (?:_\[0-9\]\*)?   Optionally match the literal _[0-9]*
    \.[a-z]           Match a . followed by a char a-z

    import re

    strings = ['usa_t1_usq_t1_[0-9]*.csv', 'usa_t1_usq_t1_[0-9]*.gzip.csv', 'usa_t1_usq_t1.gzip.csv']
    pattern = re.compile(r"^(\w+)(?:_\[0-9\]\*)?\.[a-z]", re.IGNORECASE)
    for Country_name in strings:
        m = pattern.match(Country_name)
        if m:
            # Group 1 holds the part before the optional "_[0-9]*" and the extension
            print(m.group(1))
Output
usa_t1_usq_t1
usa_t1_usq_t1
usa_t1_usq_t1
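One difference from the re.sub approach is that a failed match yields None rather than the unchanged string. A minimal sketch of one way to handle that, assuming it is acceptable to return the input as-is when nothing matches (base_name is a hypothetical helper name, not part of the answer):

```python
import re

PATTERN = re.compile(r'^(\w+)(?:_\[0-9\]\*)?\.[a-z]', re.IGNORECASE)

def base_name(name):
    """Return the captured prefix, or the input unchanged if the pattern does not match."""
    m = PATTERN.match(name)
    return m.group(1) if m else name

print(base_name('usa_t1_usq_t1_[0-9]*.csv'))  # usa_t1_usq_t1
print(base_name('usa_t1_usq_t1'))             # usa_t1_usq_t1 (no extension, returned as-is)
```

Note that \w also matches underscores, so \w+ first consumes the trailing _ before the bracket and then backtracks by one character so that the optional _[0-9]* group can match.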