Extract string before specific character
Question:
I have a list of images, which are named as follows:
Abyssinian_1.jpg
So the pattern is name_digit.jpg.
Of course, if there were only one underscore, right before digit.jpg, a plain split would make this very easy, but we also have images named like this:
Egyptian_Mau_214.jpg
We should extract the text before _digit, but I am not sure which regular expression method I can use for it. I will demonstrate with one example.
Let us suppose we have the following string:
name ='Egyptian_Mau_214.jpg'
If I use split:
print(name.split("_"))
then the result is: ['Egyptian', 'Mau', '214.jpg']
But I want to split only before the _214, so how do I do it?
Answers:
Would this work?
import re
name = 'Egyptian_Mau_214.jpg'
# Split on the underscore that is followed by digits, a dot and the extension
m = re.split(r'_(?=\d+\.[a-z]+$)', name)
print(m) # ['Egyptian_Mau', '214.jpg']
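To check the lookahead split against both file name shapes from the question, it can be wrapped in a small helper. This is only a sketch: the names split_before_index and INDEX_SEPARATOR are made up for illustration, and the rsplit line is a regex-free alternative that is not part of the answer above.

```python
import re

# Matches an underscore only when it is followed by digits, a dot,
# and a lowercase extension at the end of the string.
INDEX_SEPARATOR = re.compile(r'_(?=\d+\.[a-z]+$)')

def split_before_index(filename):
    """Hypothetical helper: split once, right before the trailing _digits.ext part."""
    return INDEX_SEPARATOR.split(filename)

for name in ['Abyssinian_1.jpg', 'Egyptian_Mau_214.jpg']:
    print(split_before_index(name))
    # Alternative without a regex: split once, starting from the right.
    print(name.rsplit('_', 1))
```

For these two names both approaches give the same pieces. The regex version additionally leaves a name untouched when the last underscore is not followed by digits, whereas rsplit always splits on the last underscore.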
I hope I understood the question correctly, but you could use a regex.
For your example:
import re
# Raw string so that \d and \. reach the regex engine unchanged
r = re.compile(r'(.+)_(\d+\.jpg)')
m = r.match('Egyptian_Mau_214.jpg')
print(m.groups()) # -> ('Egyptian_Mau', '214.jpg')
Regex explanation:
(.+)
– Group 1, one or more of any character. It is greedy, so it runs up to the last underscore that is followed by digits and .jpg.
_
– Just an underscore.
(\d+\.jpg)
– Group 2, one or more digits and the .jpg suffix.
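Applied to a whole list of file names, the grouped pattern could be used roughly as follows. This is a sketch under assumptions: the function name label_of, the group names, and the sample list are invented for illustration, and fullmatch is used here so that names with a different extension are rejected rather than partially matched.

```python
import re

# Same idea as the pattern above, with named groups for readability.
PATTERN = re.compile(r'(?P<label>.+)_(?P<index>\d+)\.jpg')

def label_of(filename):
    """Hypothetical helper: return the text before _digits.jpg, or None if the name does not fit."""
    m = PATTERN.fullmatch(filename)
    return m.group('label') if m else None

files = ['Abyssinian_1.jpg', 'Egyptian_Mau_214.jpg', 'notes.txt']
labels = [label_of(f) for f in files]
print(labels)  # ['Abyssinian', 'Egyptian_Mau', None]
```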