Extracting specific string format of digits
Question:
Suppose we have text like this:
text = "new notebook was sold 8 times before 13:30 in given shop"
There are three numbers here: the single digit 8 and the two-digit numbers 13 and 30. The main point is that 13:30 expresses a time, so 13 and 30 are not just numbers: they carry the hour and the minute. To distinguish 8 from 13:30, I want to return them exactly as they appear in the string. If we just return 8, 13, 30, it is not clear which one is the hour, which one is the minute, and which one is a plain number.
I have looked into regular expressions, and we can use, for instance, the following lines of code:
import re
text = "new notebook was sold 8 times before 13:30 in given shop"
x = re.findall(r"[0-5][0-9]", text)
y = re.findall(r"\d+", text)
print(x, y)
The first one returns the two-digit numbers (in this case 13 and 30) and the second one returns all numbers (8, 13, 30). But how can I return them as they are presented in the text, so that the answer is [8, 13:30]?
Here is a link with information about the regular expression options: regular expression
Let us take the pattern from one of the answers:
x = re.findall(r'\d+(?::\d+)?', text)
Here \d+ means one or more digits. Then comes (?::\d+)?, where ? means zero or one occurrence and () is a group. For instance, the following syntax means:
# Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
So does this statement
x = re.findall(r'\d+(?::\d+)?', text)
mean any digits followed by one : symbol and then by digits again? What about 8?
Answers:
import re
text = "new notebook was sold 8 times before 13:30 in given shop"
# Try the time form first, then fall back to a plain run of digits
pattern = re.compile(r'\b(?:\d{1,2}:\d{2}|\d+)\b')
matches = pattern.findall(text)
print(matches)
# ['8', '13:30']
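What you need are a group () and OR |. The group here is non-capturing, (?:...), so findall returns each whole match, whether the alternation matched a time or a plain number \d+. The time alternative comes first; if \d+ came first, 13:30 would be split into 13 and 30.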
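The difference between a non-capturing and a capturing group shows up in what findall returns. The sketch below is only an illustration; the second pattern, with three capturing groups, is an assumption added here for comparison and is not part of the answer above.

```python
import re

text = "new notebook was sold 8 times before 13:30 in given shop"

# Non-capturing group only: findall returns each whole match as a string.
whole = re.findall(r'\b(?:\d{1,2}:\d{2}|\d+)\b', text)

# Capturing groups (hypothetical variant): findall returns one tuple per
# match, with an empty string for every group that did not take part.
grouped = re.findall(r'\b(?:(\d{1,2}):(\d{2})|(\d+))\b', text)

print(whole)    # ['8', '13:30']
print(grouped)  # [('', '', '8'), ('13', '30', '')]
```

The tuple form already tells you which slot is the hour, which is the minute, and which is a plain number, at the cost of a less readable result.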
For example, if you want dates as well, just add another alternative between the ORs:
text = "new notebook was sold 8 times before 13:30 on 2023-03-04 in given shop"
pattern = re.compile(r'\b(?:\d{1,2}:\d{2}|20\d{2}-\d{2}-\d{2}|\d+)\b')
print(pattern.findall(text))  # ['8', '13:30', '2023-03-04']
x = re.findall(r'\d+(?::\d+)?', text)
\d+ : one or more digits
(?:...) : non-capturing group
? : optional
Meaning digits optionally followed by a colon and digits.
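To see why 8 is still matched, it may help to run the pattern and label each result. This is a minimal sketch; the labelling rule ("contains a colon, so it is a time") and the names `matches` and `labelled` are assumptions made for illustration.

```python
import re

text = "new notebook was sold 8 times before 13:30 in given shop"

# \d+ must match; the group (?::\d+)? may match zero times, which is the "8" case.
pattern = re.compile(r'\d+(?::\d+)?')

matches = pattern.findall(text)
print(matches)  # ['8', '13:30']

# Hypothetical labelling step: treat anything containing ':' as a time.
labelled = [(m, 'time' if ':' in m else 'number') for m in matches]
print(labelled)  # [('8', 'number'), ('13:30', 'time')]
```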
r = r'(\d+)\D+(\d+:\d+)'
y = re.search(r, text)
y.groups()
# ('8', '13:30')
This finds one or more digits and saves them as a group, then skips one or more non-digits, then finds digits with a ':' between them and saves that as another group.
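If the search approach is used, named groups can make it explicit which part is which. This is a sketch under the assumption that the plain number always comes before the time in the text; the group names `count` and `time` are hypothetical.

```python
import re

text = "new notebook was sold 8 times before 13:30 in given shop"

# Same idea as above, but each group carries a name instead of a position.
m = re.search(r'(?P<count>\d+)\D+(?P<time>\d+:\d+)', text)

if m:
    parts = m.groupdict()
    print(parts)  # {'count': '8', 'time': '13:30'}
```

Unlike findall, search returns only the first place where the whole pattern fits, so this suits text with exactly one number followed by one time.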