Extracting specific string format of digits

Question

Suppose we have text like this:

text = "new notebook was sold 8 times before 13:30 in given shop"

It contains three numbers: the single digit 8 and the two two-digit numbers 13 and 30. The main point is that 13:30 expresses a time; 13 and 30 are not just numbers, they carry the hour and the minute. So, to distinguish 8 from 13:30, I want to return them exactly as they appear in the string. To clarify the problem: if we just return 8, 13, 30, it is not clear which one is the hour, which one is the minute, and which one is just a number.

I have searched for regular expression tools, and we can use, for instance, the following lines of code:

import re
text = "new notebook was sold  8 times before  13:30 in given shop"
# two-character runs: a digit 0-5 followed by any digit
x = re.findall(r"[0-5][0-9]", text)
# every run of one or more digits
y = re.findall(r"\d+", text)
print(x, y)

The first one returns the two-digit numbers (in this case 13 and 30) and the second one returns all the numbers (8, 13, 30). But how do I return them as they are presented in the text, so that the answer is [8, 13:30]?

Here is a link that contains information about all the regular expression options: regular expression

Let us take one of the answers:

x = re.findall(r'\d+(?::\d+)?', text)

Here \d+ means match one or more digits. Then comes

(?::\d+)?

? means zero or one occurrences and () is the group option. For instance, the following syntax means:

x = re.findall("falls|stays", txt)
#Check if the string contains either "falls" or "stays":

So this statement

x = re.findall(r'\d+(?::\d+)?', text)

does it mean any digits followed by one : symbol and then followed by digits again? What about 8?

Asked By: neural science


Answer 1

import re

text = "new notebook was sold 8 times before 13:30 in given shop"

pattern = re.compile(r'\b(?:\d{1,2}:\d{2}|\d+)\b')
matches = pattern.findall(text)

print(matches)
# ['8', '13:30']

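What you need is a group (?:...) and OR |.
Because the group is non-capturing, re.findall returns the whole match whenever either a time or a plain number \d+ is matched. The time alternative comes first so that 13:30 is tried as a time before falling back to \d+.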
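
To make the effect of the group type on re.findall concrete, here is a minimal sketch. The second pattern, with capturing groups around the hour, the minute and the plain number, is an illustrative variation and not part of the answer above.

```python
import re

text = "new notebook was sold 8 times before 13:30 in given shop"

# Non-capturing group: findall returns each whole match as a string.
non_capturing = re.findall(r'\b(?:\d{1,2}:\d{2}|\d+)\b', text)
print(non_capturing)  # ['8', '13:30']

# Capturing groups: findall returns one tuple per match with one slot
# per group; groups that did not take part in the match are ''.
capturing = re.findall(r'\b(?:(\d{1,2}):(\d{2})|(\d+))\b', text)
print(capturing)  # [('', '', '8'), ('13', '30', '')]
```

The tuple form is useful when the hour and minute are needed separately; the plain form is enough when the tokens should come back exactly as written.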

For example, if you also want dates, just add a date pattern between the ORs:

text = "new notebook was sold 8 times before 13:30 on 2023-03-04 in given shop"
pattern = re.compile(r'\b(?:\d{1,2}:\d{2}|20\d{2}-\d{2}-\d{2}|\d+)\b')
print(pattern.findall(text))
# ['8', '13:30', '2023-03-04']
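
Since the question is about telling a time apart from a plain number, one possible extension is to name each alternative and read the name back from the match. This is only a sketch: the function name label_tokens and the group names time, date and number are made up for illustration.

```python
import re

# Same alternatives as above, each wrapped in a named capturing group.
TOKEN = re.compile(
    r'\b(?:(?P<time>\d{1,2}:\d{2})'
    r'|(?P<date>20\d{2}-\d{2}-\d{2})'
    r'|(?P<number>\d+))\b'
)

def label_tokens(text):
    """Return (kind, value) pairs in the order they appear in text."""
    # Exactly one named group takes part in each match, so lastgroup
    # is the name of the alternative that matched.
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

text = "new notebook was sold 8 times before 13:30 on 2023-03-04 in given shop"
print(label_tokens(text))
# [('number', '8'), ('time', '13:30'), ('date', '2023-03-04')]
```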

Answered By: Andreas

Answer 2

x = re.findall(r'\d+(?::\d+)?', text)

\d+ one or more digits
(?: non-capturing group
? optional

Meaning: digits, optionally followed by a colon and more digits. So 8 matches on its own and 13:30 matches as a whole.
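
The same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is just an annotated restatement of the one-liner, assuming the text from the question.

```python
import re

text = "new notebook was sold 8 times before 13:30 in given shop"

pattern = re.compile(r"""
    \d+         # one or more digits: "8", or the "13" of "13:30"
    (?:         # start of a non-capturing group
        :\d+    #   a colon followed by one or more digits (":30")
    )?          # the whole group is optional, so a bare "8" still matches
""", re.VERBOSE)

print(pattern.findall(text))  # ['8', '13:30']
```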

Answered By: Emanuel P

Answer 3

r = r'(\d+)\D+([\d:]+)'
y = re.search(r, text)
y.groups()

This finds one or more digits and saves them as a group, then one or more non-digits, then one or more characters that are digits or ':', and saves those as another group.
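
One thing to keep in mind with this approach: re.search returns only the first match, and the pattern assumes the plain number comes before the time. The sketch below uses the pattern as I read it with the backslashes restored; the second sentence is a made-up reordering to show the effect.

```python
import re

pattern = re.compile(r'(\d+)\D+([\d:]+)')

# Number first, time second: the groups come out as intended.
first = pattern.search("new notebook was sold 8 times before 13:30 in given shop")
print(first.groups())  # ('8', '13:30')

# Hypothetical reordered sentence: the time comes first, so the colon is
# consumed by \D+ and the time is split into two separate groups.
second = pattern.search("before 13:30 the notebook was sold 8 times")
print(second.groups())  # ('13', '30')
```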

Answered By: ilshatt
