REGEX Match number in a line with a keyword

Question

I tried many patterns but cannot get the correct result. I want to match only the floats when the line has the keyword range at the beginning. My trouble is that range can be followed by : with any amount of whitespace before or after it, or by no colon at all.

My best try is to use two patterns:

#1. (?i)(?<=range[: ])[:a-zA-Z0-9.$ -]+

#2. [0-9.]+

First run regex with pattern #1, then take the output of pattern #1 and run regex once more with pattern #2.
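
For reference, the two-pass approach described above might look like the following minimal sketch. The helper name extract_two_pass and the constant names are hypothetical; the two patterns are the ones listed as #1 and #2.

```python
import re

# Pattern #1 from the question: text following "range" plus one ':' or ' '.
FIRST_PASS = re.compile(r"(?i)(?<=range[: ])[:a-zA-Z0-9.$ -]+")
# Pattern #2 from the question: runs of digits and dots.
SECOND_PASS = re.compile(r"[0-9.]+")


def extract_two_pass(line):
    """Hypothetical helper: apply pattern #1, then pattern #2 to its match."""
    first = FIRST_PASS.search(line)
    if first is None:
        return []
    return SECOND_PASS.findall(first.group(0))


for line in ["range: $0.82", "range:0.82", "range : 0.82 - 0.85", "range 0.82 0.85"]:
    print(line, "->", extract_two_pass(line))
```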

How can I do that in one single pattern? Thanks so much

One more thing: my code is Python

Input:
range: $0.82
–> Expected output: 0.82

Input:
range:0.82
–> Expected output: 0.82

Input:
range: 0.82 - 0.85
–> Expected output: 0.82, 0.85

Input:
range : 0.82 - 0.85
–> Expected output: 0.82, 0.85

Input:
range 0.82 0.85
–> Expected output: 0.82, 0.85

Asked By: Triho

Answer 1

This seems to work for me – however – there are probably a number of more efficient ways of doing it:

import re

input_data = ['range: $0.82',
              'range:0.82',
              'range:  0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range   :  0.82 - 0.85',
              'range 0.82   0.85']

# Groups 3 and 5 hold the first and the (optional) second number.
pattern = r'(range)(\s*:?\s*[$]*)([0-9]*\.[0-9]*)(\s*-?\s*)([0-9]*\.[0-9]*)?'

for line in input_data:
    output = re.findall(pattern, line)
    a = output[0][2]
    b = output[0][4]
    print(f'Input: {line} --> Expected output: {a} , {b}')

OUTPUT:

Input: range: $0.82 --> Expected output: 0.82 , 
Input: range:0.82 --> Expected output: 0.82 , 
Input: range:  0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range : 0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range   :  0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range 0.82   0.85 --> Expected output: 0.82 , 0.85

You could also add some IF-statements to check to see if ‘b’ is empty, and control the output as required. However, I think the main thing that you wanted to achieve was a single REGEX statement that could extract the two numbers in question (if available).
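
As a rough sketch of that check, the snippet below uses the same five-group pattern (written with the \s and \. escapes) and only appends the second number when it is present. The helper name format_numbers is hypothetical, and it uses re.search rather than re.findall so that a missing second number shows up as None.

```python
import re

PATTERN = re.compile(
    r'(range)(\s*:?\s*[$]*)([0-9]*\.[0-9]*)(\s*-?\s*)([0-9]*\.[0-9]*)?'
)


def format_numbers(line):
    """Hypothetical helper: return the matched numbers joined by ', '."""
    match = PATTERN.search(line)
    if match is None:
        return ''
    a, b = match.group(3), match.group(5)
    # group(5) is None when the optional second number did not match.
    if b:
        return f'{a}, {b}'
    return a


print(format_numbers('range: $0.82'))
print(format_numbers('range : 0.82 - 0.85'))
```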

Regex statement explanation:

r'(range)(\s*:?\s*[$]*)([0-9]*\.[0-9]*)(\s*-?\s*)([0-9]*\.[0-9]*)?'

First Group: (range)

This puts ‘range’ into the first group.

Second Group: (\s*:?\s*[$]*)

\s* matches zero or more whitespace characters
:? matches an optional colon (:)
[$]* matches zero or more dollar signs ($)

Third Group: ([0-9]*\.[0-9]*)

[0-9]* matches zero or more digits
\. matches a decimal point
This is the group that holds the first number (0.82)

Fourth Group: (\s*-?\s*)

\s* matches zero or more whitespace characters
-? matches an optional hyphen

Fifth Group: ([0-9]*\.[0-9]*)?

[0-9]* matches zero or more digits
\. matches a decimal point
The ? at the end makes the group optional.
This is the group that holds the second number (0.85)
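
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only one possible way to lay it out; the name PATTERN is an assumption for illustration.

```python
import re

PATTERN = re.compile(r'''
    (range)              # group 1: the keyword
    (\s*:?\s*[$]*)       # group 2: optional colon, whitespace, dollar signs
    ([0-9]*\.[0-9]*)     # group 3: first number
    (\s*-?\s*)           # group 4: optional hyphen with surrounding whitespace
    ([0-9]*\.[0-9]*)?    # group 5: optional second number
''', re.VERBOSE)

match = PATTERN.search('range : 0.82 - 0.85')
print(match.groups())
```

Note that both number groups require a decimal point, so an integer such as 5 would not be captured by this pattern.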

Answered By: ScottC

Answer 2

You could avoid regex completely. Those lines are not difficult to parse.

def parse(line):
    if not line.startswith('range'):
        return
    line = line.replace(':',' ').replace('$','')
    for token in line.split():
        try:
            yield float(token)
        except ValueError:
            continue
            

input_data = ['range: $0.82',
              'range:0.82',
              'range:  0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range   :  0.82 - 0.85',
              'range 0.82   0.85']

r = [list(i) for i in map(parse, input_data)]
print(r)
[[0.82], [0.82], [0.82, 0.85], [0.82, 0.85], [0.82, 0.85], [0.82, 0.85]]
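
One limitation of splitting on whitespace is that a token such as 0.82-0.85 (no spaces around the hyphen) is not a valid float and gets skipped. A possible variation, assuming the values are never negative so that a hyphen is always a separator, also replaces the hyphen with a space. The function name parse_ranges is hypothetical.

```python
def parse_ranges(line):
    """Hypothetical variant of parse() that also splits on hyphens.

    Assumes the values are never negative, so '-' is only a separator.
    """
    if not line.startswith('range'):
        return []
    for ch in ':$-':
        line = line.replace(ch, ' ')
    numbers = []
    for token in line.split():
        try:
            numbers.append(float(token))
        except ValueError:
            continue  # skip 'range' and any other non-numeric token
    return numbers


print(parse_ranges('range:0.82-0.85'))
```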

Answered By: alec_djinn

Answer 3

You could use this regex to extract your data:

^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?

Regex explanation:

^ : beginning of string
\s*range : asserts the string starts with range (possibly preceded by whitespace; if you don’t want that, remove the \s*)
\D* : some number of non-digit characters
(\d+(?:\.\d+)?) : a number, captured in group 1
(?:\D*(\d+(?:\.\d+)?))? : an optional group of some non-digits followed by a number, captured in group 2

In Python:

import re

input_data = ['range: $0.82',
              'range:0.82',
              'range:  0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range   :  0.82 - 0.85',
              'range 0.82   0.85']
results = [re.findall(r'^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?', d)[0] for d in input_data]
print(results)

Output:

[
 ('0.82', ''),
 ('0.82', ''),
 ('0.82', '0.85'),
 ('0.82', '0.85'),
 ('0.82', '0.85'),
 ('0.82', '0.85')
]
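
If a list of floats is preferred over tuples with empty strings, the captured groups can be filtered and converted. The sketch below is one way to do that; the helper name extract_floats is hypothetical. It also returns an empty list for a line that does not match, whereas indexing re.findall(...)[0] would raise IndexError in that case.

```python
import re

PATTERN = re.compile(r'^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?')


def extract_floats(line):
    """Hypothetical helper: return the captured numbers as a list of floats."""
    match = PATTERN.match(line)
    if match is None:
        return []
    # groups() yields None for group 2 when only one number is present.
    return [float(g) for g in match.groups() if g is not None]


print(extract_floats('range: $0.82'))
print(extract_floats('range 0.82   0.85'))
```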

Answered By: Nick

Answer 4

If you can make use of the Python regex module from PyPI, which supports variable-length lookbehind, then you can get multiple occurrences:

(?<=^range\b[\s:$\-\d.]*)\d+(?:\.\d+)?

Explanation

(?<= Positive lookbehind, assert that to the left is
- ^range\b Match range at the start of the string, followed by a word boundary
- [\s:$\-\d.]* Optionally match all allowed chars that could be in between
) Close the lookbehind assertion
\d+(?:\.\d+)? Match 1+ digits with an optional decimal part

Example

import regex

strings = [
"range: $0.82",
"range:0.82",
"range:  0.82 - 0.85",
"range : 0.82 - 0.85",
"range   :  0.82 - 0.85",
"range 0.82   0.85"
]
pattern = r"(?<=^range\b[\s:$\-\d.]*)\d+(?:\.\d+)?"

for s in strings:
    print(regex.findall(pattern, s))

Output

['0.82']
['0.82']
['0.82', '0.85']
['0.82', '0.85']
['0.82', '0.85']
['0.82', '0.85']
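
The third-party module is needed here because the standard library re module only accepts fixed-width lookbehind. A small check, using only the standard library, illustrates the difference; the variable name compiled_with_re is just for illustration.

```python
import re

pattern = r"(?<=^range\b[\s:$\-\d.]*)\d+(?:\.\d+)?"

try:
    re.compile(pattern)
    compiled_with_re = True
except re.error as exc:
    # The * inside the lookbehind makes its width variable.
    compiled_with_re = False
    print("re rejected the pattern:", exc)
```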

Answered By: The fourth bird
