Use Regex to exclude numbers based on certain conditions
Question:
I am trying to match and extract numbers if:
- They are not a single 2
- They are not a single 4
- They are not a 4-digit number
*Note: Placement of numbers in the string is completely random – the numbers can occur at the beginning, middle, or end and can be any length other than 4.
Here is a table with examples of strings and desired matches.
Text | Desired Match(es) |
---|---|
HELLO123 | 123 |
B4UGO | |
1984 ANIMAL FARM 45 | 45 |
GOT 2 GO | |
SOME OTHER 1000 | |
22 AND 44 AND 1234567 | 22, 44, 1234567 |
TEST567TRUE | 567 |
I found an SO article that begins to address the single 2 and single 4 issue here. The regex I have thus far is '\b(?!2\b|4\b|\d{4})\d+\b', but that requires the numbers to be standalone (surrounded by spaces), and it also will not extract numbers that start with 4 digits but are longer (e.g. 1234567). I’d appreciate some help if anyone has some ideas.
Answers:
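You could use the negative lookarounds (?<!\d) and (?!\d) as boundaries instead of \b:
(?<!\d)(?!([24]|\d{4})(?!\d))\d+
Inside the first negative lookahead, the disallowed numbers (a single 2 or 4, or exactly four digits) are listed as alternatives in a group. The inner (?!\d) requires the alternative to cover the whole number, so a longer number such as 1234567 is not rejected just because it starts with four digits. The leading (?<!\d) stops a match from starting in the middle of a number.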
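To check the pattern against the examples from the question, here is a minimal sketch in Python (the question does not name a language, so Python is an assumption, and the names `PATTERN`, `SAMPLES` and `extract_numbers` are only illustrative). One deliberate change: the group is written as non-capturing, `(?:...)`, because `re.findall` returns the group contents rather than the whole match when a pattern contains a capturing group.

```python
import re

# The pattern from the answer, with a non-capturing group so that
# re.findall returns the full matches instead of the group contents.
PATTERN = re.compile(r"(?<!\d)(?!(?:[24]|\d{4})(?!\d))\d+")

# Sample strings taken from the table in the question.
SAMPLES = [
    "HELLO123",
    "B4UGO",
    "1984 ANIMAL FARM 45",
    "GOT 2 GO",
    "SOME OTHER 1000",
    "22 AND 44 AND 1234567",
    "TEST567TRUE",
]


def extract_numbers(text):
    """Return every digit run that is not a lone 2, a lone 4, or exactly 4 digits."""
    return PATTERN.findall(text)


for sample in SAMPLES:
    print(sample, "->", extract_numbers(sample))
```

For "22 AND 44 AND 1234567" this returns ['22', '44', '1234567'], and for "B4UGO" it returns an empty list, which matches the desired results in the table.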