How to create regex to match a string that contains only hexadecimal numbers and arrows?
Question:
I am using a string that uses the following characters:
0-9
a-f
A-F
-
>
The mixture of the greater than and hyphen must be:
->
-->
Here is the regex that I have so far:
[0-9a-fA-F->]+
I tried these others using exclusion with ^
but they didn’t work:
[^g-zG-Z][0-9a-fA-F->]+
^g-zG-Z[0-9a-fA-F->]+
[0-9a-fA-F->]^g-zG-Z+
[0-9a-fA-F->]+^g-zG-Z
[0-9a-fA-F->]+[^g-zG-Z]
Here are some samples:
"0912adbd->12d1829-->218990d"
"ab2c8d-->82a921->193acd7"
Answers:
Firstly, you don’t need to escape -
and >
Here’s the regex that worked for me:
^([0-9a-fA-F]*(->)*(-->)*)*$
Here’s a looser alternative that accepts an arrow with any number of hyphens (one or more):
^([0-9a-fA-F]*(-+>)*)*$
What does the regex do?
^
matches the beginning of the string and $
matches the ending.
*
matches 0 or more instances of the preceding token
The outer
()
is a capturing group that wraps one hex run plus its arrows, so the whole unit can repeat.
[0-9a-fA-F]
matches any character that is in the range.
(->)
and (-->)
match only those given instances.
Putting it into code:
import re
# Compile once and reuse; a raw string avoids escape-sequence surprises.
regex = re.compile(r"^([0-9a-fA-F]*(->)*(-->)*)*$")
regex.match("0912adbd->12d1829-->218990d")
regex.match("ab2c8d-->82a921->193acd7")
regex.match("this-failed->so-->bad")  # no match, returns None
You can also convert the result into a boolean:
print(bool(regex.match("0912adbd->12d1829-->218990d")))
print(bool(regex.match("ab2c8d-->82a921->193acd7")))
print(bool(regex.match("this-failed->so-->bad")))
Output:
True
True
False
I recommend using regexr.com to check your regex.
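Because every part of both patterns is optional, they also accept some inputs that may not be intended. The following is a minimal sketch for probing a few edge cases; the names STRICT and LOOSE and the sample inputs are illustrative and not part of the original answer.

```python
import re

# The two patterns from this answer, compiled once.
STRICT = re.compile(r"^([0-9a-fA-F]*(->)*(-->)*)*$")
LOOSE = re.compile(r"^([0-9a-fA-F]*(-+>)*)*$")

# Hypothetical edge cases: empty string, arrows with no hex digits,
# a three-hyphen arrow, and a dangling hyphen.
edge_cases = ["", "->->", "--->", "12ab-"]

for s in edge_cases:
    print(repr(s), bool(STRICT.match(s)), bool(LOOSE.match(s)))
```

Both patterns accept the empty string and a string made only of arrows, and only the looser one accepts the three-hyphen arrow. If those cases should be rejected, a pattern that requires at least one hex digit around each arrow is a better fit.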
It’s probably more intuitive to call re.split() on the "arrows" and then check that all the resulting strings are purely hexadecimal:
import re
def check_hex_arrows(inp):
    # Split on "->" or "-->"; whatever is left must consist of hex digits only.
    parts = re.split(r"-{1,2}>", inp)
    return all(re.match(r"^[0-9a-fA-F]*$", p) for p in parts)
Testing:
test_strings = ["0912adbd->12d1829-->218990d",
                "ab2c8d-->82a921->193acd7",
                "0912adbd",
                "0912adbd-12345",
                "abcfdefghi",
                "abcfdef----->123",
                "abcfdefghi----->123"]
for t in test_strings:
    print(t, check_hex_arrows(t))
gives:
0912adbd->12d1829-->218990d True
ab2c8d-->82a921->193acd7 True
0912adbd True
0912adbd-12345 False
abcfdefghi False
abcfdef----->123 False
abcfdefghi----->123 False
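The split approach uses * for the hex check, so empty pieces pass, which means a leading or trailing arrow (or an empty string) is accepted. If that is not wanted, one possible variant is sketched below. The function name check_hex_arrows_strict is hypothetical, and re.fullmatch is used here instead of ^...$ because $ in Python also matches just before a trailing newline.

```python
import re

# One or more hex digits; fullmatch requires the whole piece to match.
HEX_RUN = re.compile(r"[0-9a-fA-F]+")

def check_hex_arrows_strict(inp):
    """Return True if inp is non-empty hex runs joined by -> or -->."""
    parts = re.split(r"-{1,2}>", inp)
    # An empty piece means an arrow sat at the start, at the end,
    # or directly next to another arrow.
    return all(HEX_RUN.fullmatch(p) for p in parts)

for t in ["12->34", "12-->34", "->12", "12->", "12--->34", ""]:
    print(repr(t), check_hex_arrows_strict(t))
```

Only the first two inputs in the loop pass; the rest are rejected because at least one piece is empty or contains a stray hyphen.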
If there must be at least one arrow, and arrows may not appear at the start or end of the string, use this pattern with the case-insensitive flag:
^[a-f\d]+(?:-{1,2}>[a-f\d]+)+$
Explanation
^
Start of string
[a-f\d]+
Match 1+ chars a-f or digits
(?:
Non-capturing group to repeat as a whole
-{1,2}>[a-f\d]+
Match -
or --
and >
followed by 1+ chars a-f or digits
)+
Close the non-capturing group and repeat it 1+ times
$
End of string
import re
pattern = r"^[a-f\d]+(?:-{1,2}>[a-f\d]+)+$"
# Three lines of input; re.M lets ^ and $ anchor at every line.
s = ("0912adbd->12d1829-->218990d\n"
     "ab2c8d-->82a921->193acd7\n"
     "test")
print(re.findall(pattern, s, re.I | re.M))
Output
[
'0912adbd->12d1829-->218990d',
'ab2c8d-->82a921->193acd7'
]
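One detail worth knowing when using \d in Python 3: for str patterns it matches any Unicode decimal digit, not just 0-9, unless the re.ASCII flag is set. The sketch below checks single strings with re.fullmatch and compares the two behaviours; the names BODY, unicode_aware, and ascii_only are illustrative, and the sample string is made up for the demonstration.

```python
import re

# Arrow-separated hex pattern without ^ and $, because fullmatch
# already anchors the match at both ends of the string.
BODY = r"[a-f\d]+(?:-{1,2}>[a-f\d]+)+"

unicode_aware = re.compile(BODY, re.I)        # \d matches any Unicode digit
ascii_only = re.compile(BODY, re.I | re.A)    # \d matches only 0-9

sample = "\u0661\u0662->ab"  # Arabic-Indic digits one and two, then an arrow

print(bool(unicode_aware.fullmatch(sample)))            # True
print(bool(ascii_only.fullmatch(sample)))               # False
print(bool(ascii_only.fullmatch("0912ADBD-->12d")))     # True
print(bool(ascii_only.fullmatch("0912adbd")))           # False: no arrow
```

If the input is known to be plain ASCII, the difference does not matter; otherwise writing 0-9 explicitly or adding re.ASCII keeps the check to true hexadecimal digits.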
You can construct the regex in steps. If I understand your requirements, you want a sequence of hexadecimal numbers (like a01d or 11efeb23), separated by arrows with one or two hyphens (-> or -->).
The hex part’s regex is [0-9a-fA-F]+
(assuming it cannot be empty).
The arrow’s regex can be -{1,2}>
or (->|-->).
The arrow is only needed before each hex number except the first, so you build the final regex in two parts: the first number, then a repetition of arrow plus number.
So the general structure will be:
NUMBER(ARROW NUMBER)*
Which gives the following regex:
[0-9a-fA-F]+(-{1,2}>[0-9a-fA-F]+)*
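The step-by-step construction can be mirrored in code by naming the pieces and composing them. This is a minimal sketch: the names NUMBER, ARROW, PATTERN, and is_hex_chain are illustrative, and re.fullmatch is used as an assumption about intent, since the regex above has no ^ or $ and would otherwise match a valid fragment inside a longer invalid string.

```python
import re

NUMBER = r"[0-9a-fA-F]+"   # one hex number, at least one digit
ARROW = r"-{1,2}>"         # -> or -->

# NUMBER(ARROW NUMBER)* with a non-capturing group for the repetition.
PATTERN = re.compile(f"{NUMBER}(?:{ARROW}{NUMBER})*")

def is_hex_chain(s):
    """Return True if the whole string is hex numbers joined by arrows."""
    return PATTERN.fullmatch(s) is not None

for s in ["0912adbd->12d1829-->218990d", "0912adbd", "->12", "this-failed->so-->bad"]:
    print(s, is_hex_chain(s))
```

With this structure a single hex number with no arrows is accepted, while a leading arrow or any non-hex character is rejected. Changing the final * to + would require at least one arrow.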