How to create regex to match a string that contains only hexadecimal numbers and arrows?

Question:

I am using a string that uses the following characters:

0-9    
a-f    
A-F    
-
>

The mixture of the greater than and hyphen must be:

->
-->

Here is the regex that I have so far:

[0-9a-fA-F->]+

I tried these others using exclusion with ^ but they didn’t work:

[^g-zG-Z][0-9a-fA-F->]+
^g-zG-Z[0-9a-fA-F->]+
[0-9a-fA-F->]^g-zG-Z+
[0-9a-fA-F->]+^g-zG-Z
[0-9a-fA-F->]+[^g-zG-Z]

Here are some samples:

"0912adbd->12d1829-->218990d"
"ab2c8d-->82a921->193acd7"
Asked By: user19926715

||

Answers:

Firstly, you don’t need to escape - and >

Here’s the regex that worked for me:
^([0-9a-fA-F]*(->)*(-->)*)*$

Here’s an alternative regex:
^([0-9a-fA-F]*(-+>)*)*$

What does the regex do?

  • ^ matches the beginning of the string and $ matches the ending.
  • * matches 0 or more instances of the preceding token
  • Created a big () capturing group to match any token.
  • [0-9a-fA-F] matches any character that is in the range.
  • (->) and (-->) match only those given instances.

Putting it into a code:

import re
regex = "^([0-9a-fA-F]*(->)*(-->)*)*$"
re.match(re.compile(regex),"0912adbd->12d1829-->218990d")
re.match(re.compile(regex),"ab2c8d-->82a921->193acd7")
re.match(re.compile(regex),"this-failed->so-->bad")

You can also convert it into a boolean:

print(bool(re.match(re.compile(regex),"0912adbd->12d1829-->218990d")))
print(bool(re.match(re.compile(regex),"ab2c8d-->82a921->193acd7")))
print(bool(re.match(re.compile(regex),"this-failed->so-->bad")))

Output:

True
True
False

I recommend using regexr.com to check your regex.

Answered By: The Myth

It’s probably more intuitive to re.split() on the "arrows" and then check that all the resulting strings are purely hexadecimal:

import re

def check_hex_arrows(inp):
    parts = re.split(r"-{1,2}>", inp)
    return all(re.match(r"^[0-9a-fA-F]*$", p) for p in parts)

Testing:

test_strings = ["0912adbd->12d1829-->218990d",
                "ab2c8d-->82a921->193acd7",
                "0912adbd",
                "0912adbd-12345",
                "abcfdefghi",
                "abcfdef----->123",
                "abcfdefghi----->123" ]

for t in test_strings:
    print(t, check_hex_arrows(t))

gives:

0912adbd->12d1829-->218990d True
ab2c8d-->82a921->193acd7 True
0912adbd True
0912adbd-12345 False
abcfdefghi False
abcfdef----->123 False
abcfdefghi----->123 False
Answered By: Pranav Hosangadi

If there must be an arrow present, and not at the start or end of the string using a case insensitive pattern:

^[a-fd]+(?:-{1,2}>[a-fd]+)+$

Explanation

  • ^ Start of string
  • [a-fd]+ Match 1+ chars a-f or digits
  • (?: Non capture group to repeat as a whole
    • -{1,2}>[a-fd]+ Match - or -- and > followed by 1+ chars a-f or digits
  • )+ Close the non capture group and repeat 1+ times
  • $ End of string

See a regex demo and a Python demo.

import re

pattern = r"^[a-fd]+(?:-{1,2}>[a-fd]+)+$"
s = ("0912adbd->12d1829-->218990dn"
            "ab2c8d-->82a921->193acd7n"
            "test")
print(re.findall(pattern, s, re.I | re.M))

Output

[
  '0912adbd->12d1829-->218990d',
  'ab2c8d-->82a921->193acd7'
]
Answered By: The fourth bird

You can construct the regex by steps. If I understand your requirements, you want a sequence of hexadecimal numbers (like a01d or 11efeb23, separated by arrows with one or two hyphens (-> or -->).

The hex part’s regex is [0-9a-fA-F]+ (assuming it cannot be empty).

The arrow’s regex can be -{1,2}> or (->|–>).

The arrow is only needed before each hex number but the first, so you’ll build the final regex in two parts: the first number, then the repetition of arrow and number.

So the general structure will be:

NUMBER(ARROW NUMBER)*

Which gives the following regex:

[0-9a-fA-F]+(-{1,2}>[0-9a-fA-F]+)*
Answered By: Nowhere man
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.