Regex in Python: doing a lookahead inside a conditional statement

Question:

I’m trying to do lookaheads in a conditional statement.
Explanation in words:

(a string that must be a number, decimal or not, or a single word character; a named capturing group is created for it) (if the named capturing group is a word character, check with a lookahead that the next string is a number, decimal or not; otherwise, check with a lookahead that the next string is a word character)

To illustrate, here are some examples that should or should not match:

a 6 or 6.4 b -> matched, since the first and second strings don’t have the same "type"

ab 7 or 7 rt -> not matched, only a single word character is allowed

R 7.55t -> not matched, 7.55t is not a valid number

a r or 5 6 -> not matched, the first and second strings have the same "type" (number and number, or word character and word character)

I’ve already found the answer for the first string: (?P<var>([a-zA-Z]|(-?\d+(\.\d+)?)))
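As a quick sanity check of that first-string pattern (backslashes restored), here is a minimal sketch that tests a few candidate tokens on their own with `re.fullmatch`. The name `FIRST` and the sample tokens are made up for illustration.

```python
import re

# The question's first-string pattern: one ASCII letter, or an optionally
# negative integer/decimal number, captured in the named group "var".
FIRST = re.compile(r"(?P<var>([a-zA-Z]|(-?\d+(\.\d+)?)))")

# fullmatch requires the whole token to match, so "ab" and "7.55t" fail.
for token in ["a", "6", "6.4", "-3", "ab", "7.55t"]:
    m = FIRST.fullmatch(token)
    print(token, "->", m.group("var") if m else None)
```

Inside a longer string the pattern needs explicit boundary checks (such as the lookbehinds and lookaheads in the answer below), because on its own `FIRST.search("ab")` happily matches just the `a`.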

I’ve found nothing on the Internet about lookaheads in a conditional statement in Python.

The problem is that Python doesn’t support conditionals the way PCRE does:

Python supports conditionals using a numbered or named capturing group. Python does not support conditionals using lookaround, even though Python does support lookaround outside conditionals. Instead of a conditional like (?(?=regex)then|else), you can alternate two opposite lookarounds: (?=regex)then|(?!regex)else. (source: https://www.regular-expressions.info/conditional.html)
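Applied to this problem, the `(?=regex)then|(?!regex)else` rewrite from that quote could look roughly like the sketch below. It is an illustration, not code from the thread; the names `NUM` and `PAIR` are arbitrary. The condition is "the first string starts with a letter", and `re.fullmatch` is used so each sample must be exactly one pair.

```python
import re

NUM = r"-?\d+(?:\.\d+)?"  # optionally negative integer or decimal

# (?=cond)then | (?!cond)else, with cond = "starts with a letter"
PAIR = re.compile(
    rf"(?=[a-zA-Z])[a-zA-Z]\s+{NUM}"    # then: letter first, number second
    rf"|(?![a-zA-Z]){NUM}\s+[a-zA-Z]"   # else: number first, letter second
)

for text in ["a 6", "6.4 b", "ab 7", "7 rt", "R 7.55t", "a r", "5 6"]:
    print(f"{text!r}: {bool(PAIR.fullmatch(text))}")
```

Here the lookarounds are technically redundant, since the first character already decides the branch, but they mirror the shape of the quoted rewrite; with a condition that is not simply "what the first character is", the pair of opposite lookarounds is what does the work.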

Maybe there’s a better solution than the one I’ve planned, or maybe what I want is simply impossible; I don’t know.

What I tried: (?P<var>([a-zA-Z]|(-?\d+(\.\d+)?))) (?(?=[a-zA-Z])(?=(-?\d+(\.\d+)?))|(?=[a-zA-Z]))(?P=var) but that doesn’t work.
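Python rejects that attempt at compile time: the condition in `(?(...)yes|no)` must be a group name or number, not a lookaround. What Python does accept is a condition on whether a group took part in the match. One possible workaround, sketched here and not taken from the original thread, is to give the letter alternative its own named group (the extra name `word` is hypothetical) and branch on it.

```python
import re

# The question's attempt, backslashes restored: the lookahead used as a
# condition is not a group name or number, so compiling it raises re.error.
ATTEMPT = r"(?P<var>([a-zA-Z]|(-?\d+(\.\d+)?))) (?(?=[a-zA-Z])(?=(-?\d+(\.\d+)?))|(?=[a-zA-Z]))(?P=var)"
try:
    re.compile(ATTEMPT)
    attempt_compiles = True
except re.error as exc:
    attempt_compiles = False
    print("re.error:", exc)

# Workaround: "word" only participates when the first string is a letter,
# and (?(word)yes|no) branches on whether it did.
COND = re.compile(
    r"(?P<var>(?P<word>[a-zA-Z])|-?\d+(?:\.\d+)?)"  # first string
    r"\s+"
    r"(?(word)-?\d+(?:\.\d+)?|[a-zA-Z])"             # number after a letter, else a letter
)

for text in ["a 6", "6.4 b", "ab 7", "7 rt", "R 7.55t", "a r", "5 6"]:
    m = COND.fullmatch(text)
    print(f"{text!r}: {m.group('var') if m else None}")
```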

Asked By: NoaLeGeek68


Answers:

The named capture group (?P<var>...) contains the actual text which matched, not the regex itself. There is a way to create a named regex, too; but it’s probably not particularly necessary or useful here.
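A small illustration of that point, not from the original answer: `(?P=var)` re-matches the exact text that `var` captured, so the question's trailing `(?P=var)` could only ever accept the same token twice, which is the opposite of the requirement. The name `SAME` is arbitrary.

```python
import re

# var captures a letter or a number; (?P=var) must then match the SAME text.
SAME = re.compile(r"(?P<var>[a-zA-Z]|-?\d+(?:\.\d+)?)\s+(?P=var)")

print(bool(SAME.fullmatch("a a")))      # True: identical text
print(bool(SAME.fullmatch("2.5 2.5")))  # True
print(bool(SAME.fullmatch("a b")))      # False: different letter
print(bool(SAME.fullmatch("a 6")))      # False: the backreference is text, not a pattern
```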

Simply spell out the alternatives:

((?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9])|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9]))
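To see this pattern in action, here is a sketch that runs it with `re.search` over the question's examples. The lookbehinds and lookaheads act as hand-rolled token boundaries, so `search` will not pick out a partial token such as the `b` in `ab 7` or the `7.55` in `R 7.55t`. The name `PAIR` and the extra sample `"x = a 6"` are illustrative additions.

```python
import re

# The answer's regex with backslashes restored: letter-then-number OR
# number-then-letter, fenced by lookarounds so partial tokens are rejected.
PAIR = re.compile(
    r"((?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9])"
    r"|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9]))"
)

samples = ["a 6", "6.4 b", "ab 7", "7 rt", "R 7.55t", "a r", "5 6", "x = a 6"]
for text in samples:
    m = PAIR.search(text)
    print(f"{text!r}: {m.group(0) if m else 'no match'}")
```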

If you genuinely require the second token to remain unmatched, it should be obvious how to change the parts starting at each \s into a lookahead.
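One way to do that conversion, sketched as an illustration rather than the answerer's own code, is to wrap everything from each `\s+` onward in `(?=...)`. The match then contains only the first token, and because the second token is not consumed, `finditer` can report overlapping pairs. `PAIR_LA` is an arbitrary name.

```python
import re

# Same alternatives as the answer, but each "\s+ second-token" tail is now a
# lookahead, so only the first token is consumed.
PAIR_LA = re.compile(
    r"((?<![a-zA-Z0-9])[a-zA-Z](?=\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9]))"
    r"|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?(?=\s+[a-zA-Z](?![a-zA-Z0-9])))"
)

print(PAIR_LA.search("a 6").group(0))    # a
print(PAIR_LA.search("6.4 b").group(0))  # 6.4
print(PAIR_LA.search("7 rt"))            # None

# Overlapping pairs: "a 6" and "6 b" share the 6.
print([m.group(0) for m in PAIR_LA.finditer("a 6 b")])  # ['a', '6']
```

With the consuming version, the same input `"a 6 b"` yields only `a 6`, because the `6` has already been used up by the first match.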

Demo: https://ideone.com/nPNAIN

Answered By: tripleee