Regular expression for complex numbers
Question:
I'm trying to write a regular expression for complex numbers (just as an exercise to study the re module), but I can't get it to work. I want the regex to match strings of the form '12+18j', '-14+45j', '54', '-87j' and so on. My attempt:
import re
num = r'[+-]?(?:\d*.\d+|\d+)'
complex_pattern = rf'(?:(?P<real>{num})|(?P<imag>{num}j))|(?:(?P=real)(?P=imag))'
complex_pattern = re.compile(complex_pattern)
But it doesn’t really work as I want.
m = complex_pattern.fullmatch('1+12j')
m.groupdict()
Out[166]: {'real': None, 'imag': '1+12j'}
The reason for this structure is that I want the input string to contain a real part, an imaginary part, or both, and I also want to be able to extract the real and imag groups from the match object. There is another approach I tried, and it seems to work except that it also matches empty strings (''):
complex_pattern = rf'(?P<real>{num})+(?P<imag>{num}j)+'
complex_pattern = re.compile(complex_pattern)
I guess I could check for an empty string with a simple if, but I'm interested in a purer way, and in knowing why the first implementation doesn't work as expected.
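A possible explanation for the first attempt, offered as a sketch rather than a definitive diagnosis. It assumes the dot in num is unescaped, as written above, so it matches any character (including +). It also relies on the fact that (?P=real) is a backreference: it matches the exact text the group captured earlier instead of re-applying the group's sub-pattern. The names loose, strict and same_text are made up for this illustration.

```python
import re

# Unescaped dot: "." matches any character, so "1+12" passes as one number.
loose = r'[+-]?(?:\d*.\d+|\d+)'
# Escaped dot: "\." matches only a literal decimal point.
strict = r'[+-]?(?:\d*\.\d+|\d+)'

loose_match = re.fullmatch(loose + 'j', '1+12j')    # a match object
strict_match = re.fullmatch(strict + 'j', '1+12j')  # None

# A backreference repeats the captured text, not the pattern.
same_text = re.compile(r'(?P<n>\d+)\+(?P=n)')
repeat_match = same_text.fullmatch('12+12')  # a match object: same text twice
other_match = same_text.fullmatch('12+18')   # None: "18" is not "12"

print(loose_match, strict_match, repeat_match, other_match)
```

If that reading is right, the (?P=real)(?P=imag) branch can never match: the backreferences sit in a different alternative from the groups they refer to, so neither group has captured anything when they are tried.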
Answers:
Does this work for what you want?
import re
words = '+122+6766j'
pattern = re.compile(r'((^[-+]?(?P<real>\d+))?[-+]?(?P<img>\d{2,}j?\w)?)')
pattern.fullmatch(words).groupdict()
Output
{'real': '122', 'img': '6766j'}
I suggest using
import re
pattern = r'^(?!$)(?P<real>(?P<sign1>[+-]?)(?P<number1>\d+(?:\.\d+)?))?(?:(?P<imag>(?P<sign2>[+-]?)(?P<number2>\d+(?:\.\d+)?j)))?$'
texts = ['1+12j', '12+18j', '-14+45j', '54', '-87j']
for text in texts:
    match = re.fullmatch(pattern, text)
    if match:
        print(text, '=>', match.groupdict())
    else:
        print(f'{text} did not match!')
See the Python demo. Output:
1+12j => {'real': '1', 'sign1': '', 'number1': '1', 'imag': '+12j', 'sign2': '+', 'number2': '12j'}
12+18j => {'real': '12', 'sign1': '', 'number1': '12', 'imag': '+18j', 'sign2': '+', 'number2': '18j'}
-14+45j => {'real': '-14', 'sign1': '-', 'number1': '14', 'imag': '+45j', 'sign2': '+', 'number2': '45j'}
54 => {'real': '54', 'sign1': '', 'number1': '54', 'imag': None, 'sign2': None, 'number2': None}
-87j => {'real': '-8', 'sign1': '-', 'number1': '8', 'imag': '7j', 'sign2': '', 'number2': '7j'}
See the regex demo.
Details
^ - start of string
(?!$) - no end of string should follow at this position (no empty input is allowed)
(?P<real>(?P<sign1>[+-]?)(?P<number1>\d+(?:\.\d+)?))? - an optional "real" group:
    (?P<sign1>[+-]?) - an optional - or + sign captured into Group "sign1"
    (?P<number1>\d+(?:\.\d+)?) - one or more digits followed by an optional sequence of a . and one or more digits, captured into Group "number1"
(?P<imag>(?P<sign2>[+-]?)(?P<number2>\d+(?:\.\d+)?j))? - an optional sequence captured into the "imag" group:
    (?P<sign2>[+-]?) - an optional - or + sign captured into Group "sign2"
    (?P<number2>\d+(?:\.\d+)?j) - one or more digits followed by an optional sequence of a . and one or more digits, and then a j char, captured into Group "number2"
$ - end of string.
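The breakdown above can also be written straight into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows # comments inside it. This is only a sketch of the same pattern in a different layout; the name verbose_pattern is an assumption made for this example.

```python
import re

verbose_pattern = re.compile(r'''
    ^(?!$)                              # start of string, and the string is not empty
    (?P<real>                           # optional real part
        (?P<sign1>[+-]?)                #   optional sign
        (?P<number1>\d+(?:\.\d+)?)      #   digits, optionally followed by .digits
    )?
    (?P<imag>                           # optional imaginary part
        (?P<sign2>[+-]?)                #   optional sign
        (?P<number2>\d+(?:\.\d+)?j)     #   digits, optional .digits, then j
    )?
    $                                   # end of string
''', re.VERBOSE)

for text in ['12+18j', '54', '']:
    m = verbose_pattern.fullmatch(text)
    print(repr(text), '=>', m.groupdict() if m else 'did not match')
```

Since the layout is the only thing that changes, it inherits the same grouping behaviour as the one-line pattern.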
Even though I accepted Wiktor Stribiżew's answer and consider it really good, I have to add something I noticed. The last string in the texts list isn't grouped correctly ('-87j' -> real: -8; imag: 7j). To address this, I propose the following change to a simplified version of his answer:
import re
num = r'[+-]?(?:\d*\.\d+|\d+)'
pattern = rf'(?!$)(?P<real>{num}(?!\d))?(?P<imag>{num}j)?'
texts = ['1+12j', '12+18j', '-14+45j', '54', '-87j']
for text in texts:
    match = re.fullmatch(pattern, text)
    if match:
        print(f'{text:>7} => {match.groupdict()}')
    else:
        print(f'{text:>7} did not match!')
Output:
  1+12j => {'real': '1', 'imag': '+12j'}
 12+18j => {'real': '12', 'imag': '+18j'}
-14+45j => {'real': '-14', 'imag': '+45j'}
     54 => {'real': '54', 'imag': None}
   -87j => {'real': None, 'imag': '-87j'}
The important difference here is the (?!\d) added to the 'real' group of the regex, which prevents strings like '-87j' from being split into '-8' and '7j'.
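To see what the lookahead changes, the sketch below runs '-87j' through the pattern with and without (?!\d). The names without_guard and with_guard are hypothetical, chosen only for this comparison.

```python
import re

num = r'[+-]?(?:\d*\.\d+|\d+)'

# Same pattern twice; only the negative lookahead after the real part differs.
without_guard = rf'(?!$)(?P<real>{num})?(?P<imag>{num}j)?'
with_guard = rf'(?!$)(?P<real>{num}(?!\d))?(?P<imag>{num}j)?'

split_result = re.fullmatch(without_guard, '-87j').groupdict()
whole_result = re.fullmatch(with_guard, '-87j').groupdict()

print(split_result)  # {'real': '-8', 'imag': '7j'}
print(whole_result)  # {'real': None, 'imag': '-87j'}
```

Without the guard, the regex engine backtracks the real part to '-8' so that '7j' can satisfy the imaginary part. With it, a real part may not stop in the middle of a run of digits, so that split is rejected.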
Just for completeness, I wanted to add a solution that also allows basic scientific notation and accepts either i or j as the imaginary unit. I'm posting it in case other people come here, as I did, looking for a regular expression that finds complex numbers. The key point is that a number with no imaginary part is not returned as a match.
It deviates from the original question in that it has no named groups, but that could be changed; see the commented-out line with cx_num_groups.
The expression does not include groups for the real and imaginary parts because it allows numbers such as 2j.
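def _complex_re_gen():
    '''
    Build the pattern in pieces because it is complicated.
    Returns a regex string that matches complex numbers.
    '''
    # plain number: optional sign, optional "digits." prefix, then digits
    num = r'(?:[+-]?(?:\d*\.)?\d+)'
    # number with an optional exponent (basic scientific notation)
    num_sci = r'(?:{num}(?:e[+-]?\d+)?)'.format(num=num)
    # optional real part, then a mandatory imaginary part ending in i or j
    cx_num = r'(?:{num_sci}?{num_sci}[ij])'.format(num_sci=num_sci)
    #cx_num_groups = cx_num = r'(?:(?P<real>{num_sci})?(?P<img>{num_sci}[ij])?)'.format(num_sci=num_sci)
    # accept the number either bare or wrapped in parentheses
    cx_match_wrapped = r"^(?:{cx_num}|\({cx_num}\))$".format(cx_num=cx_num)
    return cx_match_wrapped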
With the following test strings, this regexp returns a match for the ones commented "match":
cmplx_tests = [
    '1 + 2j',     # no match (contains spaces)
    '1e5-2e-2j',  # match
    'i2 +4j',     # no match
    '1.25',       # no match (no imaginary part)
    '-5-3.2i',    # match
    '64.2-3.9j',  # match
]
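The commented-out cx_num_groups idea can be sketched as follows. This is an assumption about how the groups might be added, not the pattern returned by the function above: it borrows the lookahead trick from the earlier answer, extended to (?![\d.]) so that a decimal point also blocks a split, and it keeps the imaginary part mandatory. The name cx_groups is hypothetical.

```python
import re

num = r'(?:[+-]?(?:\d*\.)?\d+)'
num_sci = rf'(?:{num}(?:e[+-]?\d+)?)'

# Hypothetical grouped variant: the lookahead stops the real part from
# ending in the middle of a number such as "12j" or "1.5j".
cx_groups = re.compile(
    rf'(?P<real>{num_sci}(?![\d.]))?(?P<imag>{num_sci}[ij])'
)

def parts(text):
    """Return the groupdict for a full match, or None if there is no match."""
    m = cx_groups.fullmatch(text)
    return m.groupdict() if m else None

print(parts('1e5-2e-2j'))  # {'real': '1e5', 'imag': '-2e-2j'}
print(parts('-5-3.2i'))    # {'real': '-5', 'imag': '-3.2i'}
print(parts('12j'))        # {'real': None, 'imag': '12j'}
print(parts('1.25'))       # None
```

As in the function above, a number with no imaginary part is still rejected.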
This post was written in part because I wanted to solve a problem in this post, with parsing complex arrays inside of parameter files.