How to extract the span from re.finditer results in Python?
Question:
The results of re.finditer are as below:
[i for i in result]
=[<re.Match object; span=(0, 10), match='sin theta '>,
<re.Match object; span=(12, 18), match='cos x '>,
<re.Match object; span=(20, 26), match='e ^ x '>,
<re.Match object; span=(26, 32), match='f( x )'>,
<re.Match object; span=(37, 45), match='log_ {x}'>]
When I used i.span instead of i, I got the following:
[<function Match.span(group=0, /)>,
<function Match.span(group=0, /)>,
<function Match.span(group=0, /)>,
<function Match.span(group=0, /)>,
<function Match.span(group=0, /)>]
I want to extract the spans from the re.finditer results,
like (0,10), (12,18), …
Help me please!
I defined the following function to get the re.finditer results:
import re

def convert_ftn_to_token(seq):
    va = '[a-z]{1,}'
    # ^ { } ( ) are regex metacharacters, so they are escaped to match literally
    ftn_lst = ['sin', 'cos', 'tan', 'log_', r'e ?\^']
    ftn_lst = [ftn + r' ?\{? ?' + va + r' ?\}?' for ftn in ftn_lst]
    # single-letter function names A-Z and a-z, e.g. f( x )
    ftn_lst2 = [chr(i) for i in range(65, 91)] + [chr(i) for i in range(97, 123)]
    ftn_lst2 = [ftn + r' ?\( ?' + va + r' ?\)' for ftn in ftn_lst2]
    ftn_c = re.compile(
        '|'.join(ftn_lst2) + '|' +
        '|'.join(ftn_lst)
    )
    return re.finditer(ftn_c, seq)

[i.span for i in results]
Answers:
You can use start() and end() on regex’s Match object (documentation about it here). They correspond to the lower and upper bounds of the span, respectively. The group argument mentioned in the docs only matters if you intend to use the grouping functionality of Match. If you just want the span of the entire match, you can simply call match.start() and match.end(), where match is the match object returned by the regex.
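A minimal sketch of start() and end(), assuming a simplified, hypothetical pattern and input string rather than the asker's full pattern. Called with no argument they give the bounds of the whole match; passing a group number gives the bounds of that capture group instead.

```python
import re

text = "sin theta, cos x"  # hypothetical input for illustration
# Hypothetical pattern with two capture groups: function name and argument.
pattern = re.compile(r"(sin|cos) ?([a-z]+)")

for m in pattern.finditer(text):
    # No argument: bounds of the entire match (group 0).
    whole = (m.start(), m.end())
    # Group number 2: bounds of just the argument, e.g. "theta".
    arg = (m.start(2), m.end(2))
    print(m.group(), whole, arg)
```

This prints `sin theta (0, 9) (4, 9)` and then `cos x (11, 16) (15, 16)`.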
Another option is to call span() on the same Match object. Note that this is different from writing just span without parentheses, which gives you the method object itself rather than calling it. Calling match.span() returns a tuple of the start and end. Taking your first match object as an example, this would return (0, 10).
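Applied to a finditer result, this becomes a one-line list comprehension. The sketch below assumes a small hypothetical input and pattern in place of the asker's `convert_ftn_to_token`. It also shows a related pitfall: finditer returns a one-shot iterator, so if `results` has already been consumed once (for example by `[i for i in result]`), a second comprehension over it returns an empty list.

```python
import re

text = "sin theta, cos x"  # hypothetical input for illustration
pattern = re.compile(r"(sin|cos) ?[a-z]+")

results = pattern.finditer(text)  # an iterator of Match objects
# Call span() with parentheses on each Match to get a (start, end) tuple.
spans = [m.span() for m in results]
print(spans)  # [(0, 9), (11, 16)]

# The iterator is now exhausted, so iterating it again yields nothing.
print([m.span() for m in results])  # []
```

If the matches are needed more than once, store them first with `matches = list(pattern.finditer(text))`.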
.span is a method, so accessing it without parentheses just gives you the method itself. You want .span(), which returns the (start, end) tuple.
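To see the difference between referencing and calling the method, here is a small sketch using a single hypothetical match from re.search:

```python
import re

m = re.search(r"cos ?[a-z]+", "sin theta, cos x")  # hypothetical input

# Without parentheses: the bound method object, not a tuple.
print(callable(m.span))  # True
# With parentheses: the method is called and returns (start, end).
print(m.span())  # (11, 16)
# span() is equivalent to pairing start() and end().
print(m.span() == (m.start(), m.end()))  # True
```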