Looping through Python regex matches
Question:
I want to turn a string that looks like this:
ABC12DEF3G56HIJ7
into
12 * ABC
3 * DEF
56 * G
7 * HIJ
I want to construct the correct set of loops using regex matching. The crux of the issue is that the code has to be completely general, because I cannot assume how long the [A-Z] fragments will be, nor how long the [0-9] fragments will be.
Answers:
Python’s re.findall should work for you.
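import re
s = "ABC12DEF3G56HIJ7"
# One or more uppercase letters followed by one or more digits
pattern = re.compile(r'([A-Z]+)([0-9]+)')
# findall returns a list of (letters, numbers) tuples, one per match
for (letters, numbers) in re.findall(pattern, s):
    print(numbers, '*', letters)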
If your dataset is large, re.finditer is the better choice because it reduces memory consumption: findall() returns a list of all results at once, whereas finditer() yields the matches one by one.
import re
s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')
# Each m is a match object: group(1) holds the letters, group(2) the digits
for m in re.finditer(pattern, s):
    print(m.group(2), '*', m.group(1))
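To make the difference between the two calls concrete, here is a small sketch (the variable names all_pairs, match_iter and first are only illustrative). It shows that findall builds the whole list of tuples up front, while finditer hands back an iterator that produces match objects as they are requested:

```python
import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

# findall returns a fully built list of (letters, numbers) tuples.
all_pairs = pattern.findall(s)

# finditer returns an iterator; a match is produced only when asked for.
match_iter = pattern.finditer(s)
first = next(match_iter)

print(all_pairs)
print(first.group(1), first.group(2))
```

For a short string like this one the difference is negligible; it only starts to matter when the input produces a very large number of matches.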
Yet another option is to use re.sub() to build the desired strings from the captured groups:
import re
s = 'ABC12DEF3G56HIJ7'
# \d+ matches the digits; \2 and \1 refer back to the captured groups
for x in re.sub(r"([A-Z]+)(\d+)", r'\2 * \1,', s).rstrip(',').split(','):
    print(x)
12 * ABC
3 * DEF
56 * G
7 * HIJ
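If the pairs are needed in more than one place, the finditer approach can be wrapped in a small generator. The following is only a sketch: the names PAIR and iter_pairs are hypothetical, and converting the digits with int() is an assumption that leading zeros do not need to be preserved (a fragment such as "007" would come out as 7).

```python
import re

# Hypothetical helper names; the pattern is the same one used in the answers above.
PAIR = re.compile(r'([A-Z]+)([0-9]+)')

def iter_pairs(text):
    """Yield (count, letters) for each letters-then-digits run in text."""
    for m in PAIR.finditer(text):
        # Assumption: the digits are wanted as an integer count.
        yield int(m.group(2)), m.group(1)

lines = [f"{count} * {letters}" for count, letters in iter_pairs("ABC12DEF3G56HIJ7")]
print("\n".join(lines))
```

Because iter_pairs is a generator, it keeps the one-at-a-time behaviour of finditer, and the caller decides whether to print the pairs, collect them, or do arithmetic with the counts.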