Splitting a string where it switches between numeric and alphabetic characters

Question:

I am parsing some data where the standard format is something like 10 pizzas. Sometimes the data is input incorrectly, and we might end up with 5pizzas instead of 5 pizzas. In this scenario, I want to parse out the number of pizzas.

The naïve way of doing this would be to check character by character, building up a string until we reach a non-digit and then casting that string as an integer.

num_pizzas = ""
for character in data_input:
   if character.isdigit():
      num_pizzas += character
   else:
      break
num_pizzas = int(num_pizzas)

This is pretty clunky, though. Is there an easier way to split a string where it switches from numeric digits to alphabetic characters?

Asked By: Chris

||

Answers:

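How about a regex?

import re

reg = re.compile(r'(?P<numbers>\d*)(?P<rest>.*)')
result = reg.search(data_input)  # data_input is the string from the question
if result:
    numbers = result.group('numbers')  # leading digits, e.g. '5'
    rest = result.group('rest')        # everything after them, e.g. 'pizzas'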
Answered By: cnicutar

To split the string at digits you can use re.split with the regular expression \d+:

>>> import re
>>> def my_split(s):
    return list(filter(None, re.split(r'(\d+)', s)))

>>> my_split('5pizzas')
['5', 'pizzas']
>>> my_split('foo123bar')
['foo', '123', 'bar']

To find the first number use re.search:

>>> re.search(r'\d+', '5pizzas').group()
'5'
>>> re.search(r'\d+', 'foo123bar').group()
'123'

If you know the number must be at the start of the string then you can use re.match instead of re.search. If you want to find all the numbers and discard the rest you can use re.findall.
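
A minimal sketch of those two alternatives, using made-up sample strings rather than real data:

```python
import re

# Hypothetical sample inputs, chosen only for illustration
leading = re.match(r'\d+', '5pizzas')        # match only looks at the start of the string
no_leading = re.match(r'\d+', 'foo123bar')   # None: the string does not start with a digit
all_numbers = re.findall(r'\d+', 'foo123bar45')

print(leading.group())   # 5
print(no_leading)        # None
print(all_numbers)       # ['123', '45']
```

Note that re.match returns None when the string does not start with a number, so check the result before calling .group().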

Answered By: Mark Byers

You ask for a way to split a string on digits, but in your example what you actually want is just the leading number. That is easily done with itertools.takewhile():

>>> import itertools
>>> int("".join(itertools.takewhile(str.isdigit, "10pizzas")))
10

This makes a lot of sense: we take characters from the string for as long as they are digits. It also has the advantage of stopping as soon as we reach the first non-digit character.
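
One caveat: if the string has no leading digits, the join produces an empty string and int("") raises ValueError. A possible way around that, sketched with a hypothetical helper named leading_int (the name and the default parameter are my own, not part of itertools):

```python
import itertools

def leading_int(text, default=None):
    """Return the integer formed by the leading digits of text, or default if there are none."""
    digits = "".join(itertools.takewhile(str.isdigit, text))
    # An empty string means the first character was not a digit
    return int(digits) if digits else default

print(leading_int("10pizzas"))  # 10
print(leading_int("pizzas"))    # None
```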

If you need the later data too, then what you are looking for is itertools.groupby() mixed in with a simple list comprehension:

>>> ["".join(x) for _, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit)]
['dfsd', '98', 'sd', '8', 'f', '68', 'as', '7', 'df', '56']

If you then want to make one giant number:

>>> int("".join("".join(x) for is_number, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit) if is_number))
98868756
Answered By: Gareth Latty

This answer was added as a possible solution to "How to split a string into a list by digits?", which was linked to this question as a duplicate.

You can do the splitting yourself:

  • use a temporary list to accumulate characters that are not digits
  • when you find a digit, add the temporary list (joined with ''.join()) to the result list, but only if it is not empty, and do not forget to clear the temporary list
  • repeat until all characters are processed; if the temporary list still has content at the end, add it too

text = "Ka12Tu12La"

splitted = []   # our result
tmp = []        # our temporary character collector

for c in text:
    if not c.isdigit():
        tmp.append(c)    # not a digit, add it

    elif tmp:            # c is a digit; if tmp has content, flush it to the result
        splitted.append(''.join(tmp))
        tmp = []

if tmp:
    splitted.append(''.join(tmp))

print(splitted)

Output:

['Ka', 'Tu', 'La']

Answered By: Patrick Artner

A clearer version of cnicutar's answer, for a string where the letters come first:

import re

str_to_split = "test123"

# One group of letters followed by one group of digits
temp = re.compile(r"([a-zA-Z]+)([0-9]+)")
res = temp.match(str_to_split).groups()  # ('test', '123')

print("The tuple after the split of string and number : " + str(res))
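
Be aware that match() returns None when the pattern does not fit, so calling .groups() on input such as "5pizzas" (number first) would raise AttributeError. A rough sketch for the number-first format from the question, assuming you want None back on a mismatch; split_number_first and NUMBER_THEN_WORD are hypothetical names used only for illustration:

```python
import re

# Digits first, optional whitespace, then letters, e.g. "5pizzas" or "5 pizzas"
NUMBER_THEN_WORD = re.compile(r"([0-9]+)\s*([a-zA-Z]+)")

def split_number_first(text):
    """Return (count, word) for input like '5pizzas', or None if it does not match."""
    m = NUMBER_THEN_WORD.match(text)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

print(split_number_first("5pizzas"))    # (5, 'pizzas')
print(split_number_first("10 pizzas"))  # (10, 'pizzas')
print(split_number_first("pizzas"))     # None
```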

Answered By: JackTheKnife