Separate number from unit in a string in Python

Question

I have strings containing numbers with their units, e.g. 2GB, 17ft, etc.
I would like to separate the number from the unit and create 2 different strings. Sometimes, there is a whitespace between them (e.g. 2 GB) and it’s easy to do it using split(' ').

When they are together (e.g. 2GB), I would test every character until I find a letter, instead of a number.

s='17GB'
number=''
unit=''
for c in s:
    if c.isdigit():
        number+=c
    else:
        unit+=c

Is there a better way to do it?

Thanks

Asked By: duduklein

Answer 1

You could use a regular expression to divide the string into groups:

>>> import re
>>> p = re.compile(r'(\d+)\s*(\w+)')
>>> p.match('2GB').groups()
('2', 'GB')
>>> p.match('17 ft').groups()
('17', 'ft')

Answered By: Jarret Hardie

Answer 2

How about using a regular expression?

http://python.org/doc/1.6/lib/module-regsub.html

Answered By: Ole Media

Answer 3

You should use regular expressions, grouping together what you want to find out:

import re
s = "17GB"
match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
if match:
  print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))

Change the regex according to what you want to parse. If you’re unfamiliar with regular expressions, here’s a great tutorial site.
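
As a rough illustration of adapting the regex, the sketch below builds the unit alternation from a list instead of hard-coding it. The names ALLOWED_UNITS, SIZE_RE and parse_size are made up for this example and are not part of the answer above.

```python
import re

# Hypothetical list of accepted units; edit it to match your data.
ALLOWED_UNITS = ["GB", "MB", "KB", "B"]

# re.escape protects against units that contain regex metacharacters.
SIZE_RE = re.compile(
    r"^([1-9][0-9]*)\s*(%s)$" % "|".join(re.escape(u) for u in ALLOWED_UNITS)
)

def parse_size(text):
    """Return (int, unit) for strings like '17GB' or '17 GB', else None."""
    match = SIZE_RE.match(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

print(parse_size("17GB"))
print(parse_size("17 KB"))
print(parse_size("17ft"))  # unit not in the list, so no match
```

Returning None for unknown units is just one possible design; raising ValueError would work as well if bad input should be treated as an error.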

Answered By: AndiDog

Answer 4

tokenize can help:

>>> import StringIO
>>> import tokenize
>>> s = StringIO.StringIO('27GB')
>>> for token in tokenize.generate_tokens(s.readline):
...   print token
... 
(2, '27', (1, 0), (1, 2), '27GB')
(1, 'GB', (1, 2), (1, 4), '27GB')
(0, '', (2, 0), (2, 0), '')

Answered By: Ignacio Vazquez-Abrams

Answer 5

For this task, I would definitely use a regular expression:

import re
there = re.compile(r'\s*(\d+)\s*(\S+)')
thematch = there.match(s)
if thematch:
  number, unit = thematch.groups()
else:
  raise ValueError('String %r not in the expected format' % s)

In the RE pattern, \s means “whitespace”, \d means “digit”, \S means “non-whitespace”; * means “0 or more of the preceding”, + means “1 or more of the preceding”, and the parentheses enclose “capturing groups”, which are then returned by the groups() call on the match object. (thematch is None if the given string doesn’t correspond to the pattern: optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters.)
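
To see how the pieces described above behave, here is a small sketch that wraps the same pattern in a helper and runs it on a few inputs. The helper name split_number_unit and the sample strings are assumptions made for illustration.

```python
import re

# Optional whitespace, digits, optional whitespace,
# then a run of non-whitespace characters.
NUMBER_UNIT_RE = re.compile(r'\s*(\d+)\s*(\S+)')

def split_number_unit(s):
    """Hypothetical helper: return (number, unit) as strings or raise ValueError."""
    m = NUMBER_UNIT_RE.match(s)
    if m is None:
        raise ValueError('String %r not in the expected format' % s)
    return m.groups()

for sample in ('2GB', '17 ft', '  42   km'):
    print(sample, '->', split_number_unit(sample))
```

One thing to keep in mind: because \S+ needs at least one character, a bare number such as '17' still matches, but the regex backtracks and splits it as ('1', '7'). If unit-less input is possible, \S* may be the safer choice for the second group.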

Answered By: Alex Martelli

Answer 6

A regular expression.

import re

m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
if m is None:
  raise ValueError("not a number with units")
number = m.group("n")
unit = m.group("u")

This will give you a number (integer or fixed point; too hard to disambiguate scientific notation’s “e” from a unit prefix) with an optional sign, followed by the units, with optional whitespace.

You can use re.compile() if you’re going to be doing a lot of matches.
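
For the many-matches case, a compiled pattern might look roughly like the sketch below. The pattern is my own variation (it insists on at least one digit in the number), and QUANTITY_RE and parse_quantity are hypothetical names, so treat it as a starting point rather than the exact code from this answer.

```python
import re

# Compiled once at import time and reused for every call.
# Assumption: the number is an optionally signed integer or fixed-point value.
QUANTITY_RE = re.compile(r'\s*(?P<n>[-+]?(?:\d+\.?\d*|\.\d+))\s*(?P<u>.*)')

def parse_quantity(s):
    """Return (float, unit string) or raise ValueError if no number is found."""
    m = QUANTITY_RE.match(s)
    if m is None:
        raise ValueError("not a number with units: %r" % s)
    return float(m.group("n")), m.group("u").strip()

values = [parse_quantity(s) for s in ("-3.5 kg", "+.25in", "100 m/s")]
print(values)
```

As noted above, scientific notation is deliberately not handled: for an input like "1e3 V" the number group stops at "1" and the rest ends up in the unit.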

Answered By: Mike DeSimone

Answer 7

s='17GB'
for i,c in enumerate(s):
    if not c.isdigit():
        break
number=int(s[:i])
unit=s[i:]

Answered By: John La Rooy

Answer 8

You can break out of the loop when you find the first non-digit character:

for i,c in enumerate(s):
    if not c.isdigit():
        break
number = s[:i]
unit = s[i:].lstrip()

If you have negative numbers and decimals:

numeric = '0123456789-.'
for i,c in enumerate(s):
    if c not in numeric:
        break
number = s[:i]
unit = s[i:].lstrip()
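
One edge case worth knowing about with the loops above: if the string contains no unit at all (e.g. '17'), the loop never breaks, i stays at the last index, and the final digit ends up in unit; for an empty string i is never assigned. A possible way around this, sketched here with itertools.takewhile (the function name split_leading_number is made up for the example):

```python
from itertools import takewhile

def split_leading_number(s, numeric='0123456789-.'):
    """Hypothetical helper: split s into (number, unit) strings."""
    # takewhile stops at the first character not in `numeric`,
    # and consumes the whole string if every character qualifies.
    number = ''.join(takewhile(lambda c: c in numeric, s))
    unit = s[len(number):].strip()
    return number, unit

print(split_leading_number('17GB'))
print(split_leading_number('17'))
print(split_leading_number('-2.5 m'))
```

Like the loops above, this only collects characters and does not check that the result is a valid number, so something like '1-2.3.4' would still need validating (for instance with float()) afterwards.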

Answered By: pwdyson

Answer 9

>>> s="17GB"
>>> ind=list(map(str.isalpha,s)).index(True)
>>> num,suffix=s[:ind],s[ind:]
>>> print(num+":"+suffix)
17:GB

Answered By: ghostdog74

Answer 10

This uses an approach which should be a bit more forgiving than regexes. Note: this is not as performant as the other solutions posted.

def split_units(value):
    """
    >>> split_units("2GB")
    (2.0, 'GB')
    >>> split_units("17 ft")
    (17.0, 'ft')
    >>> split_units("   3.4e-27 frobnitzem ")
    (3.4e-27, 'frobnitzem')
    >>> split_units("9001")
    (9001.0, '')
    >>> split_units("spam sandwiches")
    (0, 'spam sandwiches')
    >>> split_units("")
    (0, '')
    """
    units = ""
    number = 0
    while value:
        try:
            number = float(value)
            break
        except ValueError:
            units = value[-1:] + units
            value = value[:-1]
    return number, units.strip()

Answered By: Logan Evans

Answer 11

SCIENTIFIC NOTATION
This regex works well for me for parsing numbers that may be in scientific notation. It is based on the Python documentation on simulating scanf():
https://docs.python.org/3/library/re.html#simulating-scanf

import re

units_pattern = re.compile(r"([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?|\s*[a-zA-Z]+\s*$)")
number_with_units = list(match.group(0) for match in units_pattern.finditer("+2.0e-1 mm"))
print(number_with_units)
# ['+2.0e-1', ' mm']

n, u = number_with_units
print(float(n), u.strip())
# 0.2 mm

Answered By: Vince W.

Answer 12

Try the regex pattern below. The first group (the scanf() tokens for a number, in any of its forms) is lifted directly from the Python docs for the re module.

import re
SCANF_MEASUREMENT = re.compile(
    r'''(                      # group match like scanf() token %e, %E, %f, %g
    [-+]?                      # +/- or nothing for positive
    (\d+(\.\d*)?|\.\d+)        # match numbers: 1, 1., 1.1, .1
    ([eE][-+]?\d+)?            # scientific notation: e(+/-)2 (*10^2)
    )
    (\s*)                      # separator: white space or nothing
    (                          # unit of measure: like GB. also works for no units
    \S*)''',    re.VERBOSE)
'''
:var SCANF_MEASUREMENT:
    regular expression object that will match a measurement

    **measurement** is the value of a quantity of something. most complicated example::

        -666.6e-100 units
'''

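def parse_measurement(value_sep_units):
    measurement = re.match(SCANF_MEASUREMENT, value_sep_units)
    if measurement is None:
        # the pattern requires a leading number, so no match means none was found
        raise ValueError("doesn't start with a number: %r" % value_sep_units)
    value = float(measurement.group(1))  # group 1: the whole number
    units = measurement.group(6)         # group 6: the unit (may be empty)

    return value, units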

Answered By: steodatus

Answer 13

This kind of parser is already integrated into Pint:

Pint is a Python package to define, operate and manipulate physical
quantities: the product of a numerical value and a unit of
measurement. It allows arithmetic operations between them and
conversions from and to different units.

You can install it with pip install pint.

Then, you can parse a string, get the desired value (‘magnitude’) and its unit:

>>> from pint import UnitRegistry
>>> ureg = UnitRegistry()
>>> size = ureg('2GB')
>>> size.m
2
>>> size.u
<Unit('gigabyte')>
>>> size.to('GiB')
<Quantity(1.86264515, 'gibibyte')>
>>> length = ureg('17ft')
>>> length.m
17
>>> length.u
<Unit('foot')>
>>> length.to('cm')
<Quantity(518.16, 'centimeter')>

Answered By: Eric Duminil

Answer 14

Unfortunately, none of the previous code samples worked correctly in my situation, so I developed the following code. The idea behind it is that every number ends with a digit or a dot.

def splitValUnit(s):

    s = s.replace(' ', '')
    lastIndex = len(s) - 1
    i = lastIndex
    for i in range(lastIndex, -1, -1):
        if (s[i].isdigit() or s[i] == '.'):
            break
        
    i = i + 1

    value = 0
    unit = ''
    try:
        value = float(s[:i])
        unit = s[i:]
    except ValueError:
        pass

    return {'value': value, 'unit': unit}

print(splitValUnit('7'))             #{'value': 7.0, 'unit': ''}
print(splitValUnit('+7'))            #{'value': 7.0, 'unit': ''}
print(splitValUnit('7m'))            #{'value': 7.0, 'unit': 'm'}
print(splitValUnit('27'))            #{'value': 27.0, 'unit': ''}
print(splitValUnit('7.'))            #{'value': 7.0, 'unit': ''}
print(splitValUnit('2GHz'))          #{'value': 2.0, 'unit': 'GHz'}
print(splitValUnit('+2.e-10H'))      #{'value': 2e-10, 'unit': 'H'}
print(splitValUnit('2.3e+4 MegaOhm'))#{'value': 23000.0, 'unit': 'MegaOhm'}
print(splitValUnit('-4.'))           #{'value': -4.0, 'unit': ''}
print(splitValUnit('e mm'))          #{'value': 0, 'unit': ''}
print(splitValUnit(''))              #{'value': 0, 'unit': ''}

Answered By: Farhad
