Pandas Extract Number from String

Question:

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I’d like to extract the numbers from each cell (where they exist).
The desired result is:

    A
0   1
1   NaN
2   10
3   100
4   0

I know it can be done with str.extract, but I’m not sure how.

Asked By: Dance Party

Answers:

Give it a regex capture group:

df.A.str.extract(r'(\d+)', expand=False)

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
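
Note that the extracted values are still strings. As a minimal sketch (the names digits and numbers are only illustrative), they can be converted to real numbers with pd.to_numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# expand=False returns a Series rather than a one-column DataFrame.
digits = df['A'].str.extract(r'(\d+)', expand=False)

# The extracted values are strings; convert them to numbers.
# The missing entry stays missing (NaN) after conversion.
numbers = pd.to_numeric(digits)
print(numbers)
```

Because of the missing value, the converted column cannot be a plain integer column; casting with .astype('Int64') is one option if nullable integers are wanted.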
Answered By: Jon Clements

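To answer @Steven G's question in the comments, this should work (the ^ anchors the match to the start of the string, so only leading digits are captured):

df.A.str.extract(r'(^\d*)', expand=False)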
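
A small sketch of how the anchored and unanchored patterns differ, using made-up sample values: with \d* the anchored pattern can match an empty string, whereas ^\d+ yields a missing value when there are no leading digits.

```python
import pandas as pd

s = pd.Series(['10a', 'b2'])  # illustrative data, not from the question

# Unanchored: the first run of digits anywhere in the string.
anywhere = s.str.extract(r'(\d+)', expand=False)

# Anchored with *: leading digits only; 'b2' gives an empty string.
leading_star = s.str.extract(r'(^\d*)', expand=False)

# Anchored with +: leading digits only; 'b2' gives NaN (no match).
leading_plus = s.str.extract(r'(^\d+)', expand=False)

print(anywhere.tolist(), leading_star.tolist(), leading_plus.tolist())
```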
Answered By: Taming

You can replace the column with the result using the assign function:

df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))
Answered By: Mehdi Golzadeh

If you have cases where you have multiple disjoint sets of digits, as in 1a2b3c, in which you would like to extract 123, you can do it with Series.str.replace:

>>> df
        A
0      1a
1      b2
2    a1b2
3  1a2b3c
>>> df['A'] = df['A'].str.replace(r'\D+', '', regex=True)
>>> df
     A
0    1
1    2
2   12
3  123

You could also do this with Series.str.extractall and groupby, but I think str.replace is easier.
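
For comparison, a rough sketch of the extractall and groupby route (the variable names are illustrative). One difference to be aware of: rows with no digits at all are dropped from the result rather than becoming empty strings.

```python
import pandas as pd

s = pd.Series(['1a', 'b2', 'a1b2', '1a2b3c'])

# extractall returns one row per match, indexed by
# (original index, match number); the unnamed group is column 0.
matches = s.str.extractall(r'(\d+)')

# Join the matches belonging to each original row into one string.
joined = matches[0].groupby(level=0).agg(''.join)
print(joined)
```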

Hope this helps!

Answered By: Rostan