Pandas Extract Number from String

Question

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I’d like to extract the numbers from each cell (where they exist).
The desired result is:

    A
0   1
1   NaN
2   10
3   100
4   0

I know it can be done with str.extract, but I’m not sure how.

Asked By: Dance Party


Answer 1

Give it a regex capture group:

df.A.str.extract(r'(\d+)', expand=False)

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object

Answered By: Jon Clements
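A note on return types, sketched under the assumption of a reasonably recent pandas (1.x or later): str.extract returns a DataFrame by default (expand=True), one column per capture group, and returns a Series only when expand=False and the pattern has exactly one capture group. That is why the output shown above, a Series, needs expand=False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# Default expand=True: always a DataFrame, one column per capture group
as_frame = df['A'].str.extract(r'(\d+)')

# expand=False with a single capture group: a Series
as_series = df['A'].str.extract(r'(\d+)', expand=False)

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(as_series.tolist())
```

Either way the extracted values are still strings; missing values (and rows with no match) come back as NaN.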

Answer 2

To answer @Steven G's question in the comment above, this should work:

df.A.str.extract(r'(^\d*)')

Answered By: Taming
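The ^ anchor limits the match to digits at the very start of the string, and the choice between \d* and \d+ decides what happens when there are none. A minimal sketch on a hypothetical Series that includes a value with a non-leading digit ('b2'):

```python
import pandas as pd

s = pd.Series(['1a', 'b2', '10a'])

# First run of digits anywhere in the string
anywhere = s.str.extract(r'(\d+)', expand=False)

# Leading digits only; \d* can match zero characters, so 'b2' gives ''
leading_star = s.str.extract(r'(^\d*)', expand=False)

# Leading digits only; \d+ needs at least one digit, so 'b2' gives NaN
leading_plus = s.str.extract(r'^(\d+)', expand=False)

print(anywhere.tolist())      # ['1', '2', '10']
print(leading_star.tolist())  # ['1', '', '10']
```

Which variant fits depends on whether a value such as 'b2' should count as containing a number, and whether "no number" should be an empty string or NaN.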

Answer 3

You can replace your column with the result using the DataFrame.assign method:

df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))

Answered By: Mehdi Golzadeh
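A variation, sketched under the assumption that you want actual numbers rather than digit strings: pd.to_numeric converts the extracted strings (missing values stay missing), and assign can add the result under a new name instead of overwriting A. The column name A_num is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# Keep the original column and add a numeric version alongside it
df = df.assign(
    A_num=lambda x: pd.to_numeric(x['A'].str.extract(r'(\d+)', expand=False))
)
print(df)
```

Because of the NaN in row 1, the numeric column will typically be float rather than int.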

Answer 4

If you have cases with multiple disjoint runs of digits, as in 1a2b3c, from which you would like to get 123, you can do it with Series.str.replace:

>>> df
        A
0      1a
1      b2
2    a1b2
3  1a2b3c
>>> df['A'] = df['A'].str.replace(r'\D+', '', regex=True)
>>> df
     A
0    1
1    2
2   12
3  123

You could also do this with Series.str.extractall and groupby, but I think this one is easier.

Hope this helps!

Answered By: Rostan
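Answer 4 mentions Series.str.extractall with groupby as an alternative. A minimal sketch of that route, assuming the goal is the same concatenation of all digit runs (the extra 'xyz' row is hypothetical, added to show a row with no digits):

```python
import pandas as pd

df = pd.DataFrame({'A': ['1a', 'b2', 'a1b2', '1a2b3c', 'xyz']})

# extractall yields one row per match, indexed by (original row, match number);
# [0] selects the single unnamed capture group
matches = df['A'].str.extractall(r'(\d+)')[0]

# Concatenate the matches for each original row; rows with no digits
# produce no matches at all, so reindex to bring them back as NaN
joined = matches.groupby(level=0).agg(''.join).reindex(df.index)

print(joined.tolist()[:4])  # ['1', '2', '12', '123']
```

One difference from the str.replace approach: a row with no digits comes back as NaN here, whereas str.replace would leave an empty string.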