Creating New Column In Pandas Dataframe Using Regex

Question:

I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.

For example:

Existing df

    col
    'foo 12 bar 8'
    'bar 3 foo'
    'bar 32bar 98'

Desired df

    col               col1
    'foo 12 bar 8'    12
    'bar 3 foo'       3
    'bar 32bar 98'    32

I have code that works on any individual cell in the column Series:

int(re.search(r'\d+', df.iloc[0]['col']).group())
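A self-contained reproduction of the single-cell case, as a minimal sketch. It assumes the stored values are plain strings (the quotes in the listing above are treated as display only). The pattern needs the backslash, `r'\d+'`; without it, `d+` would match the letter "d" instead of digits.

```python
import re

import pandas as pd

# Sample data from the question (assumed to be plain strings, no literal quotes)
df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# re.search works on one string at a time: take the first run of digits
first = int(re.search(r'\d+', df.iloc[0]['col']).group())
print(first)  # 12
```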

The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:

df['col1'] = int(re.search(r'\d+', df['col']).group())

I get the following Error:

TypeError: expected string or bytes-like object
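The error comes from handing the whole Series to `re.search`, which accepts a single string (or bytes-like) object, not a pandas Series. A minimal sketch, reusing the sample data above, that shows the failure and then applies the same single-cell expression to each element with `Series.apply`:

```python
import re

import pandas as pd

df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# Passing the whole Series fails: re.search expects str or bytes
error = None
try:
    re.search(r'\d+', df['col'])
except TypeError as exc:
    error = exc
print('Raised:', error)

# Applying the single-cell expression to each element works
df['col1'] = df['col'].apply(lambda s: int(re.search(r'\d+', s).group()))
print(df)
```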

I tried wrapping str() around df['col'], which got rid of the error but yielded all 0s in col1.
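The zeros have a separate cause. `str(df['col'])` is the printed representation of the whole Series, and each line of it starts with the row index, so the first run of digits the regex finds is the index label `0`. That single scalar is then broadcast to every row of `col1`. A sketch illustrating this with the sample data:

```python
import re

import pandas as pd

df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# The repr of a Series includes the index, one row per line
text = str(df['col'])
print(repr(text.splitlines()[0]))  # the first line begins with the index label 0

# The first digits in the whole repr are the index "0", not a value from the column
match = int(re.search(r'\d+', text).group())
df['col1'] = match  # assigning a scalar broadcasts it to every row
print(df)
```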

I’ve also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I’m doing wrong? Help would be much appreciated.

Asked By: Cam8593


Answers:

Apply re.search to each string individually rather than to the whole Series. This will do the trick:

import re

new_column = []
for value in df['col']:
    # \d+ matches the first run of digits; int() converts the matched text to a number
    new_column.append(int(re.search(r'\d+', value).group()))

df['col1'] = new_column

The output looks like this:

            col    col1
0  foo 12 bar 8      12
1     bar 3 foo       3
2  bar 32bar 98      32
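As an alternative to the explicit loop, pandas' vectorized string methods can do the same extraction in one call. This is a sketch using `Series.str.extract` with a single capture group; `expand=False` makes it return a Series rather than a DataFrame. The conversion through `pd.to_numeric` and the nullable `Int64` dtype is an assumption about how rows without any digits should be handled: they become `<NA>`, whereas the loop above would raise `AttributeError` because `re.search` returns `None`.

```python
import pandas as pd

df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# Capture the first run of digits in each string (NaN where there is none)
digits = df['col'].str.extract(r'(\d+)', expand=False)
df['col1'] = pd.to_numeric(digits).astype('Int64')
print(df)

# Hypothetical extra input: a row without digits becomes <NA> instead of raising
extra = pd.Series(['abc', 'x 7'])
extra_result = pd.to_numeric(extra.str.extract(r'(\d+)', expand=False)).astype('Int64')
print(extra_result)
```

If every row is guaranteed to contain a number, a plain `.astype(int)` on the extracted strings is enough.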
Answered By: Albo