Change index column to real column in pandas

Question:

I had an original DataFrame that looked like this:

    column0                              column2
0         0 SomethingSomeData1MixedWithSomeData2
1 something            SomeMoreDataWithSomeData3
2  whatever                            SomeData4

From that, I needed to extract every match of the regex (SomeData\d), which led me to use df['column2'].str.extractall(), with the following result:

             data
  match
0     0 SomeData1
0     1 SomeData2
1     0 SomeData3
2     0 SomeData4

What I actually need is to have something like this:

      data0     data1
0 SomeData1 SomeData2
1 SomeData3
2 SomeData4

It can also be something like:

                    data
0 [SomeData1, SomeData2]
1            [SomeData3]
2            [SomeData4]

I’ve tried to use df['column2'].str.split(), but it created a list in the column and kept all of the other text that I don’t need.

Asked By: Jose Vega

Answers:

You can use str.findall instead of str.extractall:

>>> df['column2'].str.findall(r'SomeData\d')
0    [SomeData1, SomeData2]
1               [SomeData3]
2               [SomeData4]
Name: column2, dtype: object

A quick but unoptimized way to expand those lists into columns is to apply pd.Series to each row:

>>> df['column2'].str.findall(r'SomeData\d').apply(pd.Series).add_prefix('data')
       data0      data1
0  SomeData1  SomeData2
1  SomeData3        NaN
2  SomeData4        NaN
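
As a sketch of one way to avoid calling pd.Series once per row, the lists can be handed to the DataFrame constructor in a single call. The sample frame below is rebuilt from the question, and the names matches and wide are only illustrative:

```python
import pandas as pd

# Sample frame rebuilt from the question (illustrative values).
df = pd.DataFrame({
    'column0': [0, 'something', 'whatever'],
    'column2': ['SomethingSomeData1MixedWithSomeData2',
                'SomeMoreDataWithSomeData3',
                'SomeData4'],
})

# One list of matches per row.
matches = df['column2'].str.findall(r'SomeData\d')

# Build the wide frame in one constructor call; rows with fewer
# matches are padded with missing values.
wide = pd.DataFrame(matches.tolist(), index=matches.index).add_prefix('data')
print(wide)
```

Passing index=matches.index keeps the original row labels, so the result can be joined back to df if needed.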
Answered By: Corralien

You should just unstack the output of extractall:

(df['column2'].str.extractall(r'(SomeData\d)')[0]
 .unstack('match', fill_value='')
 .add_prefix('data')
 .rename_axis(columns=None)
)

Output:

       data0      data1
0  SomeData1  SomeData2
1  SomeData3           
2  SomeData4           
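
One thing to keep in mind with this approach: extractall only returns rows that have at least one match, so a row without any match disappears from the result. A minimal sketch of how that could be handled, assuming a hypothetical fourth row with nothing to extract (that row and the names wide and full are not part of the question):

```python
import pandas as pd

# Sample frame from the question plus one hypothetical row without a match.
df = pd.DataFrame({
    'column2': ['SomethingSomeData1MixedWithSomeData2',
                'SomeMoreDataWithSomeData3',
                'SomeData4',
                'NothingToExtract'],
})

# Same pipeline as above: one column per match number.
wide = (df['column2'].str.extractall(r'(SomeData\d)')[0]
        .unstack('match', fill_value='')
        .add_prefix('data')
        .rename_axis(columns=None)
        )

# Row 3 had no match, so it is absent from wide;
# reindexing on the original index brings it back as empty strings.
full = wide.reindex(df.index, fill_value='')
print(full)
```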
Answered By: mozway