Change index column to real column in pandas
Question:
I had an original DataFrame that looked like this:
     column0                               column2
0          0  SomethingSomeData1MixedWithSomeData2
1  something             SomeMoreDataWithSomeData3
2   whatever                             SomeData4
From that, I needed to extract the regex (SomeData\d), which led me to use df['column2'].str.extractall(), with the following results:
              data
  match
0 0      SomeData1
0 1      SomeData2
1 0      SomeData3
2 0      SomeData4
What I actually need is to have something like this:
       data0      data1
0  SomeData1  SomeData2
1  SomeData3
2  SomeData4
It can also be something like:
                     data
0  [SomeData1, SomeData2]
1             [SomeData3]
2             [SomeData4]
I’ve tried to use df['column2'].str.split(), but it created a list in the column and kept all of the other text that wasn’t needed.
Answers:
You can use str.findall instead of str.extractall:
>>> df['column2'].str.findall(r'SomeData\d')
0    [SomeData1, SomeData2]
1               [SomeData3]
2               [SomeData4]
Name: column2, dtype: object
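For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the question and runs findall on it. The values are copied from the question; the pattern assumes each wanted piece is the literal text SomeData followed by exactly one digit, and the variable names pattern and found are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question's printed output.
df = pd.DataFrame({
    "column0": [0, "something", "whatever"],
    "column2": [
        "SomethingSomeData1MixedWithSomeData2",
        "SomeMoreDataWithSomeData3",
        "SomeData4",
    ],
})

# Assumed pattern: the literal "SomeData" followed by a single digit.
pattern = r"SomeData\d"

# findall returns one list of matches per row and keeps the original index.
found = df["column2"].str.findall(pattern)
print(found)
```

Because findall keeps one entry per original row, a row without any match would show up as an empty list rather than disappear.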
A quick but not optimized way is to use apply on each row:
>>> df['column2'].str.findall(r'SomeData\d').apply(pd.Series).add_prefix('data')
       data0      data1
0  SomeData1  SomeData2
1  SomeData3        NaN
2  SomeData4        NaN
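If apply(pd.Series) turns out to be too slow, one possible alternative is to hand the lists to the DataFrame constructor in a single call, which avoids building a Series per row. This is only a sketch: it assumes column2 has no missing values (findall would then return NaN instead of a list), and the names found and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column2": [
        "SomethingSomeData1MixedWithSomeData2",
        "SomeMoreDataWithSomeData3",
        "SomeData4",
    ],
})

# One list of matches per row; assumes column2 has no missing values.
found = df["column2"].str.findall(r"SomeData\d")

# Build the wide frame in one constructor call. Rows with fewer matches
# are padded with missing values, which fillna turns into empty strings.
wide = (
    pd.DataFrame(found.tolist(), index=df.index)
    .add_prefix("data")
    .fillna("")
)
print(wide)
```

Drop the fillna("") step if you would rather keep missing values in the short rows, as in the apply version above.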
You should just unstack the output of extractall:
(df['column2'].str.extractall(r'(SomeData\d)')[0]
    .unstack('match', fill_value='')
    .add_prefix('data')
    .rename_axis(columns=None)
)
Output:
       data0      data1
0  SomeData1  SomeData2
1  SomeData3
2  SomeData4
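One thing to watch with this approach: extractall only returns rows that have at least one match, so a row without a match drops out of the unstacked result. A hedged sketch of one way to handle that is to reindex against the original index at the end. The "no match here" row below is hypothetical and added only to show the edge case; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column2": [
        "SomethingSomeData1MixedWithSomeData2",
        "SomeMoreDataWithSomeData3",
        "no match here",  # hypothetical row, not in the question
        "SomeData4",
    ],
})

wide = (
    df["column2"].str.extractall(r"(SomeData\d)")[0]  # Series indexed by (row, match)
    .unstack("match", fill_value="")                  # match number becomes the columns
    .add_prefix("data")                               # 0, 1 -> data0, data1
    .rename_axis(columns=None)                        # drop the leftover "match" label
    .reindex(df.index, fill_value="")                 # restore rows that had no match
)
print(wide)
```

Without the final reindex, the row at index 2 would simply be absent, which matters if you plan to assign the result back to df.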