Pandas transform list values and their column names
Question:
I have a pandas DataFrame with one row, where each column name is a list of categories separated by " > " and each value is a list of items:
car > audi > a4 | car > bmw > 3er | moto > bmw > gs |
---|---|---|
[item1, item2, item3] | [item1, item4, item5] | [item6] |
and I would like to create a structure like this:
item | category 1 | category 2 | category 3 |
---|---|---|---|
item 1 | car | audi | a4 |
item 1 | car | bmw | 3er |
item 2 | car | audi | a4 |
item 3 | car | audi | a4 |
item 4 | car | bmw | 3er |
item 5 | car | bmw | 3er |
item 6 | moto | bmw | gs |
What is the best solution?
Answers:
You can use explode, a built-in pandas method available on both Series and DataFrame.
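As a minimal sketch of that suggestion, assuming the input DataFrame from the question (the names `items`, `long`, and `path` are only illustrative), the single row can be taken as a Series and exploded so that every list element gets its own row:

```python
import pandas as pd

# Input as described in the question: one row, list-valued cells.
df = pd.DataFrame({'car > audi > a4': [['item1', 'item2', 'item3']],
                   'car > bmw > 3er': [['item1', 'item4', 'item5']],
                   'moto > bmw > gs': [['item6']]})

# Select the only row as a Series indexed by the original column names,
# then give every list element its own row.
items = df.loc[0].explode()

# Move the index (the "a > b > c" label) into a regular column named 'path'.
long = items.rename_axis('path').reset_index(name='item')
print(long)
```

This leaves the category path in a single column; splitting it into separate category columns is a further step.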
You can use:
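# Split the column names on '>' (with optional surrounding whitespace)
# into a MultiIndex, then explode the single row.
(df.set_axis(df.columns.str.split(r'\s*>\s*', expand=True), axis=1)
   .loc[0].explode()
   .reset_index(name='item')
   .rename(columns=lambda x: x.replace('level_', 'category'))
)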
Output:
  category0 category1 category2   item
0       car      audi        a4  item1
1       car      audi        a4  item2
2       car      audi        a4  item3
3       car       bmw       3er  item1
4       car       bmw       3er  item4
5       car       bmw       3er  item5
6      moto       bmw        gs  item6
Used input:
import pandas as pd

df = pd.DataFrame({'car > audi > a4': [['item1', 'item2', 'item3']],
                   'car > bmw > 3er': [['item1', 'item4', 'item5']],
                   'moto > bmw > gs': [['item6']]})
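To make the chained expression in this answer easier to follow, here is the same pipeline broken into steps. This is only a sketch; the intermediate names (`levels`, `wide`, `items`, `out`) are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'car > audi > a4': [['item1', 'item2', 'item3']],
                   'car > bmw > 3er': [['item1', 'item4', 'item5']],
                   'moto > bmw > gs': [['item6']]})

# Step 1: split each column label on '>' (allowing whitespace around it).
# With expand=True the result is a three-level MultiIndex.
levels = df.columns.str.split(r'\s*>\s*', expand=True)

# Step 2: use that MultiIndex as the column labels.
wide = df.set_axis(levels, axis=1)

# Step 3: take the only row as a Series and give each list element its own row.
items = wide.loc[0].explode()

# Step 4: turn the unnamed index levels into columns (level_0, level_1, level_2)
# and rename them to category0, category1, category2.
out = (items.reset_index(name='item')
            .rename(columns=lambda c: c.replace('level_', 'category')))
print(out)
```

Because the index levels are unnamed, `reset_index` falls back to the `level_N` names, which is why the categories end up numbered from 0.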
One option is pivot_longer from pyjanitor. For this particular use case, pass a separator to names_sep to split the column names, pass the new column labels to names_to, and then explode the values_to column:
# pip install pyjanitor
import pandas as pd
import janitor

(df
 .pivot_longer(
     index=None,
     names_to=('category1', 'category2', 'category3'),
     names_sep=' > ',
     values_to='item')
 .explode('item')
 .sort_values('item')  # not necessary
)
  category1 category2 category3   item
0       car      audi        a4  item1
1       car       bmw       3er  item1
0       car      audi        a4  item2
0       car      audi        a4  item3
1       car       bmw       3er  item4
1       car       bmw       3er  item5
2      moto       bmw        gs  item6
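If installing pyjanitor is not an option, roughly the same reshaping can be sketched with plain pandas using melt, explode, and str.split. This is an alternative technique rather than what pivot_longer does internally, and the intermediate names (`long`, `cats`, `out`, `path`) are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'car > audi > a4': [['item1', 'item2', 'item3']],
                   'car > bmw > 3er': [['item1', 'item4', 'item5']],
                   'moto > bmw > gs': [['item6']]})

# melt stacks the column labels into 'path' and the lists into 'item';
# explode then gives every list element its own row.
long = (df.melt(var_name='path', value_name='item')
          .explode('item', ignore_index=True))

# Split the "a > b > c" label into three category columns.
cats = long['path'].str.split(' > ', expand=True)
cats.columns = ['category1', 'category2', 'category3']

# Put 'item' first, as in the question's desired layout, and sort by item.
# mergesort is stable, so ties keep their original (column) order.
out = (pd.concat([long[['item']], cats], axis=1)
         .sort_values('item', kind='mergesort', ignore_index=True))
print(out)
```

The sort is optional here as well; it only reproduces the row order shown in the question.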