Convert rows containing certain characters to columns in Python


I have a pandas DataFrame that contains data like this:

name           value
Data size      Building name
Data size      Empire State
Data size      Petronas Tower
Data size      Eiffel Tower
Data size      USA
Data size      20
Data size      24
Data size      32
Data size      Brazil
Data size      38
Data size      42
Data size      87
Data size      France
Data size      37
Data size      43
Data size      18 

I want to convert this data into this form:

Building Name       Country Name       Data Sizes
Empire State        USA                 20
Empire State        Brazil              38
Empire State        France              37
Petronas Tower      USA                 24
Petronas Tower      Brazil              42
Petronas Tower      France              43
Eiffel Tower        USA                 32
Eiffel Tower        Brazil              87
Eiffel Tower        France              18

I tried the unstack() method, but to no avail.
It’d be great if someone could help me figure this out.

Asked By: vesuvius



The expected logic is unclear, but assuming you know the number of buildings (or of countries), you can reshape your data with numpy and melt:

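import pandas as pd

# number of buildings
N = 3

# column-major ('F') reshape: each column of `a` is one block of N+1 values
# (a header followed by its N entries)
a = df['value'].to_numpy().reshape((N+1, -1), order='F')

# the first row holds the column names, the remaining rows are the buildings
out = (pd.DataFrame(a[1:], columns=a[0])
         .melt('Building name', var_name='Country Name', value_name='Data Sizes'))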


    Building name Country Name Data Sizes
0    Empire State          USA         20
1  Petronas Tower          USA         24
2    Eiffel Tower          USA         32
3    Empire State       Brazil         38
4  Petronas Tower       Brazil         42
5    Eiffel Tower       Brazil         87
6    Empire State       France         37
7  Petronas Tower       France         43
8    Eiffel Tower       France         18
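
To make the reshape step easier to follow, here is a minimal, self-contained sketch that rebuilds the question's data and prints the intermediate array. The inference of N is not part of the approach above: it is an assumption that no building or country name consists only of digits, so that the first purely numeric value sits at position N + 2.

```python
import pandas as pd

# Rebuild the question's data (the 'name' column is constant, only 'value' matters)
values = ['Building name', 'Empire State', 'Petronas Tower', 'Eiffel Tower',
          'USA', '20', '24', '32',
          'Brazil', '38', '42', '87',
          'France', '37', '43', '18']
df = pd.DataFrame({'name': 'Data size', 'value': values})

# Assumption: the first purely numeric value comes right after the header,
# the N building names and the first country name, i.e. at position N + 2
is_num = df['value'].str.strip().str.isdigit().to_numpy()
N = int(is_num.argmax()) - 2

# Column-major reshape: row 0 becomes the header row, rows 1..N the buildings
a = df['value'].to_numpy().reshape((N + 1, -1), order='F')
print(a)
```

If the number of buildings is already known, setting N by hand as in the snippet above is simpler and does not rely on that assumption.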

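To get the desired row order (grouped by building), reshape in row-major order and use unstack:

# number of buildings
N = 3

# row-major reshape: the first row is the header plus the N building names,
# every following row is a country name plus its N values
a = df['value'].to_numpy().reshape((-1, N+1))

out = (pd.DataFrame(a[1:], columns=a[0]).set_index('Building name')
         # the index holds the countries and the columns hold the buildings
         .rename_axis(index='Country Name', columns='Building name')
         .unstack().reset_index(name='Data Sizes'))


    Building name Country Name Data Sizes
0    Empire State          USA         20
1    Empire State       Brazil         38
2    Empire State       France         37
3  Petronas Tower          USA         24
4  Petronas Tower       Brazil         42
5  Petronas Tower       France         43
6    Eiffel Tower          USA         32
7    Eiffel Tower       Brazil         87
8    Eiffel Tower       France         18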
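
One thing to keep in mind: the names and the numbers pass through the same NumPy array, so the Data Sizes column will most likely hold strings (or generic objects) rather than integers. A minimal sketch of the conversion, using a small hypothetical stand-in for the result above:

```python
import pandas as pd

# Hypothetical stand-in for the long-format result produced above
out = pd.DataFrame({
    'Building name': ['Empire State', 'Empire State', 'Petronas Tower'],
    'Country Name': ['USA', 'Brazil', 'USA'],
    'Data Sizes': ['20', '38', '24'],  # strings, as they come out of the reshape
})

# Convert the size column to a numeric dtype so it can be summed, sorted, etc.
out['Data Sizes'] = pd.to_numeric(out['Data Sizes'])
print(out.dtypes)
```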
Answered By: mozway