Convert rows containing certain characters to columns in Python
Question:
I have a pandas DataFrame that contains data like this:
name       value
Data size  Building name
Data size  Empire State
Data size  Petronas Tower
Data size  Eiffel Tower
Data size  USA
Data size  20
Data size  24
Data size  32
Data size  Brazil
Data size  38
Data size  42
Data size  87
Data size  France
Data size  37
Data size  43
Data size  18
I want to convert this data into this form:
Building Name   Country Name  Data Sizes
Empire State    USA           20
Empire State    Brazil        38
Empire State    France        37
Petronas Tower  USA           24
Petronas Tower  Brazil        42
Petronas Tower  France        43
Eiffel Tower    USA           32
Eiffel Tower    Brazil        87
Eiffel Tower    France        18
I tried the unstack() method, but to no avail.
It’d be great if someone could help me figure this out.
Answers:
The expected logic is unclear, but assuming you know the number of buildings (or of countries), you can reshape your data with numpy and melt:
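import pandas as pd

# number of buildings
N = 3
# column-major ('F') reshape: each column holds one label followed by its N values
a = df['value'].to_numpy().reshape((N+1, -1), order='F')
out = (pd.DataFrame(a[1:], columns=a[0])
         .melt('Building name', var_name='Country Name', value_name='Data Sizes')
      )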
Output:
    Building name Country Name Data Sizes
0    Empire State          USA         20
1  Petronas Tower          USA         24
2    Eiffel Tower          USA         32
3    Empire State       Brazil         38
4  Petronas Tower       Brazil         42
5    Eiffel Tower       Brazil         87
6    Empire State       France         37
7  Petronas Tower       France         43
8    Eiffel Tower       France         18
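If the number of buildings is not known in advance, it can probably be inferred from the data. The following is only a sketch, assuming that the data sizes are the only values that parse as numbers and that the layout is exactly header, building names, then one country followed by its sizes, repeated. The sample frame is rebuilt here from the question so the snippet runs on its own; storing every value as a string is an assumption.

```python
import pandas as pd

# Sample data from the question (all values stored as strings: an assumption).
values = [
    "Building name", "Empire State", "Petronas Tower", "Eiffel Tower",
    "USA", "20", "24", "32",
    "Brazil", "38", "42", "87",
    "France", "37", "43", "18",
]
df = pd.DataFrame({"name": "Data size", "value": values})

# Rows that parse as numbers are data sizes; everything else is a label.
is_number = pd.to_numeric(df["value"], errors="coerce").notna().to_numpy()

# Layout: header, N buildings, first country, then the first number,
# so the first numeric row sits at position N + 2.
N = int(is_number.argmax()) - 2

# Same reshape + melt as above, now with the inferred N.
a = df["value"].to_numpy().reshape((N + 1, -1), order="F")
out = pd.DataFrame(a[1:], columns=a[0]).melt(
    "Building name", var_name="Country Name", value_name="Data Sizes"
)

# Turn the sizes into real numbers instead of strings.
out["Data Sizes"] = pd.to_numeric(out["Data Sizes"])
```

This breaks down if a building or country name happens to look like a number, in which case N has to be supplied by hand as in the answer above.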
For the desired order with unstack:
# number of buildings
N = 3
# row-major reshape: the first row holds the building names, each later row one country and its N values
a = df['value'].to_numpy().reshape((-1, N+1))
out = (pd.DataFrame(a[1:], columns=a[0]).set_index('Building name')
         .rename_axis(index='Country Name', columns='Building name')
         .unstack().reset_index(name='Data Sizes')
      )
Output:
    Building name Country Name Data Sizes
0    Empire State          USA         20
1    Empire State       Brazil         38
2    Empire State       France         37
3  Petronas Tower          USA         24
4  Petronas Tower       Brazil         42
5  Petronas Tower       France         43
6    Eiffel Tower          USA         32
7    Eiffel Tower       Brazil         87
8    Eiffel Tower       France         18
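If the melt result is already at hand, the building-first order can also be reached with a stable sort instead of a second reshape. This is a sketch of that alternative, not part of the original answer; the names melted, rank and ordered are illustrative only, and the sample frame is rebuilt from the question so the snippet is self-contained.

```python
import pandas as pd

# Sample data from the question (all values stored as strings: an assumption).
values = [
    "Building name", "Empire State", "Petronas Tower", "Eiffel Tower",
    "USA", "20", "24", "32",
    "Brazil", "38", "42", "87",
    "France", "37", "43", "18",
]
df = pd.DataFrame({"name": "Data size", "value": values})

N = 3  # number of buildings, assumed known
a = df["value"].to_numpy().reshape((N + 1, -1), order="F")
melted = pd.DataFrame(a[1:], columns=a[0]).melt(
    "Building name", var_name="Country Name", value_name="Data Sizes"
)

# Rank each building by first appearance, then sort stably so the countries
# keep their original relative order within each building.
rank = {b: i for i, b in enumerate(melted["Building name"].unique())}
ordered = melted.sort_values(
    "Building name", key=lambda s: s.map(rank), kind="stable", ignore_index=True
)
```

A plain alphabetical sort on the building column would put Eiffel Tower first, which is why the rank by first appearance is used as the sort key here.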