Pandas: How can I move certain columns into rows?

Question:

Suppose I have the df below. I would like to combine the price columns and value columns so that all prices are in one column and all volumes are in another column. I would also like a third column that identified the price level. For example, unit1, unit2 and unit3.

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)
df

df1

This is what the final df should look like:

df_final


I tried using melt and I think almost have the right answer.

pd.melt(df, id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit1_vol', 'unit2_price', 'unit2_vol', 'unit3_price', 'unit3_vol'])

This is what the partial df looks like with melt:

df2

The problem with the above is that volume and price are in the same column but I want them to be in 2 separate columns.

Did I use the right function?

Asked By: IamWarmduscher

||

Answers:

Try with melt , then pivot after split

s = df.melt(['uid','location'])
s[['unit','type']] = 

s['variable'].str.split('_',expand=True)
s = s.pivot(index = ['uid','location','unit'],columns = ['type'],values = 'value').reset_index()
s
Out[967]: 
type   uid location   unit  price  vol
0     A100     Asia  unit1     10  150
1     A100     Asia  unit2     10  200
2     A100     Asia  unit3     20  500
3     A200     Asia  unit1     20  100
4     A200     Asia  unit2     10  150
Answered By: BENY

maybe this:

import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)

price = pd.melt(
    df, id_vars=['uid', 'location', 'unit2_vol', 'unit1_vol', 'unit3_vol'], value_vars=['unit1_price', 'unit3_price', 'unit2_price'], var_name="price", value_name="price_value"
)

res = pd.melt(
    price, id_vars=['uid', 'location', 'price', 'price_value'], value_vars=['unit2_vol', 'unit1_vol', 'unit3_vol'], var_name="vol", value_name="vol_value"
)
print(res)

OUTPUT:

     uid location        price  price_value        vol  vol_value
0   U100       US  unit1_price           10  unit2_vol        200
1   U200       US  unit1_price           20  unit2_vol        200
2   E100       EU  unit1_price           15  unit2_vol        150
3   E200       EU  unit1_price           10  unit2_vol        300
4   E300       EU  unit1_price           10  unit2_vol        300
..   ...      ...          ...          ...        ...        ...
85  A100     Asia  unit2_price           10  unit3_vol        500
86  A200     Asia  unit2_price           10  unit3_vol        500
87  A300     Asia  unit2_price           10  unit3_vol        500
88  A400     Asia  unit2_price           20  unit3_vol        500
89  A500     Asia  unit2_price           20  unit3_vol        500
Answered By: maya

You can form two dataframe using pd.melt first and combine it back to become one dataframe.

df1 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit2_price', 'unit3_price'],var_name='unit',value_name='price')

df2 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_vol', 'unit2_vol', 'unit3_vol'],var_name='unit', value_name="volume")

ddf = pd.concat([df1,df2['volume']],axis=1).sort_values(by=['uid','unit'],ignore_index=True)

ddf['unit']=ddf['unit'].str.split('_',expand=True)[0]

You can do:

df_price = df.set_index(['uid','location']).filter(
    regex='price$').stack().rename_axis(
    ['uid', 'location', 'price_unit']).rename('price').reset_index()

df_vol = df.filter(regex='vol$').stack().rename_axis(
    ['', 'vol_unit']).rename('volume').reset_index(level=1).reset_index(drop=True)

df2 = pd.concat([df_price, df_vol], axis=1)
df2['unit'] = df2['price_unit'].apply(lambda x:x.split('_')[0])
df2.drop(['price_unit', 'vol_unit'],axis=1, inplace=True)

print(df2):

     uid location  price  volume   unit
0   U100       US     10     100  unit1
1   U100       US     10     200  unit2
2   U100       US      0       0  unit3
3   U200       US     20     150  unit1
4   U200       US     25     200  unit2
5   U200       US      0       0  unit3
6   E100       EU     15     100  unit1
7   E100       EU     30     150  unit2
8   E100       EU      0       0  unit3
9   E200       EU     10     200  unit1
10  E200       EU     20     300  unit2
11  E200       EU     20     500  unit3
12  E300       EU     10     150  unit1
13  E300       EU     10     300  unit2
14  E300       EU     20     500  unit3
15  A100     Asia     10     150  unit1
16  A100     Asia     10     200  unit2
17  A100     Asia     20     500  unit3
18  A200     Asia     20     100  unit1
19  A200     Asia     10     150  unit2
20  A200     Asia     20     500  unit3
21  A300     Asia     20     200  unit1
22  A300     Asia     10     225  unit2
23  A300     Asia     20     500  unit3
24  A400     Asia     25     200  unit1
25  A400     Asia     20     225  unit2
26  A400     Asia     20     500  unit3
27  A500     Asia     25     200  unit1
28  A500     Asia     20     250  unit2
29  A500     Asia     20     500  unit3
Answered By: SomeDude

you can do the transformation efficiently in one step from wide to long, using pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
     index = ['uid', 'location'], 
     names_to = ('unit', '.value'), 
     names_sep = '_')
)
     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U200       US  unit1     20  150
2   E100       EU  unit1     15  100
3   E200       EU  unit1     10  200
4   E300       EU  unit1     10  150
5   A100     Asia  unit1     10  150
6   A200     Asia  unit1     20  100
7   A300     Asia  unit1     20  200
8   A400     Asia  unit1     25  200
9   A500     Asia  unit1     25  200
10  U100       US  unit2     10  200
11  U200       US  unit2     25  200
12  E100       EU  unit2     30  150
13  E200       EU  unit2     20  300
14  E300       EU  unit2     10  300
15  A100     Asia  unit2     10  200
16  A200     Asia  unit2     10  150
17  A300     Asia  unit2     10  225
18  A400     Asia  unit2     20  225
19  A500     Asia  unit2     20  250
20  U100       US  unit3      0    0
21  U200       US  unit3      0    0
22  E100       EU  unit3      0    0
23  E200       EU  unit3     20  500
24  E300       EU  unit3     20  500
25  A100     Asia  unit3     20  500
26  A200     Asia  unit3     20  500
27  A300     Asia  unit3     20  500
28  A400     Asia  unit3     20  500
29  A500     Asia  unit3     20  500

.value determines which part of the column remains as a header ( in this case it is price and vol), while the part not associated with .value goes into the unit column. names_sep helps split the labels.

Another option is with pd.wide_to_long, which also allows the reshaping in a single step. It does require some munging on the columns:

i = ['uid', 'location']

temp = df.set_index(i)

# reshape the columns, moving price and vol to the front
temp.columns = temp.columns.str.split('_').str[::-1].str.join('_')

(pd
.wide_to_long(
    temp.reset_index(), 
    i = i, 
    j = 'unit', 
    stubnames = ['price', 'vol'], 
    sep='_', 
    suffix = '.+')
.reset_index()
)
     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U100       US  unit2     10  200
2   U100       US  unit3      0    0
3   U200       US  unit1     20  150
4   U200       US  unit2     25  200
5   U200       US  unit3      0    0
6   E100       EU  unit1     15  100
7   E100       EU  unit2     30  150
8   E100       EU  unit3      0    0
9   E200       EU  unit1     10  200
10  E200       EU  unit2     20  300
11  E200       EU  unit3     20  500
12  E300       EU  unit1     10  150
13  E300       EU  unit2     10  300
14  E300       EU  unit3     20  500
15  A100     Asia  unit1     10  150
16  A100     Asia  unit2     10  200
17  A100     Asia  unit3     20  500
18  A200     Asia  unit1     20  100
19  A200     Asia  unit2     10  150
20  A200     Asia  unit3     20  500
21  A300     Asia  unit1     20  200
22  A300     Asia  unit2     10  225
23  A300     Asia  unit3     20  500
24  A400     Asia  unit1     25  200
25  A400     Asia  unit2     20  225
26  A400     Asia  unit3     20  500
27  A500     Asia  unit1     25  200
28  A500     Asia  unit2     20  250
29  A500     Asia  unit3     20  500

You can even use the stack option, again all these options are just for you to do the transformation efficiently:

i = ['uid', 'location']

temp = df.set_index(i)

# create a MultiIndex
temp.columns = temp.columns.str.split('_', expand = True)
temp.columns.names = ['unit', None]

temp.stack('unit').reset_index()

     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U100       US  unit2     10  200
2   U100       US  unit3      0    0
3   U200       US  unit1     20  150
4   U200       US  unit2     25  200
5   U200       US  unit3      0    0
6   E100       EU  unit1     15  100
7   E100       EU  unit2     30  150
8   E100       EU  unit3      0    0
9   E200       EU  unit1     10  200
10  E200       EU  unit2     20  300
11  E200       EU  unit3     20  500
12  E300       EU  unit1     10  150
13  E300       EU  unit2     10  300
14  E300       EU  unit3     20  500
15  A100     Asia  unit1     10  150
16  A100     Asia  unit2     10  200
17  A100     Asia  unit3     20  500
18  A200     Asia  unit1     20  100
19  A200     Asia  unit2     10  150
20  A200     Asia  unit3     20  500
21  A300     Asia  unit1     20  200
22  A300     Asia  unit2     10  225
23  A300     Asia  unit3     20  500
24  A400     Asia  unit1     25  200
25  A400     Asia  unit2     20  225
26  A400     Asia  unit3     20  500
27  A500     Asia  unit1     25  200
28  A500     Asia  unit2     20  250
29  A500     Asia  unit3     20  500
Answered By: sammywemmy

The existing answers all work great. Here’s another method using numpy. It works as long as the location of _price and _vol columns alternate exactly as it’s shown in the OP (because it relies on numpy.reshape to convert the wide data to long, so it doesn’t recognize column names while reshaping).

# filter columns containing price or vol
price_vol_cols = df.columns.str.contains('price|vol')
# the number of vol columns (and price columns)
width = price_vol_cols.sum()//2
# repeat uid and location columns
res = pd.DataFrame(np.tile(df.loc[:, ~price_vol_cols], (width, 1)), columns=df.columns[~price_vol_cols])
# repeat price and vol column names 
res['unit'] = np.repeat(df.columns[price_vol_cols].str.split('_').str[0], len(df)//2)
# reshape price and vol columns into 2 columns by stacking every 2 columns
res[['price', 'vol']] = np.vstack(df.loc[:, price_vol_cols].values.reshape(-1, width, 2).swapaxes(0,1))

enter image description here

Here is a solution that uses the fact that your data frame essentially has a MultiIndex column in hiding. Specifically, your unit1_price, unit_vol, etc. could be a MultiIndex with levels unit, and price, e.g. pd.MultiIndex.from_tuples([('unit1', 'price'), ('unit1', 'vol'), ('unit2', 'price'), ('unit2', 'vol'), ('unit3', 'price'), ('unit3', 'vol')], names=['unit', 'measure']). First, you need to move the uid and location columns to the index. With this structure, DataFrame.stack will do what you want.

import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)
df.set_index(["uid", "location"], inplace=True)
col_idx = pd.MultiIndex.from_tuples(
    [c.split("_") for c in df.columns],
    names=["unit", "measure"]
)
df.columns = col_idx
final_df = df.stack(level=0).reset_index()
print(final_df)

The result is

measure   uid location   unit  price  vol
0        U100       US  unit1     10  100
1        U100       US  unit2     10  200
2        U100       US  unit3      0    0
3        U200       US  unit1     20  150
4        U200       US  unit2     25  200
5        U200       US  unit3      0    0
6        E100       EU  unit1     15  100
7        E100       EU  unit2     30  150
8        E100       EU  unit3      0    0
9        E200       EU  unit1     10  200
10       E200       EU  unit2     20  300
11       E200       EU  unit3     20  500
12       E300       EU  unit1     10  150
13       E300       EU  unit2     10  300
14       E300       EU  unit3     20  500
15       A100     Asia  unit1     10  150
16       A100     Asia  unit2     10  200
17       A100     Asia  unit3     20  500
18       A200     Asia  unit1     20  100
19       A200     Asia  unit2     10  150
20       A200     Asia  unit3     20  500
21       A300     Asia  unit1     20  200
22       A300     Asia  unit2     10  225
23       A300     Asia  unit3     20  500
24       A400     Asia  unit1     25  200
25       A400     Asia  unit2     20  225
26       A400     Asia  unit3     20  500
27       A500     Asia  unit1     25  200
28       A500     Asia  unit2     20  250
29       A500     Asia  unit3     20  500
Answered By: ogdenkev
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.