How can I "unpivot" specific columns from a pandas DataFrame?

Question:

I have a pandas DataFrame, e.g.:

df = pd.DataFrame({'farm' : ['A','B','A','B'], 
                   'fruit':['apple','apple','pear','pear'], 
                   '2014':[10,12,6,8], 
                   '2015':[11,13,7,9]})

i.e.:

   2014  2015 farm  fruit
0    10    11    A  apple
1    12    13    B  apple
2     6     7    A   pear
3     8     9    B   pear

How can I convert it to the following?

  farm  fruit  value  year
0    A  apple     10  2014
1    B  apple     12  2014
2    A   pear      6  2014
3    B   pear      8  2014
4    A  apple     11  2015
5    B  apple     13  2015
6    A   pear      7  2015
7    B   pear      9  2015

I have tried stack and unstack but haven’t been able to make it work.

Asked By: Racing Tadpole


Answers:

This can be done with pd.melt():

# value_name is 'value' by default, but setting it here to make it clear
pd.melt(df, id_vars=['farm', 'fruit'], var_name='year', value_name='value')

Result:

  farm  fruit  year  value
0    A  apple  2014     10
1    B  apple  2014     12
2    A   pear  2014      6
3    B   pear  2014      8
4    A  apple  2015     11
5    B  apple  2015     13
6    A   pear  2015      7
7    B   pear  2015      9

[8 rows x 4 columns]
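
If only some of the columns should be unpivoted, melt also accepts a value_vars argument. The following is a minimal, self-contained sketch using the frame from the question; listing the year columns explicitly and converting year to an integer are additions for illustration, not something the question requires.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                   'fruit': ['apple', 'apple', 'pear', 'pear'],
                   '2014': [10, 12, 6, 8],
                   '2015': [11, 13, 7, 9]})

# value_vars names the columns to unpivot; id_vars are repeated for each of them
long_df = pd.melt(df,
                  id_vars=['farm', 'fruit'],
                  value_vars=['2014', '2015'],
                  var_name='year',
                  value_name='value')

# The year labels come from column names, so they are strings; cast if numbers are wanted
long_df['year'] = long_df['year'].astype(int)
print(long_df)
```

Any column not listed in id_vars or value_vars is dropped from the result, which is what makes this useful for unpivoting a subset of a wider frame.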

I’m not sure how common “melt” is as the name for this kind of operation, but that’s what it’s called in R’s reshape2 package, which probably inspired the name here.

Answered By: Marius

This can also be done with stack(); set_index() just has to be called first so that farm and fruit are repeated for each year-value pair.

long_df = df.set_index(['farm', 'fruit']).rename_axis(columns='year').stack().reset_index(name='value')
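
As a runnable sketch of the same idea on the frame from the question: stack() moves the column labels into the innermost index level, so the rows may come out in a different order than with melt. The final sort_values call is an addition here, used only to reproduce the row order asked for in the question.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                   'fruit': ['apple', 'apple', 'pear', 'pear'],
                   '2014': [10, 12, 6, 8],
                   '2015': [11, 13, 7, 9]})

long_df = (
    df.set_index(['farm', 'fruit'])   # keep the identifier columns out of the stack
      .rename_axis(columns='year')    # name the column axis so the new level is called 'year'
      .stack()                        # year labels become the innermost index level
      .reset_index(name='value')      # back to flat columns; the stacked values are named 'value'
      .sort_values(['year', 'fruit', 'farm'], ignore_index=True)  # optional: order as in the question
)
print(long_df)
```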


melt is also a DataFrame method, so it can be called as:

long_df = df.melt(id_vars=['farm', 'fruit'], var_name='year', value_name='value')

Another interesting function is pd.wide_to_long, which can also be used to "melt" a frame. However, it requires a stubname (a prefix shared by the columns to unpivot), so it wouldn’t work for the frame in the OP, but it works in other cases. For example, it works when the year columns carry a value_ prefix, e.g. value_2014 and value_2015:

long_df = pd.wide_to_long(df, 'value', i=['farm', 'fruit'], j='year', sep='_').reset_index()
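
For a self-contained version, the sketch below builds a hypothetical frame (called wide here) whose year columns are named value_2014 and value_2015. The frame and its name are assumptions for illustration, since the question's frame has no such prefix.

```python
import pandas as pd

# Hypothetical variant of the question's frame: the year columns share the stub 'value'
wide = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                     'fruit': ['apple', 'apple', 'pear', 'pear'],
                     'value_2014': [10, 12, 6, 8],
                     'value_2015': [11, 13, 7, 9]})

# stubnames is the shared prefix, sep the separator after it, and j names the
# new column holding the suffix; i must uniquely identify each row
long_df = pd.wide_to_long(wide, stubnames='value', i=['farm', 'fruit'],
                          j='year', sep='_').reset_index()
print(long_df)
```

With the default suffix pattern, wide_to_long matches numeric suffixes and converts them to numbers, so year comes back as integers rather than strings.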


Answered By: cottontail