Reshape wide to long in pandas
Question:
Let’s assume that I have the following dataframe in pandas:
AA BB CC
date
05/03 1 2 3
06/03 4 5 6
07/03 7 8 9
08/03 5 7 1
and I want to transform it to the following:
AA 05/03 1
AA 06/03 4
AA 07/03 7
AA 08/03 5
BB 05/03 2
BB 06/03 5
BB 07/03 8
BB 08/03 7
CC 05/03 3
CC 06/03 6
CC 07/03 9
CC 08/03 1
How can I do it?
The reason of the transformation from wide to long is that, in the next stage, I would like to merge this dataframe with another one, based on dates and the initial column names (AA, BB, CC).
Answers:
Update
As George Liu has shown in another answer, pd.melt
is the idiomatic, flexible and fast solution to this problem. Do not use unstack
for this.
unstack
returns a series with a multiindex:
In [38]: df.unstack()
Out[38]:
date
AA 05/03 1
06/03 4
07/03 7
08/03 5
BB 05/03 2
06/03 5
07/03 8
08/03 7
CC 05/03 3
06/03 6
07/03 9
08/03 1
dtype: int64
You can call reset_index on the returning series:
In [39]: df.unstack().reset_index()
Out[39]:
level_0 date 0
0 AA 05-03 1
1 AA 06-03 4
2 AA 07-03 7
3 AA 08-03 5
4 BB 05-03 2
5 BB 06-03 5
6 BB 07-03 8
7 BB 08-03 7
8 CC 05-03 3
9 CC 06-03 6
10 CC 07-03 9
11 CC 08-03 1
Or construct a dataframe with a multiindex:
In [40]: pd.DataFrame(df.unstack())
Out[40]:
0
date
AA 05-03 1
06-03 4
07-03 7
08-03 5
BB 05-03 2
06-03 5
07-03 8
08-03 7
CC 05-03 3
06-03 6
07-03 9
08-03 1
Use pandas.melt
or pandas.DataFrame.melt
to transform from wide to long:
df = pd.DataFrame({
'date' : ['05/03', '06/03', '07/03', '08/03'],
'AA' : [1, 4, 7, 5],
'BB' : [2, 5, 8, 7],
'CC' : [3, 6, 9, 1]
}).set_index('date')
df
AA BB CC
date
05/03 1 2 3
06/03 4 5 6
07/03 7 8 9
08/03 5 7 1
To convert, we just need to reset the index and then melt:
df = df.reset_index()
pd.melt(df, id_vars='date', value_vars=['AA', 'BB', 'CC'])
Using .reset_index
after .melt
, removes the need to specify value_vars
.
dfm = df.melt(ignore_index=False).reset_index()
Final Result – both options
date variable value
0 05/03 AA 1
1 06/03 AA 4
2 07/03 AA 7
3 08/03 AA 5
4 05/03 BB 2
5 06/03 BB 5
6 07/03 BB 8
7 08/03 BB 7
8 05/03 CC 3
9 06/03 CC 6
10 07/03 CC 9
11 08/03 CC 1
Apart from unstack
and melt
, stack
can also be used here.
df1 = df.stack().reset_index(name='value')
# change "weird" column label
df1 = df.stack().reset_index(name='value').rename(columns={'level_1': 'variable'})
All of melt
, stack
and unstack
are very fast methods, so runtime differences will hardly matter in normal circumstances. If runtime is such a concern, a numpy based solution could also be used (that is about 50% faster than melt
). The idea is to simply flatten the values in the frame into a 1D array and repeat the index and column labels along with it.
df1 = pd.DataFrame({ 'variable': np.tile(df.columns, len(df)), 'date': df.index.repeat(df.shape[1]), 'value': df.values.ravel()})
If column labels are not needed as a separate column, then another really fast function is pd.lreshape
.
df1 = pd.lreshape(df.reset_index(), {'value': ['AA', 'BB', 'CC']})
Let’s assume that I have the following dataframe in pandas:
AA BB CC
date
05/03 1 2 3
06/03 4 5 6
07/03 7 8 9
08/03 5 7 1
and I want to transform it to the following:
AA 05/03 1
AA 06/03 4
AA 07/03 7
AA 08/03 5
BB 05/03 2
BB 06/03 5
BB 07/03 8
BB 08/03 7
CC 05/03 3
CC 06/03 6
CC 07/03 9
CC 08/03 1
How can I do it?
The reason of the transformation from wide to long is that, in the next stage, I would like to merge this dataframe with another one, based on dates and the initial column names (AA, BB, CC).
Update
As George Liu has shown in another answer, pd.melt
is the idiomatic, flexible and fast solution to this problem. Do not use unstack
for this.
unstack
returns a series with a multiindex:
In [38]: df.unstack()
Out[38]:
date
AA 05/03 1
06/03 4
07/03 7
08/03 5
BB 05/03 2
06/03 5
07/03 8
08/03 7
CC 05/03 3
06/03 6
07/03 9
08/03 1
dtype: int64
You can call reset_index on the returning series:
In [39]: df.unstack().reset_index()
Out[39]:
level_0 date 0
0 AA 05-03 1
1 AA 06-03 4
2 AA 07-03 7
3 AA 08-03 5
4 BB 05-03 2
5 BB 06-03 5
6 BB 07-03 8
7 BB 08-03 7
8 CC 05-03 3
9 CC 06-03 6
10 CC 07-03 9
11 CC 08-03 1
Or construct a dataframe with a multiindex:
In [40]: pd.DataFrame(df.unstack())
Out[40]:
0
date
AA 05-03 1
06-03 4
07-03 7
08-03 5
BB 05-03 2
06-03 5
07-03 8
08-03 7
CC 05-03 3
06-03 6
07-03 9
08-03 1
Use pandas.melt
or pandas.DataFrame.melt
to transform from wide to long:
df = pd.DataFrame({
'date' : ['05/03', '06/03', '07/03', '08/03'],
'AA' : [1, 4, 7, 5],
'BB' : [2, 5, 8, 7],
'CC' : [3, 6, 9, 1]
}).set_index('date')
df
AA BB CC
date
05/03 1 2 3
06/03 4 5 6
07/03 7 8 9
08/03 5 7 1
To convert, we just need to reset the index and then melt:
df = df.reset_index()
pd.melt(df, id_vars='date', value_vars=['AA', 'BB', 'CC'])
Using .reset_index
after .melt
, removes the need to specify value_vars
.
dfm = df.melt(ignore_index=False).reset_index()
Final Result – both options
date variable value
0 05/03 AA 1
1 06/03 AA 4
2 07/03 AA 7
3 08/03 AA 5
4 05/03 BB 2
5 06/03 BB 5
6 07/03 BB 8
7 08/03 BB 7
8 05/03 CC 3
9 06/03 CC 6
10 07/03 CC 9
11 08/03 CC 1
Apart from unstack
and melt
, stack
can also be used here.
df1 = df.stack().reset_index(name='value')
# change "weird" column label
df1 = df.stack().reset_index(name='value').rename(columns={'level_1': 'variable'})
All of melt
, stack
and unstack
are very fast methods, so runtime differences will hardly matter in normal circumstances. If runtime is such a concern, a numpy based solution could also be used (that is about 50% faster than melt
). The idea is to simply flatten the values in the frame into a 1D array and repeat the index and column labels along with it.
df1 = pd.DataFrame({ 'variable': np.tile(df.columns, len(df)), 'date': df.index.repeat(df.shape[1]), 'value': df.values.ravel()})
If column labels are not needed as a separate column, then another really fast function is pd.lreshape
.
df1 = pd.lreshape(df.reset_index(), {'value': ['AA', 'BB', 'CC']})