Dataframe pivot when indexes are duplicated

Question:

What Python code will get me from Original DataFrame to Final DataFrame? I am struggling to find any pandas functions for this type of a pivot.

Original DataFrame:

    test1  test2 class value1
0       1      5  type  type1
1       1      5  type  type3
2       2      6  type  type1
3       3      7  type  type2
4       1      5   lat     40
5       1      5   lat     20
6       2      6   lat     50
7       3      7   lat     60
8       1      5   lon      1
9       1      5   lon      2
10      2      6   lon      2
11      3      7   lon      3

Final DataFrame:

   test1  test2   type  lat  lon
0      1      5  type1   40    1
1      2      6  type1   50    2
2      3      7  type2   60    3
3      1      5  type3   20    2 

I tired using the pandas pivot function, but get the following error.

df_pivoted = df_orig.pivot(
    index=['test1', 'test2'],
    columns='class',
    values='value1')
ValueError: Index contains duplicate entries, cannot reshape
Asked By: Adam J. Berlier

||

Answers:

De-duplicate the rows before pivot:

cols = ['test1', 'test2']
out = (
 df.assign(n=df.groupby(cols+['class']).cumcount())
   .pivot(index=['n']+cols, columns='class', values='value1')
   .reset_index().rename_axis(columns=None).drop(columns='n')
)

Output:

   test1  test2 lat lon   type
0      1      5  40   1  type1
1      2      6  50   2  type1
2      3      7  60   3  type2
3      1      5  20   2  type3

previous answer (before question edit)

This is not at all a simple pivot, there is no direct relationship between test1/test2 and class/value, you must handle both parts separately and join:

cols = ['test1', 'test2']

g = df_orig.groupby('class', sort=False)

out = (df_orig[cols]
       .assign(n=g.ngroup())
       .drop_duplicates('n').set_index('n')
       .join((df_orig.drop(columns=cols).assign(n=g.cumcount())
             .pivot(index='n', columns='class', values='value1')
              ))
       .reset_index(drop=True)
      )

print(out)

Output:

   test1  test2 lat lon   type
0      1      5  40   1  type1
1      2      6  50   2  type1
2      3      7  60   3  type2

simple pivot

For a simple pivot your input should look like:

   test1  test2 class value1
0      1      5  type  type1
1      2      6  type  type1
2      3      7  type  type2
3      1      5   lat     40
4      2      6   lat     50
5      3      7   lat     60
6      1      5   lon      1
7      2      6   lon      2
8      3      7   lon      3

And you could use:

df.pivot(index=['test1', 'test2'], columns='class', values='value1').reset_index()
Answered By: mozway
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.