Pandas special pivot dataframe

Question:

Let’s take a sample dataframe :

df = pd.DataFrame({"Name": ["Alan","Alan","Kate","Kate","Brian"],
                   "Shop" :["A","B","C","A","B"],
                   "Amount":[4,2,1,3,5]})
    Name Shop  Amount
0   Alan    A       4
1   Alan    B       2
2   Kate    C       1
3   Kate    A       3
4  Brian    B       5

First expected output :

I would like to create a new dataframe from df having :

  • as columns all the possible values in the column Shop and the column Name
  • as index all the possible values in the column Shop, repeated for each value in column `Name’
  • as values the value in the column Ă€mount matching with the columns Name and Shop

Expected output :

   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

Second expected output :

It’s almost the same as the first expected output. The only difference is that the value is the one that match with the index (and not column Name) and the column Shop.

Expected output :

   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Thanks to this post, I tried several scripts using pivot_table and pivot but I didn’t reach my expected outputs. Would you know please how to do ?

Asked By: Ewdlam

||

Answers:

First use DataFrame.pivot with repalce missing values to 0 and for same order like original values use DataFrame.reindex:

v = df['Shop'].unique()
df1 = (df.pivot('Name','Shop','Amount')
         .fillna(0)
         .astype('int')
         .reindex(df['Name'].unique()))

For repeat rows use Index.repeat with DataFrame.loc and set repeated indices by numpy.tile:

df = (df1.assign(Name=df1.index)
         .loc[df1.index.repeat(len(v))]
         .set_index(np.tile(v, len(df1.index)))
         .rename_axis(None, axis=1))
    
print (df)
   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

For second output reshape by DataFrame.stack with numpy.broadcast_to:

s = df1.stack()
df2 = (pd.DataFrame(np.broadcast_to(s.to_numpy()[:, None], (len(s), len(v))),
                   columns=df1.columns, index= np.tile(df1.columns, len(df1.index)))
         .assign(Name=s.index.get_level_values(0)))

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Alternative solution:

arr = np.ravel(df1)
df2 = (pd.DataFrame(np.broadcast_to(arr[:, None], (len(arr), len(v))),
                   columns=v, index= np.tile(v, len(v)))
          .assign(Name=np.repeat(df1.index, len(v)))
         )

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian
Answered By: jezrael
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.