Add zero vector row per group in pandas

Question:

I want to create equal-sized (padded) NumPy arrays from a pandas DataFrame, ultimately to be used as input to a Keras model.

import pandas as pd
df = pd.DataFrame([[1, 1.2, 2.2], 
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns = ['id', 'X1', 'X2']
                 )
df
>> 
   id   X1   X2
0   1   1.2  2.2
1   1   3.2  4.6
2   2   5.5  6.6

Expected Output – 3d numpy array with padding

array[
        [
          [1.2, 2.2],
          [3.2, 4.6]
        ],
        [
          [5.5, 6.6],
          [0,   0]
        ]
     ]

Can anyone help me?

Asked By: Hardik Gupta

||

Answers:

First append the zero rows: build a per-group counter with GroupBy.cumcount, then use DataFrame.reindex against the full product of ids and counter values:

import numpy as np

# position of each row within its id group: 0, 1, 2, ...
df['g'] = df.groupby('id').cumcount()

ids = df['id'].unique()
maxg = df['g'].max()+1
df1 = (df.set_index(['id','g'])
          .reindex(pd.MultiIndex.from_product([ids, np.arange(maxg)]), fill_value=0))
print (df1)
      X1   X2
1 0  1.2  2.2
  1  3.2  4.6
2 0  5.5  6.6
  1  0.0  0.0

Then convert the values to a NumPy array and reshape it to 3D:

a = df1.to_numpy().reshape(len(ids), maxg, len(df1.columns))
print (a)
[[[1.2 2.2]
  [3.2 4.6]]

 [[5.5 6.6]
  [0.  0. ]]]
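
The two steps above can be wrapped in a small helper. This is only a sketch of the same reindex approach: the names pad_groups, key, feature_cols, fill_value and the temporary _pos column are made up for illustration, and it assumes a non-empty frame that has no existing column called _pos.

```python
import numpy as np
import pandas as pd


def pad_groups(df, key, feature_cols, fill_value=0.0):
    """Return an array of shape (n_groups, longest_group, n_features)."""
    # Position of each row within its group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    ids = df[key].unique()          # groups in order of first appearance
    max_len = int(pos.max()) + 1    # length of the longest group

    # Every (group, position) pair; pairs missing from df become fill rows.
    full_index = pd.MultiIndex.from_product([ids, np.arange(max_len)])
    padded = (
        df.assign(_pos=pos)                      # hypothetical helper column
          .set_index([key, "_pos"])[feature_cols]
          .reindex(full_index, fill_value=fill_value)
    )
    # Rows are ordered group by group, so a row-major reshape is enough.
    return padded.to_numpy().reshape(len(ids), max_len, len(feature_cols))


example = pd.DataFrame([[1, 1.2, 2.2],
                        [1, 3.2, 4.6],
                        [2, 5.5, 6.6]], columns=["id", "X1", "X2"])
result = pad_groups(example, "id", ["X1", "X2"])
print(result.shape)
```

Because the helper uses assign rather than writing to df['g'], the input frame is left unchanged.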

Alternative solution, using unstack to pad and then sorting the columns by the counter level:

df['g'] = df.groupby('id').cumcount()

df1 = (df.set_index(['id','g']).unstack(fill_value=0)
         .sort_index(axis=1, level=1, sort_remaining=False))
print (df1)
     X1   X2   X1   X2
g     0    0    1    1
id                    
1   1.2  2.2  3.2  4.6
2   5.5  6.6  0.0  0.0
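
The sort_index call matters for the later reshape: straight after unstack the columns are grouped by feature (all X1, then all X2), whereas the reshape expects all features of position 0 first, then position 1, and so on. A small sketch on the question's data to compare the two column orders (the variable names wide, unsorted_cols and sorted_cols are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 1.2, 2.2],
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns=['id', 'X1', 'X2'])
df['g'] = df.groupby('id').cumcount()

wide = df.set_index(['id', 'g']).unstack(fill_value=0)

# Straight after unstack: grouped by feature, then by position.
unsorted_cols = wide.columns.tolist()

# After sorting on the 'g' level only: grouped by position, then by feature.
sorted_cols = (wide.sort_index(axis=1, level=1, sort_remaining=False)
                   .columns.tolist())

print(unsorted_cols)
print(sorted_cols)
```

Reshaping the unsorted frame would give an array laid out as (groups, features, positions) instead of (groups, positions, features).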

ids = df['id'].unique()
maxg = df['g'].max()+1

a = df1.to_numpy().reshape(len(ids),maxg, len(df1.columns) // maxg)
print (a)
[[[1.2 2.2]
  [3.2 4.6]]

 [[5.5 6.6]
  [0.  0. ]]]
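
One detail worth checking if the array has to line up with per-group labels: the two solutions can order the groups differently when the ids are not already sorted. The reindex version follows df['id'].unique() (order of first appearance), while the unstack version follows the sorted index of the unstacked frame. A sketch with made-up data in which id 2 appears before id 1:

```python
import pandas as pd

# Hypothetical frame: same values as the question, but id 2 comes first.
df = pd.DataFrame([[2, 5.5, 6.6],
                   [1, 1.2, 2.2],
                   [1, 3.2, 4.6]], columns=['id', 'X1', 'X2'])
df['g'] = df.groupby('id').cumcount()

# reindex solution: group order is taken from unique(), i.e. appearance order.
order_reindex = df['id'].unique().tolist()

# unstack solution: group order is the index of the unstacked frame.
wide = df.set_index(['id', 'g']).unstack(fill_value=0)
order_unstack = wide.index.tolist()

print(order_reindex)
print(order_unstack)
```

Keeping whichever of these id lists matches the solution used is one way to map rows of the 3D array back to their groups.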
Answered By: jezrael