Pandas Dataframe to Numpy Vstack Array by Unique Column Value

Question:

I have a dataframe with the following structure:

import numpy as np
import pandas as pd

data = {'Group':['1', '1', '2', '2', '3', '3'], 'Value':[1, 2, 3, 4, 5, 6]} 
df = pd.DataFrame(data) 

I need to convert this dataframe (about 1000 groups, with roughly 4000 values per unique group) into a numpy array like the one below. The order must be preserved:

array([[1, 2], [3, 4], [5, 6]])

Additionally:
99% of the groups have the same number of values, but a few have different counts. If the shorter groups could be padded up to the maximum count, I would not lose any data.

At the moment I iterate through the unique 'Group' values and numpy.vstack them together. That is slow and far from elegant.

Asked By: Lageos


Answers:

This is just a pivot: number the rows within each group with cumcount, then pivot so that each group becomes one row:

(df.assign(col=df.groupby('Group').cumcount())
  .pivot(index='Group', columns='col', values='Value')
  .values
)

Output:

array([[1, 2],
       [3, 4],
       [5, 6]], dtype=int64)
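
The question also asks about groups of unequal size and about preserving order. The following is a minimal sketch of how the same pivot behaves in that case, not part of the original answer's output. The sample data, the `reindex` step, and the fill value of 0 are assumptions made for illustration. `pivot` leaves NaN where a group is shorter than the longest one (which turns the result into floats), and it sorts the index, so the sketch reindexes by `df['Group'].unique()` to restore first-appearance order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group '2' has one value fewer than the others.
df = pd.DataFrame({
    'Group': ['1', '1', '2', '3', '3'],
    'Value': [1, 2, 3, 5, 6],
})

# Number each row within its group: 0, 1, 2, ...
col = df.groupby('Group').cumcount()

wide = (
    df.assign(col=col)
      .pivot(index='Group', columns='col', values='Value')
      # pivot sorts the index; reindex restores first-appearance order
      .reindex(df['Group'].unique())
)

# Short rows are padded with NaN, so the array is float
padded = wide.to_numpy()

# Alternatively pad with a sentinel (0 is an assumed choice) and keep integers
filled = wide.fillna(0).to_numpy(dtype=np.int64)
```

If NaN is an acceptable padding value, `padded` can be used directly; otherwise pick a fill value that cannot be confused with real data.
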
Answered By: Quang Hoang