Take rows of ith duplicated index and put in i number of dataframes

Question:

This is a bit tricky to put into words, but I’ll give it a try. I have a dataframe with duplicated indices as provided below.

import pandas as pd

a = [0.00000, 0.071928, 1.294, 2.592563, 0.000318, 2.575291, 0.439986, 2.232147, 6.091523, 2.075441, 0.96152]
b = [0.00000, 0.399791, 1.302446, 1.388957, 1.276451, 1.527568, 1.614107, 2.686325, 4.167600, 6.135689, 5.945807]

df = pd.DataFrame({'a': a, 'b': b})
df.index = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4]

For every index value, I want the row where that value first occurs to go into df1, the row where it occurs a second time to go into df2, and so on. In other words, the first occurrences of indices 1, 2, 3, 4… n get appended to dataframe 1, the second occurrences of indices 1, 2, 3, 4… n get appended to dataframe 2, etc. Ideally, it would look something like this if concatenated for the first three duplicates under the 'index' column:

Any idea how to go about this? I've tried running df[df.duplicated(subset=['index'])] in a for loop to whittle the df down to the very first duplicates, but it doesn't work the way I expected.

Asked By: Volti

||

Answers:

Slicing out the duplicate indices via cumcount and using concat to stitch together the resulting sub-dataframes will do the job.

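cols = df.columns
df['id'] = df.index  # helper column to group on

# cumcount() numbers the rows within each 'id' group: 0, 1, 2, ...
counts = df.groupby('id').cumcount()

# range() needs max + 1, otherwise the last occurrence is dropped
pd.concat([df[counts == i][cols] for i in range(counts.max() + 1)], axis=1)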
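
If separate dataframes are wanted, as the question asks, rather than one wide concatenated frame, the same cumcount idea can be sketched as below. This is only an illustration under assumptions: the names `occurrence` and `frames` are made up for this example, and grouping on the index with `groupby(level=0)` is used in place of the helper 'id' column so that df is not modified.

```python
import pandas as pd

a = [0.00000, 0.071928, 1.294, 2.592563, 0.000318, 2.575291,
     0.439986, 2.232147, 6.091523, 2.075441, 0.96152]
b = [0.00000, 0.399791, 1.302446, 1.388957, 1.276451, 1.527568,
     1.614107, 2.686325, 4.167600, 6.135689, 5.945807]

df = pd.DataFrame({'a': a, 'b': b})
df.index = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4]

# Occurrence number of each row within its index value:
# 0 the first time a label appears, 1 the second time, and so on.
occurrence = df.groupby(level=0).cumcount()

# One dataframe per occurrence number (hypothetical name `frames`).
# The mask is converted to a plain NumPy array so that no index
# alignment is attempted on the duplicated labels.
n_frames = int(occurrence.max()) + 1
frames = [df[(occurrence == i).to_numpy()] for i in range(n_frames)]

df1, df2, df3 = frames[:3]  # first, second and third occurrences
```

Here frames[0] holds the first row seen for each of the indices 1 to 4, frames[1] the second row for the indices that have one, and so on. Later frames get shorter because fewer index values repeat that many times.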
Answered By: 7shoe