Replicating rows in a pandas data frame by a column value

Question

I want to replicate rows in a pandas DataFrame. Each row should be repeated n times, where n is the value in that row's n column.

import pandas as pd

what_i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'n' : [  1,   2,   3],
  'v' : [ 10,  13,   8]
})

what_i_want = pd.DataFrame(data={
  'id': ['A', 'B', 'B', 'C', 'C', 'C'],
  'v' : [ 10,  13,  13,   8,   8,   8]
})

Is this possible?

Asked By: Mersenne Prime

Answer 1

You can use Index.repeat to repeat each index label n times, then select those labels from the DataFrame (here df is the question's what_i_have):

df2 = df.loc[df.index.repeat(df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

import numpy as np

df2 = df.loc[np.repeat(df.index.values, df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Either way, all that remains is to drop the n column and reset the index:

df2 = df2.drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8
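
For convenience, the steps above can be put together into one script and checked against the frame from the question. This is only a sketch: the variable name result is made up here, and it uses drop(columns=...) in place of drop(..., axis=1), which is equivalent.

```python
import pandas as pd

what_i_have = pd.DataFrame({
    'id': ['A', 'B', 'C'],
    'n':  [1, 2, 3],
    'v':  [10, 13, 8],
})
what_i_want = pd.DataFrame({
    'id': ['A', 'B', 'B', 'C', 'C', 'C'],
    'v':  [10, 13, 13, 8, 8, 8],
})

# Repeat each index label n times, select those rows, then tidy up.
result = (
    what_i_have.loc[what_i_have.index.repeat(what_i_have['n'])]
    .drop(columns='n')
    .reset_index(drop=True)
)

# DataFrame.equals compares values, dtypes, and index in one call.
print(result.equals(what_i_want))
```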

If the index may contain duplicate labels, .loc would return every row that matches each repeated label, so select by position with .iloc instead:

df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

This selects rows by position rather than by index label.
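
To see why the distinction matters, here is a small sketch that gives the question's data a deliberately duplicated index. The index values [0, 0, 1] and the names by_label and by_position are invented for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the question, but two rows share the index label 0.
df = pd.DataFrame(
    {'id': ['A', 'B', 'C'], 'n': [1, 2, 3], 'v': [10, 13, 8]},
    index=[0, 0, 1],
)

# Label-based: each repeated label matches every row carrying that label,
# so rows A and B are both returned for each occurrence of label 0.
by_label = df.loc[df.index.repeat(df['n'])]

# Position-based: each row is repeated exactly n times.
by_position = df.iloc[np.repeat(np.arange(len(df)), df['n'])]

print(len(by_label), len(by_position))
print(by_position['id'].tolist())
```

With this index, the label-based version returns more rows than intended, while the positional version gives the six rows the question asks for.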

Answered By: DSM

Answer 2

You could use set_index and Series.repeat:

In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Details

In [1058]: df
Out[1058]:
  id  n   v
0  A  1  10
1  B  2  13
2  C  3   8
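
The one-liner can be unpacked step by step, which may make it easier to see what each call contributes. This is a sketch using the same df. The intermediate names s, repeated and result are made up. Note that this form carries only the single value column v.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'C'], 'n': [1, 2, 3], 'v': [10, 13, 8]})

# Moving 'id' into the index leaves a Series of 'v' values labelled by id.
s = df.set_index('id')['v']

# Series.repeat repeats each element together with its index label.
# The counts are applied by position, so df['n'] need not share s's index.
repeated = s.repeat(df['n'])

# reset_index turns the 'id' labels back into a regular column.
result = repeated.reset_index()
print(result)
```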

Answered By: Zero

Answer 3

Not the best solution, but I want to share this: you could also use DataFrame.reindex() with Index.repeat():

df.reindex(df.index.repeat(df.n)).drop('n', axis=1)

Output:

  id   v
0  A  10
1  B  13
1  B  13
2  C   8
2  C   8
2  C   8

You can further append .reset_index(drop=True) to reset the .index.
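
If this comes up often, the reindex approach could be wrapped in a small helper. The function name repeat_rows and its count_col parameter are hypothetical, and the sketch assumes unique index labels and non-negative integer counts. The sample data changes B's count to 0 to show that such rows are dropped entirely.

```python
import pandas as pd


def repeat_rows(df: pd.DataFrame, count_col: str) -> pd.DataFrame:
    """Repeat each row df[count_col] times, then drop the count column.

    Hypothetical helper: assumes unique index labels and
    non-negative integer counts.
    """
    repeated = df.reindex(df.index.repeat(df[count_col]))
    return repeated.drop(columns=count_col).reset_index(drop=True)


# Made-up variation of the question's data: row B has a count of 0.
df = pd.DataFrame({'id': ['A', 'B', 'C'], 'n': [1, 0, 3], 'v': [10, 13, 8]})
out = repeat_rows(df, 'n')
print(out)
```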

Answered By: Chenglong Ma

Answer 4

This is similar to uncount in tidyr:

https://tidyr.tidyverse.org/reference/uncount.html

I wrote a package (https://github.com/pwwang/datar) that implements this API:

from datar import f
from datar.tibble import tribble
from datar.tidyr import uncount

what_i_have = tribble(
    f.id, f.n, f.v,
    'A',  1,   10,
    'B',  2,   13,
    'C',  3,   8
)
what_i_have >> uncount(f.n)

Output:

Answered By: Panwen Wang
