Multiply a pandas data-frame to a fixed number of rows

Question:

I have a data-frame that I want to multiply (essentially, duplicate) up to a fixed target number of rows.

df:

col1    col2    col3
A1      B1      C1
A13     B13     C13
A27     B27     C27

I want to duplicate this data-frame so that the resulting data-frame has 10 rows. Essentially, each row should be repeated three times, and the 10th row can be any one of the three rows.

Asked By: msksantosh


Answers:

I think you need divmod: the quotient is how many times to repeat all rows, and the remainder is how many extra copies of one row are needed:

import numpy as np
import pandas as pd

N = 10

a, b = divmod(N,len(df))
print (a, b)
3 1
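As a quick check of that arithmetic: `a` full copies of every row plus `b` leftover rows always add back up to N, because `a * len(df) + b == N` by definition of divmod. A minimal sketch with a hypothetical helper `split_target` (not part of the original answer) and a few assumed target sizes for the 3-row frame:

```python
def split_target(n_target, n_rows):
    """Hypothetical helper: return (full copies of every row, extra rows still needed)."""
    a, b = divmod(n_target, n_rows)
    assert a * n_rows + b == n_target  # always true for divmod
    return a, b

# The question's 3-row frame with a few assumed target sizes
for N in (10, 11, 12):
    print(N, split_target(N, 3))
```

For N = 11 the remainder is 2, so whatever supplies the "extra" rows has to produce two rows, not one.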

Solution with numpy.repeat, if all columns have the same dtype:

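# every row repeated a times
c = np.repeat(df.values, a, axis=0)
# df.values[[-1]] keeps the last row 2-D, so vstack also works when b > 1
d = np.repeat(df.values[[-1]], b, axis=0)

out = pd.DataFrame(np.vstack((c, d)), columns=df.columns)
print (out)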
  col1 col2 col3
0   A1   B1   C1
1   A1   B1   C1
2   A1   B1   C1
3  A13  B13  C13
4  A13  B13  C13
5  A13  B13  C13
6  A27  B27  C27
7  A27  B27  C27
8  A27  B27  C27
9  A27  B27  C27
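The version above gives the whole remainder to the last row. If you would rather spread the `b` extra rows over the first `b` rows (so no row gets more than one extra copy), `np.repeat` also accepts an array with one repeat count per row. A sketch, still assuming all columns share one dtype; the `counts` array is an illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'A13', 'A27'],
                   'col2': ['B1', 'B13', 'B27'],
                   'col3': ['C1', 'C13', 'C27']})
N = 10

a, b = divmod(N, len(df))
# One repeat count per row: the first b rows get one extra copy
counts = np.full(len(df), a)
counts[:b] += 1

out = pd.DataFrame(np.repeat(df.to_numpy(), counts, axis=0), columns=df.columns)
print(out)
```

Here `counts` is `[4, 3, 3]`, so A1 appears four times and the other rows three times each.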

Solutions if the columns may have different dtypes:

Pure pandas solution with concat:

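# stable sort on the original index groups the copies of each row, independent of the values
out = pd.concat([df] * a + [df.iloc[[-1]]] * b).sort_index(kind='mergesort').reset_index(drop=True)
print (out)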
  col1 col2 col3
0   A1   B1   C1
1   A1   B1   C1
2   A1   B1   C1
3  A13  B13  C13
4  A13  B13  C13
5  A13  B13  C13
6  A27  B27  C27
7  A27  B27  C27
8  A27  B27  C27
9  A27  B27  C27
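Because `pd.concat` stacks whole DataFrames instead of one NumPy array, every column keeps its own dtype. A sketch wrapping the same idea in a hypothetical helper `repeat_to_n`, with an assumed integer column `num` added to show the dtypes survive. It assumes a unique index and `n >= len(df)`:

```python
import pandas as pd

def repeat_to_n(df, n):
    """Hypothetical helper: repeat every row, giving the remainder to the last row."""
    a, b = divmod(n, len(df))
    parts = [df] * a + [df.iloc[[-1]]] * b
    # Stable sort on the original (unique) index keeps copies of each row together
    return pd.concat(parts).sort_index(kind='mergesort').reset_index(drop=True)

df = pd.DataFrame({'col1': ['A1', 'A13', 'A27'], 'num': [1, 13, 27]})
out = repeat_to_n(df, 10)
print(out.dtypes)
```

Sorting by the index rather than by `col1` matters when the values are not already in sorted order.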

Solution that repeats only the index labels and then selects the rows with loc:

idx = np.hstack((np.repeat(df.index[:-1], a), np.repeat(df.index[-1], a + b)))
out = df.loc[idx].reset_index(drop=True)
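The same index-based idea can be written with `Index.repeat`, which accepts one repeat count per label, so the two `np.repeat` pieces do not have to be built separately. A sketch (not from the original answer) assuming a unique index, with an assumed integer column to show that mixed dtypes are kept; the last row absorbs the remainder, matching the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'A13', 'A27'], 'num': [1, 13, 27]})
N = 10

a, b = divmod(N, len(df))
counts = np.full(len(df), a)
counts[-1] += b  # last row absorbs the remainder

# Repeat the index labels, then select the rows by label
out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(out)
```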
Answered By: jezrael

Another solution, which only partially answers your question but might be helpful for others:

N = 200000
big_df = pd.DataFrame(df.to_dict(orient="records") * N)
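Multiplying the list of records by N repeats the whole frame block after block, so the result always has `N * len(df)` rows. If you want that cyclic order but cut to exactly N rows, one option (a sketch, not from the original answers) is to build the row positions with `np.resize`, which repeats an array cyclically until it reaches the requested length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A1', 'A13', 'A27'],
                   'col2': ['B1', 'B13', 'B27'],
                   'col3': ['C1', 'C13', 'C27']})
N = 10

# Positions 0, 1, 2, 0, 1, 2, ... truncated to N entries
pos = np.resize(np.arange(len(df)), N)
out = df.iloc[pos].reset_index(drop=True)
print(out)
```

With N = 10 the 10th row is a copy of the first row, which the question allows.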
Answered By: RedTomato