How to repeat a Pandas DataFrame?

Question:

This is my DataFrame that should be repeated 5 times:

>>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
>>> x
   a  b
0  1  2

I want to have the result like this:

>>> x.append(x).append(x).append(x)
   a  b
0  1  2
0  1  2
0  1  2
0  1  2

But there must be a smarter way than appending 4 times. Actually the DataFrame I’m working on should be repeated 50 times.

I haven’t found anything practical, including things like np.repeat, which just doesn’t work on a DataFrame.

Could anyone help?

Asked By: lsheng


Answers:

You can use the concat function:

In [13]: pd.concat([x]*5)
Out[13]: 
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2

If you only want to repeat the values and not the index, you can do:

In [14]: pd.concat([x]*5, ignore_index=True)
Out[14]: 
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
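
For the 50 repetitions mentioned in the question, the same call could be wrapped in a small function. This is only a sketch: `repeat_frame` is a hypothetical name, not a pandas function, and it assumes that resetting the index by default is what you want.

```python
import pandas as pd

def repeat_frame(df, n, ignore_index=True):
    """Hypothetical helper: stack n copies of df on top of each other."""
    # [df] * n is a list holding n references to the same frame;
    # pd.concat then builds the stacked result in one call.
    return pd.concat([df] * n, ignore_index=ignore_index)

x = pd.DataFrame({'a': 1, 'b': 2}, index=range(1))
out = repeat_frame(x, 50)
print(out.shape)  # (50, 2)
```

Passing `ignore_index=False` keeps the repeated original labels instead.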
Answered By: joris

I would generally not repeat and/or append unless your problem really makes it necessary: it is highly inefficient and typically comes from not understanding the proper way to attack a problem.

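I don’t know your exact use case, but if you have your values stored in an array, you can create the frame at its final size and broadcast the values into it:

import numpy as np
import pandas as pd

values = np.array([1, 2])  # one value per column
df2 = pd.DataFrame(index=np.arange(0, 50), columns=['a', 'b'])  # 50 empty rows
df2[['a', 'b']] = values  # each value is broadcast down its column

That will do the job. Perhaps you want to better explain what you’re trying to achieve?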
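
A possible variation on the same idea is to build the repeated values as a 2-D array first and hand it to the constructor in one step. This is a sketch, not the answer's own method: `np.tile` is swapped in here as an assumption, while `values` and `df2` follow the names used above.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2])  # one value per column

# Assumption: np.tile stacks the row 50 times into a (50, 2) array,
# so the frame is created directly with its final shape.
df2 = pd.DataFrame(np.tile(values, (50, 1)), columns=['a', 'b'])

print(df2.shape)  # (50, 2)
```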

Answered By: FooBar

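Append should work too (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this only runs on older versions):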

In [589]: x = pd.DataFrame({'a':1,'b':2},index = range(1))

In [590]: x
Out[590]: 
   a  b
0  1  2

In [591]: x.append([x]*5, ignore_index=True) #Ignores the index as per your need
Out[591]: 
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
5  1  2

In [592]: x.append([x]*5)
Out[592]: 
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2
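
On pandas versions where DataFrame.append is unavailable, a rough equivalent can be written with pd.concat. The sketch below assumes the goal is the same six rows shown above: the original frame plus five more copies.

```python
import pandas as pd

x = pd.DataFrame({'a': 1, 'b': 2}, index=range(1))

# x.append([x] * 5) returned x followed by five more copies: six rows.
# Assumption: concatenating the same six frames is an acceptable substitute.
out = pd.concat([x] + [x] * 5, ignore_index=True)
print(len(out))  # 6
```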
Answered By: Surya

I think it’s cleaner/faster to use iloc nowadays:

In [11]: np.full(3, 0)
Out[11]: array([0, 0, 0])

In [12]: x.iloc[np.full(3, 0)]
Out[12]:
   a  b
0  1  2
0  1  2
0  1  2

More generally, you can use tile or repeat with arange:

In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

In [22]: df
Out[22]:
   A  B
0  1  2
1  3  4

In [23]: np.tile(np.arange(len(df)), 3)
Out[23]: array([0, 1, 0, 1, 0, 1])

In [24]: np.repeat(np.arange(len(df)), 3)
Out[24]: array([0, 0, 0, 1, 1, 1])

In [25]: df.iloc[np.tile(np.arange(len(df)), 3)]
Out[25]:
   A  B
0  1  2
1  3  4
0  1  2
1  3  4
0  1  2
1  3  4

In [26]: df.iloc[np.repeat(np.arange(len(df)), 3)]
Out[26]:
   A  B
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4

Note: This will work with non-integer indexed DataFrames (and Series).
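
To illustrate that note, here is a minimal sketch with string labels. The labels "x" and "y" are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels instead of the default integers.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["x", "y"])

# iloc selects by position, so the labels never need to be integers.
positions = np.tile(np.arange(len(df)), 2)
tiled = df.iloc[positions]
print(list(tiled.index))  # ['x', 'y', 'x', 'y']
```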

Answered By: Andy Hayden

Try using numpy.repeat:

>>> import numpy as np
>>> df = pd.DataFrame(np.repeat(x.to_numpy(), 5, axis=0), columns=x.columns)
>>> df
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
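
One thing to watch with this approach: to_numpy() returns a single array with one dtype, so a frame with mixed column types does not keep its per-column dtypes through the round trip by itself. The sketch below casts them back afterwards. The mixed-type frame is hypothetical, and casting with the original dtypes is an assumption about what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing an integer column with a string column.
x = pd.DataFrame({'a': [1], 'b': ['u']})

# to_numpy() has to pick one dtype for the whole array, so the
# per-column dtypes are not carried through on their own.
repeated = pd.DataFrame(np.repeat(x.to_numpy(), 5, axis=0), columns=x.columns)

# Assumption: casting back to the original per-column dtypes is acceptable.
restored = repeated.astype(x.dtypes.to_dict())
print(restored.dtypes)
```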
Answered By: U12-Forward

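Applying a lambda that calls Series.repeat is a universal approach in my opinion (with axis=0 the lambda receives each column in turn, so every value in it is repeated):

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

df.apply(lambda col: col.repeat(2), axis=0) #.reset_index()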

Out[1]: 
    A   B
0   1   2
0   1   2
1   3   4
1   3   4
Answered By: Alexey K.

Without numpy, we could also use Index.repeat + loc (or reindex):

out = x.loc[x.index.repeat(5)].reset_index(drop=True)

or

out = x.reindex(x.index.repeat(5)).reset_index(drop=True)

Output:

   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
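
With more than one row, Index.repeat repeats each label in turn, so the copies of a row end up next to each other rather than the whole frame being tiled. A minimal sketch, assuming the index labels are unique (the two-row frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Index.repeat repeats each label in turn: [0, 0, 1, 1].
labels = df.index.repeat(2)
out = df.loc[labels].reset_index(drop=True)
print(out["A"].tolist())  # [1, 1, 3, 3]
```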
Answered By: user7864386