Python Pandas – How to create a dataframe from a sequence

Question:

I’m trying to create a dataframe populated by repeating rows based on an existing steady sequence.
For example, if I had a sequence increasing in 3s from 6 to 18, the sequence could be generated using np.arange(6, 18, 3) to give array([ 6, 9, 12, 15]).

How would I go about generating a dataframe in this way?

How could I get the below if I wanted 6 repeated rows?

     0   1   2    3
0   6.0 9.0 12.0 15.0   
1   6.0 9.0 12.0 15.0   
2   6.0 9.0 12.0 15.0   
3   6.0 9.0 12.0 15.0   
4   6.0 9.0 12.0 15.0
5   6.0 9.0 12.0 15.0
6   6.0 9.0 12.0 15.0

The reason for creating this matrix is that I then wish to add a pd.sequence row-wise to this matrix

Asked By: AlwaysInTheDark

||

Answers:

pd.DataFrame([np.arange(6, 18, 3)]*7)

alternately,


pd.DataFrame(np.repeat([np.arange(6, 18, 3)],7, axis=0))
    0   1   2   3
0   6   9   12  15
1   6   9   12  15
2   6   9   12  15
3   6   9   12  15
4   6   9   12  15
5   6   9   12  15
6   6   9   12  15
Answered By: Naveed

Here is a solution using NumPy broadcasting which avoids Python loops, lists, and excessive memory allocation (as done by np.repeat):

pd.DataFrame(np.broadcast_to(np.arange(6, 18, 3), (6, 4))) 

To understand why this is more efficient than other solutions, refer to the np.broadcast_to() docs: https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html

more than one element of a broadcasted array may refer to a single memory location.

This means that no matter how many rows you create before passing to Pandas, you’re only really allocating a single row, then a 2D array which refers to the data of that row multiple times.

If you assign the above to df, you can say df.values.base is a single row–this is the only storage required no matter how many rows appear in the DataFrame.

Answered By: John Zwinck
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.