Copying and appending rows to a dataframe, incrementing the timestamp column by a minute each time

Question:

Here is the dataframe I have:

import pandas as pd

df = pd.DataFrame([[pd.Timestamp(2017, 1, 1, 12, 32, 0), 2, 3],
                   [pd.Timestamp(2017, 1, 2, 12, 32, 0), 4, 9]],
                  columns=['time', 'feature1', 'feature2'])

For every row in df (i.e., for every value in the 'time' column), I need to append 5 more rows in which the time value is incremented by one minute each time, while the values of the remaining columns are copied as is.

So the output would look like:

time                  feature1   feature2
2017-01-01 12:32:00   2          3
2017-01-01 12:33:00   2          3
2017-01-01 12:34:00   2          3 
2017-01-01 12:35:00   2          3
2017-01-01 12:36:00   2          3
2017-01-01 12:37:00   2          3
2017-01-02 12:32:00   4          9
2017-01-02 12:33:00   4          9
2017-01-02 12:34:00   4          9
2017-01-02 12:35:00   4          9
2017-01-02 12:36:00   4          9
2017-01-02 12:37:00   4          9

Looking for an elegant solution, I tried df.asfreq('1min'), but I could not tell it to stop after appending 5 rows. Instead, it kept appending rows in 1-minute increments until it reached the next timestamp.
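For reference, the behavior described above can be reproduced with a minimal sketch. It assumes 'time' is first set as the index, since asfreq works on a datetime-like index; the name `filled` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[pd.Timestamp(2017, 1, 1, 12, 32, 0), 2, 3],
     [pd.Timestamp(2017, 1, 2, 12, 32, 0), 4, 9]],
    columns=["time", "feature1", "feature2"],
)

# asfreq builds a regular 1-minute grid from the first to the last timestamp,
# so it fills the whole 24-hour gap instead of stopping after 5 rows.
filled = df.set_index("time").asfreq("1min")

print(len(filled))                        # 1441 rows: 24 * 60 minutes plus the end point
print(filled["feature1"].notna().sum())   # 2: only the original rows carry values
```

Every row that asfreq introduces holds NaN in the feature columns, so this route would also need a separate fill step on top of the missing stop condition.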

I also tried a plain Python for loop and, as expected, it is very slow (I am dealing with 10 million rows).

Is there an elegant solution to this? Ideally something that works like df.asfreq('1min') but with a stop condition after appending 5 rows.

Asked By: Sushanth

||

Answers:

You can repeat each row of the df, then use groupby with cumcount to add the minutes, as shown below:

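out = df.loc[df.index.repeat(6)]
# cumcount numbers the copies of each original row 0..5; add that many minutes
out['time'] = out['time'] + pd.to_timedelta(out.groupby("time").cumcount(), unit='m')
out = out.reset_index(drop=True)  # renumber the rows 0..11, as in the output below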

print(out)

                  time  feature1  feature2
0  2017-01-01 12:32:00         2         3
1  2017-01-01 12:33:00         2         3
2  2017-01-01 12:34:00         2         3
3  2017-01-01 12:35:00         2         3
4  2017-01-01 12:36:00         2         3
5  2017-01-01 12:37:00         2         3
6  2017-01-02 12:32:00         4         9
7  2017-01-02 12:33:00         4         9
8  2017-01-02 12:34:00         4         9
9  2017-01-02 12:35:00         4         9
10 2017-01-02 12:36:00         4         9
11 2017-01-02 12:37:00         4         9
Answered By: anky
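
One caveat worth noting: grouping on "time" assumes that every original timestamp is unique. If the same timestamp can occur in more than one row, a variant along these lines may be safer. It is only a sketch, assuming the original index is unique (as with the default RangeIndex); `n_copies` and `offsets` are illustrative names, and the sample data deliberately contains a duplicate timestamp.

```python
import pandas as pd

df = pd.DataFrame(
    [[pd.Timestamp(2017, 1, 1, 12, 32, 0), 2, 3],
     [pd.Timestamp(2017, 1, 1, 12, 32, 0), 4, 9]],  # same timestamp on purpose
    columns=["time", "feature1", "feature2"],
)

n_copies = 6  # the original row plus 5 appended rows

out = df.loc[df.index.repeat(n_copies)].copy()

# Group on the repeated original index (level 0) instead of on "time",
# so two rows sharing a timestamp are still counted separately (0..5 each).
offsets = out.groupby(level=0).cumcount()
out["time"] = out["time"] + pd.to_timedelta(offsets, unit="m")

out = out.reset_index(drop=True)
print(out)
```

With this sample, both original rows expand to 12:32 through 12:37, whereas grouping on "time" would count all 12 copies as one group and run the second row on to 12:43.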

You could create a column containing a list of required times using pandas.date_range and explode the DataFrame on that column:

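# Replace each timestamp with a 6-step range: the original time plus 5 more minutes
df["time"] = df["time"].apply(lambda x: pd.date_range(start=x, periods=6, freq="1min"))
df = df.explode("time")
df["time"] = pd.to_datetime(df["time"])  # explode returns an object-dtype column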

>>> df
                 time  feature1  feature2
0 2017-01-01 12:32:00         2         3
0 2017-01-01 12:33:00         2         3
0 2017-01-01 12:34:00         2         3
0 2017-01-01 12:35:00         2         3
0 2017-01-01 12:36:00         2         3
0 2017-01-01 12:37:00         2         3
1 2017-01-02 12:32:00         4         9
1 2017-01-02 12:33:00         4         9
1 2017-01-02 12:34:00         4         9
1 2017-01-02 12:35:00         4         9
1 2017-01-02 12:36:00         4         9
1 2017-01-02 12:37:00         4         9
Answered By: not_speshal
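
Since the question mentions 10 million rows, a variant that avoids both a per-row apply and a groupby might be worth trying. The sketch below is an alternative and not part of either answer above: it builds the minute offsets directly with NumPy. It has not been benchmarked here, and `n_copies` and `minute_offsets` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[pd.Timestamp(2017, 1, 1, 12, 32, 0), 2, 3],
     [pd.Timestamp(2017, 1, 2, 12, 32, 0), 4, 9]],
    columns=["time", "feature1", "feature2"],
)

n_copies = 6  # the original row plus 5 appended rows

# Repeat every row n_copies times and renumber the result 0..N-1.
out = df.loc[df.index.repeat(n_copies)].reset_index(drop=True)

# Offsets 0,1,2,3,4,5 repeated once per original row, in the same order as `out`.
minute_offsets = np.tile(np.arange(n_copies), len(df))
out["time"] = out["time"] + pd.to_timedelta(minute_offsets, unit="m")

print(out)
```

Because the offsets are generated by position, this does not depend on the timestamps or the original index being unique.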