Add together elements from Pandas DataFrame based on timestamp
Question:
I am trying to add together the values in the second column of two dataframes where the time (in the first column) is the same. However, the times in each DataFrame are spaced at different intervals. So, in the image below, I would like to add the y values of both lines together:
[Image: plot of the two power series over time]
Where they overlap, the combined value would be around 3200.
Each dataframe has two columns: the first is the time as a Unix timestamp and the second is the power in watts. Rows are usually spaced 6 seconds apart, but sometimes more or less. Each dataframe also starts and ends at a different time, although they overlap in the middle portion.
I’ve added the first few rows for ease of viewing:
df1:
         time  power
0  1355526770   1500
1  1355526776   1800
2  1355526782   1600
3  1355526788   1700
4  1355526794   1400
df2:
         time  power
0  1355526771   1250
1  1355526777   1200
2  1355526783   1280
3  1355526789   1290
4  1355526795   1300
I first thought of reindexing each dataframe, inserting a row for every second across its time range, and then linearly interpolating the power values between the original timestamps. I would then add the dataframes together by summing the power values wherever the timestamps matched exactly.
The problem with this method is that it would increase the size of each dataframe by at least 6x, and since they’re already pretty big, this would slow things down a lot.
If anyone knows another method to do this I would be very grateful.
Answers:
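Use pd.merge_asof to match each row of df1 to the row of df2 with the nearest time, then add the two power columns (both dataframes must already be sorted by time):
import pandas as pd
# Match each df1 row to the df2 row nearest in time; df2's power arrives as 'power_2'
(pd.merge_asof(df1, df2, on='time', direction='nearest', suffixes=(None, '_2'))
   # Sum the two power columns and drop the temporary 'power_2' column
   .assign(power=lambda d: d['power'].add(d.pop('power_2')))
)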
Output:
         time  power
0  1355526770   2750
1  1355526776   3000
2  1355526782   2880
3  1355526788   2990
4  1355526794   2700
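Since the question mentions that the two dataframes start and end at different times, a nearest match with no limit would pair rows in the non-overlapping parts with a far-away reading. A minimal sketch of one way to guard against that, using the `tolerance` parameter of `merge_asof`. The 3-second tolerance, the extra row at time 1355526900, and the names `df1_ext`, `power_2` and `total` are assumptions for illustration, not part of the original data:

```python
import pandas as pd

# Sample rows from the question
df1 = pd.DataFrame({
    "time": [1355526770, 1355526776, 1355526782, 1355526788, 1355526794],
    "power": [1500, 1800, 1600, 1700, 1400],
})
df2 = pd.DataFrame({
    "time": [1355526771, 1355526777, 1355526783, 1355526789, 1355526795],
    "power": [1250, 1200, 1280, 1290, 1300],
})

# Hypothetical extra reading in df1, well outside df2's time range
extra = pd.DataFrame({"time": [1355526900], "power": [1450]})
df1_ext = pd.concat([df1, extra], ignore_index=True)

# merge_asof requires both frames to be sorted on the key
left = df1_ext.sort_values("time")
right = df2.sort_values("time").rename(columns={"power": "power_2"})

# Only accept a match within 3 seconds (assumed threshold);
# rows without a close enough partner get NaN in 'power_2'
merged = pd.merge_asof(left, right, on="time", direction="nearest", tolerance=3)

# NaN propagates, so unmatched rows have no combined value
merged["total"] = merged["power"] + merged["power_2"]
print(merged)
```

Rows left as NaN can then be dropped with `dropna`, or kept with their original power, depending on how the non-overlapping parts should be treated.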
Beyond what the other users have said, you could also consider trying Modin instead of plain pandas if you want another way to speed up computation on large datasets. Modin can be integrated with a one-line change: replace import pandas as pd with import modin.pandas as pd. Take a look here: Intel® Distribution of Modin
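If interpolated values are preferred over the nearest reading, the interpolation idea from the question can be applied without creating a row for every second. The sketch below is one possible approach, not something proposed in the answers above: it uses `numpy.interp` to estimate df2's power only at df1's timestamps, so the result has the same number of rows as df1. The names `p2_at_t1` and `combined` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question
df1 = pd.DataFrame({
    "time": [1355526770, 1355526776, 1355526782, 1355526788, 1355526794],
    "power": [1500, 1800, 1600, 1700, 1400],
})
df2 = pd.DataFrame({
    "time": [1355526771, 1355526777, 1355526783, 1355526789, 1355526795],
    "power": [1250, 1200, 1280, 1290, 1300],
})

# np.interp needs the known x-values (df2's times) in increasing order
df2 = df2.sort_values("time")

# Linearly interpolate df2's power at each df1 timestamp;
# timestamps outside df2's time range become NaN instead of being extrapolated
p2_at_t1 = np.interp(
    df1["time"].to_numpy(),
    df2["time"].to_numpy(),
    df2["power"].to_numpy(),
    left=np.nan,
    right=np.nan,
)

# Same shape as df1: no per-second rows are created
combined = df1.assign(power=df1["power"] + p2_at_t1)
print(combined)
```

With the sample data, the first row comes out as NaN because 1355526770 is earlier than df2's first timestamp; the remaining rows are df1's power plus the interpolated df2 value.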