Merging multiple time series
Question:
Today I tried to merge multiple time series, corresponding to clinical recordings (heart rate, arterial pressure…), for TSfresh analysis. Some of them have the same time step, and others have a different one, like this:
df1 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
Values
Date
06/03/2021 17:22 30
06/03/2021 17:23 30
06/03/2021 17:24 30
06/03/2021 17:25 30
06/03/2021 17:26 30
... ..
df2 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
Values
Date
06/03/2021 17:22 8,7
06/03/2021 17:23 8,6
06/03/2021 17:24 8,7
06/03/2021 17:25 8,6
06/03/2021 17:26 8,7
... ...
df3 = pd.read_csv("PATH", delimiter='\t', header=None, index_col=0)
Values
Date
06/03/2021 17:00 1
06/03/2021 18:00 1
06/03/2021 19:00 1
06/03/2021 20:00 3
06/03/2021 21:00 3
The code I wrote to merge them generates a new dataframe like this:
new = pd.concat([df1,df2,df3], ignore_index=True, axis=1)
Values1 Values2 Values3
Date
06/03/2021 17:22 30.0 8,7 NaN
06/03/2021 17:23 30.0 8,6 NaN
06/03/2021 17:24 30.0 8,7 NaN
06/03/2021 17:25 30.0 8,6 NaN
06/03/2021 17:26 30.0 8,7 NaN
...
06/03/2021 18:00 32 9,5 1
As we can see, only one value of df3 was inserted into the new dataframe new, corresponding to the timestamp common to the 3 dataframes, and the other rows were filled with NaN.
But I think it would be better to replace the NaN values with the value corresponding to that hour. For example: all rows in 06/03/2021 17:** would be equal to 1 for Values3, all rows in 06/03/2021 20:** would be 3…, like:
Values1 Values2 Values3
Date
06/03/2021 17:22 30.0 8,7 1
06/03/2021 17:23 30.0 8,6 1
06/03/2021 17:24 30.0 8,7 1
06/03/2021 17:25 30.0 8,6 1
06/03/2021 17:26 30.0 8,7 1
...
06/03/2021 18:00 32 9,5 1
06/03/2021 18:01 32 9,5 1
06/03/2021 18:02 32 9,5 1
...
06/03/2021 20:00 32 9,5 3
06/03/2021 20:01 32 9,5 3
06/03/2021 20:02 32 9,5 3
...
Thank you everybody for your help!
Answers:
After concat, group the dataframe on the datetime index (floored to the hour), then forward- and backward-fill the values within each group:
# convert to datetime index (if not already converted)
# new.index = pd.to_datetime(new.index)
new = new.groupby(new.index.floor('H'), group_keys=False).apply(lambda s: s.ffill().bfill())
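A minimal runnable sketch of this approach, using made-up data shaped like the question's dataframes (the timestamps and column names here are illustrative, not the real CSVs):

```python
import pandas as pd

# Per-minute series (like df1) and an hourly series (like df3).
df1 = pd.DataFrame(
    {"Values1": [30, 30, 30]},
    index=pd.to_datetime(["2021-03-06 17:22", "2021-03-06 17:23", "2021-03-06 18:00"]),
)
df3 = pd.DataFrame(
    {"Values3": [1, 3]},
    index=pd.to_datetime(["2021-03-06 17:00", "2021-03-06 18:00"]),
)

# Outer-join on the datetime index; rows missing in one frame get NaN.
new = pd.concat([df1, df3], axis=1)

# Group rows by their hour, then fill the gaps inside each hourly group.
new = new.groupby(new.index.floor("h"), group_keys=False).apply(
    lambda s: s.ffill().bfill()
)
print(new)
```

After the fill, every row inside the 17:00 hour carries the hourly value 1 for Values3, as in the desired output. Note also that if the CSVs use decimal commas (8,7), passing `decimal=','` to `pd.read_csv` parses them as floats instead of strings.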