How to create windows based on time when I have an irregular sample rate?

Question

My dataset consists of time series of sensor measurements (accelerometer, gyroscope, magnetometer). I need to create windows in order to extract features and build feature vectors. The problem is that the sample rate is irregular.
For instance, the sensors may stop recording for 1 minute and then resume. Example of my dataset:

**Timestamp                x         y         z**

2022-12-25 08:55:31  0.462288 -0.747311 -0.049593
2022-12-25 08:55:31  0.792116 -1.437709  0.702323
2022-12-25 08:55:31  0.880261 -0.185562  1.129537
2022-12-25 08:55:32 -0.084058  0.441366  0.955718
2022-12-25 08:55:32 -0.107756  0.319304  1.090497
2022-12-25 08:55:32 -0.091866  0.373503  1.034103
2022-12-25 08:56:59  0.341448  0.085186  1.297256
2022-12-25 08:56:59  0.426420  0.233355  1.137589
2022-12-25 08:57:00  1.150247 -0.665053  0.202337

So far I have created 2-second windows based on the timestamp. The problem is that my code does not recognize the gap between 08:55:32 and 08:56:59.
What I need is to split the dataframe at that point and keep creating windows starting from 08:56:59.
Here is my code:

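import pandas as pd

def create_windows(df):
  # One sub-frame per distinct value of the 'Seconds' column
  grouped = df.groupby('Seconds')
  dfs = [grouped.get_group(x) for x in grouped.groups]
  ls = []
  # Pair each second with the next one, regardless of how far apart they are
  for i in range(len(dfs)-1):
    a = pd.concat([dfs[i], dfs[i+1]], axis=0)
    ls.append(a)
  return ls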

My results are:

**Seconds                   x       y         z**       

2022-12-25 08:55:24  1.000126 -1.102270  0.227957
2022-12-25 08:55:24  0.872452 -0.747067 -0.476837
2022-12-25 08:55:24  0.734745 -0.864248 -0.090860
2022-12-25 08:55:24  1.083604 -1.301008  0.451095
2022-12-25 08:55:25  0.459849 -1.184315  0.344436
2022-12-25 08:55:25 -0.028884 -0.918935  0.478209
2022-12-25 08:55:25  0.355386 -0.998021 -0.362340
        
                    
**Seconds                   x       y         z**
                        
2022-12-25 08:55:25  0.938607 -0.928207  0.069052
2022-12-25 08:55:25  1.156865 -0.720959  0.349072
2022-12-25 08:55:25  0.931287 -1.360330  0.592462
2022-12-25 08:55:25  0.362462 -0.977769  0.517280
2022-12-25 08:55:26  1.638277 -1.305400  0.142283
2022-12-25 08:55:26  0.679326 -0.734867 -0.002257
2022-12-25 08:55:26  0.738405 -0.601064 -0.321806

What I am trying to fix (a window that spans the gap):

**Seconds                   x       y         z**       

2022-12-25 08:55:32 -0.107756  0.319304  1.090497
2022-12-25 08:55:32 -0.091866  0.373503  1.034103
2022-12-25 08:56:59  0.341448  0.085186  1.297256
2022-12-25 08:56:59  0.426420  0.233355  1.137589

Asked By: Gvasiles


Answer 1

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "Timestamp": [
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:32",
            "2022-12-25 08:55:32",
            "2022-12-25 08:55:32",
            "2022-12-25 08:56:59",
            "2022-12-25 08:56:59",
            "2022-12-25 08:57:00",
        ],
        "x": [
            0.462288,
            0.792116,
            0.880261,
            -0.084058,
            -0.107756,
            -0.091866,
            0.341448,
            0.42642,
            1.150247,
        ],
        "y": [
            -0.747311,
            -1.437709,
            -0.185562,
            0.441366,
            0.319304,
            0.373503,
            0.085186,
            0.233355,
            -0.665053,
        ],
        "z": [
            -0.049593,
            0.702323,
            1.129537,
            0.955718,
            1.090497,
            1.034103,
            1.297256,
            1.137589,
            0.202337,
        ],
    }
)

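Here is one way to do it with pandas Timedelta and unique. For each unique timestamp v, select the rows whose timestamp lies between v and v plus one second, so no window can span a gap longer than one second:

df["Timestamp"] = pd.to_datetime(df["Timestamp"])  # format is inferred by default; infer_datetime_format is deprecated since pandas 2.0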

dfs = [
    df.loc[
        (df["Timestamp"] >= v)
        & (df["Timestamp"] <= v + pd.Timedelta(value=1, unit="second")),
        :,
    ]
    for v in df["Timestamp"].unique()
]

Then:

for df_ in dfs:
    print(df_)
# Output

            Timestamp         x         y         z
0 2022-12-25 08:55:31  0.462288 -0.747311 -0.049593
1 2022-12-25 08:55:31  0.792116 -1.437709  0.702323
2 2022-12-25 08:55:31  0.880261 -0.185562  1.129537
3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
            Timestamp         x         y         z
3 2022-12-25 08:55:32 -0.084058  0.441366  0.955718
4 2022-12-25 08:55:32 -0.107756  0.319304  1.090497
5 2022-12-25 08:55:32 -0.091866  0.373503  1.034103
            Timestamp         x         y         z
6 2022-12-25 08:56:59  0.341448  0.085186  1.297256
7 2022-12-25 08:56:59  0.426420  0.233355  1.137589
8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337
            Timestamp         x         y         z
8 2022-12-25 08:57:00  1.150247 -0.665053  0.202337

Answered By: Laurent

Answer 2

For this to work, I set a new integer index running from 0 to df.shape[0] - 1 and used Timestamp as the fourth column of my dataframe:

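import pandas as pd

def windows(df):
    time_diff = df['Timestamp'].diff()
    # True on the first row after a gap longer than one second
    mask = time_diff > pd.Timedelta(seconds=1)
    ind = mask.index[mask]
    # Split at the first gap only; the index must be 0..n-1 for iloc to line up
    df1 = df.iloc[:ind[0],:]
    df2 = df.iloc[ind[0]:,:]
    return df1, df2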
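
The function above splits only at the first gap. If a recording contains several gaps, one possible generalization is to mark the first row after each gap and use a cumulative sum of those marks as a segment id. The following is a minimal sketch under that assumption; the name split_on_gaps and the max_gap parameter are hypothetical and not part of the original answer, and the rows are assumed to be sorted by time already.

```python
import pandas as pd


def split_on_gaps(df, max_gap="1s"):
    """Split df into consecutive segments wherever the time step exceeds max_gap.

    split_on_gaps and max_gap are illustrative names (assumptions).
    """
    ts = pd.to_datetime(df["Timestamp"])
    # True on the first row after each gap; the first row's diff is NaT -> False
    after_gap = ts.diff() > pd.Timedelta(max_gap)
    # The cumulative sum turns the gap markers into a segment id: 0, 0, ..., 1, 1, ...
    segment_id = after_gap.cumsum()
    return [segment for _, segment in df.groupby(segment_id)]


# Timestamps and x values from the question (y and z omitted to keep this short)
df = pd.DataFrame(
    {
        "Timestamp": [
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:31",
            "2022-12-25 08:55:32",
            "2022-12-25 08:55:32",
            "2022-12-25 08:55:32",
            "2022-12-25 08:56:59",
            "2022-12-25 08:56:59",
            "2022-12-25 08:57:00",
        ],
        "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
              -0.091866, 0.341448, 0.42642, 1.150247],
    }
)

segments = split_on_gaps(df)
for segment in segments:
    print(segment, end="\n\n")
```

On the sample data this should give two segments, one before the gap and one starting at 08:56:59. Windows can then be built inside each segment separately, so none of them crosses a gap.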

Answered By: Gvasiles

Answer 3

My previous answer only splits the initial dataframe into two dataframes at the point of the time gap.

Using @Laurent's function, I added a few more lines so that it does everything I asked.
@Laurent's function also creates windows that contain only a single second. Here I drop those dataframes:

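def windows(sensor):
  # One candidate window per unique timestamp v: all rows in [v, v + 1 second]
  dfs = [sensor.loc[(sensor["Timestamp"] >= v)
          & (sensor["Timestamp"] <= v + pd.Timedelta(value=1, unit="second")),
          :,
      ]
      for v in sensor["Timestamp"].unique()
  ]

  # Drop windows whose first and last timestamps are equal (only one second of data)
  dfs = [df for df in dfs if df['Timestamp'].iloc[-1] != df['Timestamp'].iloc[0]]
  return dfs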
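
Since the stated goal is to turn each window into a feature vector, here is a rough sketch of that last step. The helper window_features and the choice of mean and standard deviation as features are assumptions for illustration only; the original post does not say which features are needed. The windows function is repeated in a condensed form so that the example runs on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2022-12-25 08:55:31"] * 3
            + ["2022-12-25 08:55:32"] * 3
            + ["2022-12-25 08:56:59"] * 2
            + ["2022-12-25 08:57:00"]
        ),
        "x": [0.462288, 0.792116, 0.880261, -0.084058, -0.107756,
              -0.091866, 0.341448, 0.42642, 1.150247],
        "y": [-0.747311, -1.437709, -0.185562, 0.441366, 0.319304,
              0.373503, 0.085186, 0.233355, -0.665053],
        "z": [-0.049593, 0.702323, 1.129537, 0.955718, 1.090497,
              1.034103, 1.297256, 1.137589, 0.202337],
    }
)


def windows(sensor):
    # Condensed version of the function above: rows in [v, v + 1 second] per unique v
    ts = sensor["Timestamp"]
    one_second = pd.Timedelta(seconds=1)
    dfs = [sensor.loc[(ts >= v) & (ts <= v + one_second)] for v in ts.drop_duplicates()]
    # Keep only windows that cover more than a single second
    return [w for w in dfs if w["Timestamp"].iloc[-1] != w["Timestamp"].iloc[0]]


def window_features(window_list, columns=("x", "y", "z")):
    """Return one row of features per window (hypothetical helper; mean/std are assumed)."""
    rows = []
    for w in window_list:
        row = {"start": w["Timestamp"].iloc[0], "end": w["Timestamp"].iloc[-1]}
        for col in columns:
            row[f"{col}_mean"] = w[col].mean()
            row[f"{col}_std"] = w[col].std()
        rows.append(row)
    return pd.DataFrame(rows)


features = window_features(windows(df))
print(features)
```

Each row of features describes one window, with its start and end timestamps kept alongside so it is easy to check that no window spans the gap between 08:55:32 and 08:56:59.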

Answered By: Gvasiles
