Efficient way to restructure pandas dataframe from row to resampled column structure

Question:

I have a pandas dataframe structured as follows:

TimeStamp                                                                           
2022-12-30 10:31:58.483700+00:00       1        FixType            4    4.000000e+00
2022-12-30 10:31:58.483700+00:00       1     Satellites           11    1.100000e+01
2022-12-30 10:31:58.484150+00:00       2  TimeConfirmed            0    0.000000e+00
2022-12-30 10:31:58.484150+00:00       2          Epoch  63797521999    1.641638e+09
2022-12-30 10:31:58.484150+00:00       2      TimeValid            1    1.000000e+00
...                                  ...            ...          ...             ...
2022-12-30 10:54:32.714050+00:00       9   AngularRateZ         1020   -1.000000e+00
2022-12-30 10:54:32.714050+00:00       9  AccelerationY          513    1.250000e-01
2022-12-30 10:54:32.714050+00:00       9  AccelerationZ          594    1.025000e+01
2022-12-30 10:54:32.714050+00:00       9   AngularRateX         1025    2.500000e-01
2022-12-30 10:54:32.714050+00:00       9       ImuValid            1    1.000000e+00

[973528 rows x 4 columns]

I need to get it into the following structure, while also resampling it to a specific frequency (e.g. 1S):

                           FixType  Satellites  ...  AngularRateZ  ImuValid
TimeStamp                                       ...                        
2022-12-30 10:31:59+00:00      4.0        11.0  ...           NaN       NaN
2022-12-30 10:32:00+00:00      4.0        11.0  ...         -1.00       1.0
2022-12-30 10:32:01+00:00      4.0        12.0  ...         -1.00       1.0
2022-12-30 10:32:02+00:00      4.0        12.0  ...         -1.00       1.0
2022-12-30 10:32:03+00:00      4.0        12.0  ...         -1.00       1.0
...                            ...         ...  ...           ...       ...
2022-12-30 10:54:28+00:00      4.0        13.0  ...         -1.00       1.0
2022-12-30 10:54:29+00:00      4.0        14.0  ...         -1.00       1.0
2022-12-30 10:54:30+00:00      4.0        14.0  ...         -0.75       1.0
2022-12-30 10:54:31+00:00      4.0        14.0  ...         -1.00       1.0
2022-12-30 10:54:32+00:00      4.0        14.0  ...         -1.00       1.0

[1354 rows x 39 columns]

Currently I achieve this with the code below:

def restructure_data(df_phys, res):
    import pandas as pd

    # Start from an empty frame so the first merge has something to join against
    df_phys_join = pd.DataFrame({"TimeStamp": []})
    if not df_phys.empty:
        # Build one wide column per (CAN ID, Signal) combination
        for message, df_phys_message in df_phys.groupby("CAN ID"):
            for signal, data in df_phys_message.groupby("Signal"):

                col_name = signal

                # Resample the signal to the target frequency, forward-fill gaps,
                # and merge it into the growing wide dataframe on the timestamp
                df_phys_join = pd.merge_ordered(
                    df_phys_join,
                    data["Physical Value"].rename(col_name).resample(res).ffill().dropna(),
                    on="TimeStamp",
                    fill_method="none",
                ).set_index("TimeStamp")

    return df_phys_join

This works, but it seems inefficient. I wonder if there is a smarter and perhaps more pythonic way to achieve a similar result?

Asked By: mfcss

Answers:

It's a little hard to tell whether this would work without more data/info (are there any duplicate timestamp/column combinations that need to be handled?), but you might be able to get away with a one-liner:

df.pivot_table(values="col4", index=pd.Grouper(freq="S", key="TimeStamp"), columns="col3")

filling in the correct column names in your original df for values= and columns=.
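For what it's worth, a minimal sketch of that idea, using the column names that appear in the question's own code ("Signal" and "Physical Value") and assuming "TimeStamp" is the datetime index of df_phys (both are guesses based on the posted snippet), might look like this:

import pandas as pd

# Assumed layout: "TimeStamp" is the datetime index, "Signal" holds the
# signal name and "Physical Value" the numeric reading.
wide = (
    df_phys
    .reset_index()  # expose TimeStamp as a column for pd.Grouper(key=...)
    .pivot_table(
        values="Physical Value",                       # the numeric reading
        index=pd.Grouper(freq="1S", key="TimeStamp"),  # 1-second bins
        columns="Signal",                              # one column per signal
        aggfunc="last",                                # keep the latest sample per bin
    )
    .ffill()  # carry values forward where a signal has no sample in a bin
)

The trailing .ffill() mimics the forward-fill in the original loop, so seconds in which a signal produced no new sample inherit the previous value.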

Answered By: Chrysophylaxs