How to transform a DataFrame with a complicated Series into a new DataFrame

Question:

I’m having a hard time trying to transform a DataFrame with 2 columns into another DataFrame. The first column is my index (ints) and the other column is a complicated Series. From what I can see, the structure of each value in the Series is as follows:

A dictionary with one key and one value. The value is a list of simple dictionaries with key/value pairs.

My DataFrame looks like this:

[image: screenshot of the DataFrame]

Series:

{
"CashFlowDto": [
    {
        "TicketId": None,
        "Type": "Amrt",
        "Amount": 560.61,
        "PercentualAmount": 0.0494481,
        "MaturityDate": datetime.datetime(2023, 7, 10, 0, 0),
        "PaymentDate": datetime.datetime(2023, 7, 10, 0, 0),
    },
    {
        "TicketId": None,
        "Type": "Amrt",
        "Amount": 552.05,
        "PercentualAmount": 0.048693,
        "MaturityDate": datetime.datetime(2023, 8, 10, 0, 0),
        "PaymentDate": datetime.datetime(2023, 8, 10, 0, 0),
    }
]}

My desired output:

[image: screenshot of the desired output]

Could you guys help me, please?

Thanks

Asked By: pedrosjo


Answers:

Here’s one approach:

  • First, call Series.tolist on column CashFlowDto and pass the result to pd.DataFrame. See this SO answer.
  • Next, repeat the result n times (where n = len(df)) using pd.concat, making sure to set ignore_index=True.
  • Then build a matching repeat of df['TicketId'] with np.repeat, keeping only the values (using Series.to_numpy; alternatively, reset the Series index).
  • Finally, combine the new DataFrame with the repeated df['TicketId'] values, using df.assign.
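
import numpy as np
import pandas as pd

n = len(df)
# Expand the dicts into columns, stack n copies, then overwrite TicketId
res = (pd.concat([pd.DataFrame(df['CashFlowDto'].tolist())] * n, ignore_index=True)
       .assign(TicketId=np.repeat(df.TicketId, n).to_numpy()))

res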

   TicketId  Type  Amount  PercentualAmount MaturityDate PaymentDate
0         1  Amrt  560.61          0.049448   2023-07-10  2023-07-10
1         1  Amrt  552.05          0.048693   2023-08-10  2023-08-10
2         2  Amrt  560.61          0.049448   2023-07-10  2023-07-10
3         2  Amrt  552.05          0.048693   2023-08-10  2023-08-10
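
To try the steps above end to end, here is a minimal sketch. It assumes the input shape this answer appears to work with: a TicketId column plus a CashFlowDto column holding one plain dict per row (not the nested dict-of-list from the question). The sample values are copied from the question; the variable names cash_flows and flat are only for illustration.

```python
import datetime

import numpy as np
import pandas as pd

# Two cash-flow entries copied from the question.
cash_flows = [
    {"TicketId": None, "Type": "Amrt", "Amount": 560.61,
     "PercentualAmount": 0.0494481,
     "MaturityDate": datetime.datetime(2023, 7, 10),
     "PaymentDate": datetime.datetime(2023, 7, 10)},
    {"TicketId": None, "Type": "Amrt", "Amount": 552.05,
     "PercentualAmount": 0.048693,
     "MaturityDate": datetime.datetime(2023, 8, 10),
     "PaymentDate": datetime.datetime(2023, 8, 10)},
]

# Assumed input: one dict per row in 'CashFlowDto', plus a TicketId column.
df = pd.DataFrame({"TicketId": [1, 2], "CashFlowDto": cash_flows})

n = len(df)
# One row per dict, one column per key.
flat = pd.DataFrame(df["CashFlowDto"].tolist())
# Stack n copies of the rows and overwrite TicketId with each id repeated n times.
res = (pd.concat([flat] * n, ignore_index=True)
       .assign(TicketId=np.repeat(df["TicketId"], n).to_numpy()))

print(res)
```

Note that with this input every TicketId ends up paired with every cash-flow row, so the result has n * n rows. If each ticket should keep only its own cash flows, a per-row approach is probably a better fit.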

Answered By: ouroboros1

There’s probably a more elegant way to do this by applying pd.json_normalize to your original data, but I’ll suggest a solution using a list comprehension (and zip).

If your current DataFrame is named tickets_df, then you can try:

import pandas as pd

cashflows_df = pd.DataFrame([
    # One output row per cash flow; drop the inner (empty) TicketId
    {'Ticket': tId, **{k: v for k, v in cfd.items() if k != 'TicketId'}}
    for tId, cf in zip(
        # tickets_df['TicketId'], tickets_df['CashFlows']
        tickets_df.index, tickets_df['CashFlows']  # if TicketId is the index
    )
    for cfd in cf['CashFlowDto']
])

(In my test output, I edited the Type field just to demonstrate that the rows are kept separate, as they should be.)
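
For what it's worth, the pd.json_normalize route mentioned at the top might look roughly like the sketch below. It is untested against the real data and assumes the same layout as above: TicketId is the index and the column is named CashFlows (both are my guesses, not confirmed by the question). The first ticket uses the values from the question; the single entry for the second ticket is made up so the rows can be told apart.

```python
import datetime

import pandas as pd

# Assumed layout: TicketId as the index, one {"CashFlowDto": [...]} dict per row.
tickets_df = pd.DataFrame(
    {"CashFlows": [
        {"CashFlowDto": [
            {"TicketId": None, "Type": "Amrt", "Amount": 560.61,
             "PercentualAmount": 0.0494481,
             "MaturityDate": datetime.datetime(2023, 7, 10),
             "PaymentDate": datetime.datetime(2023, 7, 10)},
            {"TicketId": None, "Type": "Amrt", "Amount": 552.05,
             "PercentualAmount": 0.048693,
             "MaturityDate": datetime.datetime(2023, 8, 10),
             "PaymentDate": datetime.datetime(2023, 8, 10)},
        ]},
        {"CashFlowDto": [
            # Made-up entry, only here to show that tickets stay separate.
            {"TicketId": None, "Type": "Other", "Amount": 100.0,
             "PercentualAmount": 0.01,
             "MaturityDate": datetime.datetime(2023, 9, 10),
             "PaymentDate": datetime.datetime(2023, 9, 10)},
        ]},
    ]},
    index=pd.Index([1, 2], name="TicketId"),
)

# Attach each ticket id to its own dict so json_normalize can carry it as meta.
# The key is 'Ticket' rather than 'TicketId' because json_normalize raises on
# a meta name that clashes with a key inside the records.
records = [
    {"Ticket": tid, **cell}
    for tid, cell in zip(tickets_df.index, tickets_df["CashFlows"])
]

cashflows_df = pd.json_normalize(records, record_path="CashFlowDto", meta=["Ticket"])

# Drop the empty inner TicketId and move Ticket to the front.
cashflows_df = cashflows_df.drop(columns="TicketId")
cashflows_df = cashflows_df[["Ticket", *cashflows_df.columns.drop("Ticket")]]

print(cashflows_df)
```

Each ticket only gets its own cash flows here, same as with the list comprehension.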

Answered By: Driftr95