Add a Row if a Specific ID doesn't have it for the Pre or Post period with zeros in the Missing Columns

Question

I was given a dataframe containing the net amount each person spent on services, and their number of visits, in a pre period and a post period. We want to compare whether these members' spend and visits differed between the two periods.

The dataframe looks like the one below. Some members are missing a row for one of the periods: sometimes it is the "Pre" row that is missing and sometimes the "Post" row, and this happens in several places throughout the data.

import pandas as pd

df = pd.DataFrame({
    'unique_member_id_key': [723543, 723543, 723548, 723548, 723550, 723552, 723552],
    'net_amount': [34.26, 35.09, 72.07, 54.73, 54.32, 87.43, 87.32],
    'total_visits': [4, 2, 8, 1, 3, 5, 4],
    'Period': ["Pre", "Post", "Pre", "Post", "Pre", "Pre", "Post"],
})

What I want is for every member to have both a "Pre" row and a "Post" row. Where one is missing, pandas should add a new row for that member with 0 in the "total_visits" and "net_amount" columns and the missing value ("Pre" or "Post") in the Period column.

Is there a way to systematically do this without having to find each ID that is missing a "Pre" or "Post" period and inserting the row individually for each time this occurs?

Thanks!!
Mark

Asked By: Mark Jalapeno

Answer 1

IIUC, you can use pivot_table to build a dense member-by-period table, filling the missing cells with 0, then stack to get back to the original long format:

>>> (df.pivot_table(index='unique_member_id_key', columns='Period', 
                    values=['net_amount', 'total_visits'], fill_value=0)
       .stack().reset_index())

   unique_member_id_key Period  net_amount  total_visits
0                723543   Post       35.09             2
1                723543    Pre       34.26             4
2                723548   Post       54.73             1
3                723548    Pre       72.07             8
4                723550   Post        0.00             0  # <- HERE
5                723550    Pre       54.32             3
6                723552   Post       87.32             4
7                723552    Pre       87.43             5
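
One caveat worth keeping in mind: pivot_table aggregates with the mean by default, so the result above matches the input only when each member has at most one row per period. If the real data can contain several rows for the same member and period, passing an explicit aggfunc makes the intent clear. The sketch below assumes that summing is the right way to combine such rows; the extra 723550 "Pre" row is made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'unique_member_id_key': [723543, 723543, 723548, 723548, 723550, 723552, 723552],
    'net_amount': [34.26, 35.09, 72.07, 54.73, 54.32, 87.43, 87.32],
    'total_visits': [4, 2, 8, 1, 3, 5, 4],
    'Period': ["Pre", "Post", "Pre", "Post", "Pre", "Pre", "Post"],
})

# Hypothetical duplicate: a second "Pre" row for member 723550.
extra = pd.DataFrame({
    'unique_member_id_key': [723550],
    'net_amount': [10.0],
    'total_visits': [1],
    'Period': ["Pre"],
})
df_dup = pd.concat([df, extra], ignore_index=True)

# Sum (rather than average) rows sharing a member and period,
# and fill member/period combinations that never occur with 0.
dense = df_dup.pivot_table(index='unique_member_id_key', columns='Period',
                           values=['net_amount', 'total_visits'],
                           aggfunc='sum', fill_value=0)

# Move Period back from the columns into the rows.
result = dense.stack().reset_index()
print(result)
```

With the default aggfunc, the two "Pre" rows for 723550 would have been averaged instead of added.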

Or, as suggested by @mozway, use set_index/unstack followed by stack/reset_index:

>>> (df.set_index(['unique_member_id_key', 'Period'])
       .unstack(fill_value=0)
       .stack().reset_index())

   unique_member_id_key Period  net_amount  total_visits
0                723543   Post       35.09             2
1                723543    Pre       34.26             4
2                723548   Post       54.73             1
3                723548    Pre       72.07             8
4                723550   Post        0.00             0  # <- HERE
5                723550    Pre       54.32             3
6                723552   Post       87.32             4
7                723552    Pre       87.43             5

Answered By: Corralien
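
A further variant, offered only as a sketch: build the full set of member/period combinations with MultiIndex.from_product and reindex against it. This avoids reshaping to a wide table and back, and it still works if one of the periods happens to be absent from the data entirely. It assumes each member has at most one row per period, since reindex does not accept duplicated index entries; the names full_index and filled are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'unique_member_id_key': [723543, 723543, 723548, 723548, 723550, 723552, 723552],
    'net_amount': [34.26, 35.09, 72.07, 54.73, 54.32, 87.43, 87.32],
    'total_visits': [4, 2, 8, 1, 3, 5, 4],
    'Period': ["Pre", "Post", "Pre", "Post", "Pre", "Pre", "Post"],
})

# Every (member, period) combination that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [df['unique_member_id_key'].unique(), ['Pre', 'Post']],
    names=['unique_member_id_key', 'Period'],
)

# Assumption: (member, period) pairs are unique in df.
# Combinations missing from df become new rows filled with 0.
filled = (df.set_index(['unique_member_id_key', 'Period'])
            .reindex(full_index, fill_value=0)
            .reset_index())
print(filled)
```

Because the period order comes from the list passed to from_product, each member's "Pre" row comes before the "Post" row here, unlike the alphabetical order produced by stack.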
