How To Create New Pandas Columns With Monthly Counts From DateTime

Question:

I have a large dataframe of crime incidents, df, with four columns. INCIDENT_DATE has dtype datetime, and Type takes one of three values (Violent, Property, and non-index).

ID        Crime               INCIDENT_DATE  Type
XL123445  Aggravated Assault  2018-12-29     Violent
XL123445  Simple Assault      2018-12-29     Violent
XL123445  Theft               2018-12-30     Property
TX56784   Theft               2018-04-28     Property
CA45678   Sexual Assault      1991-10-23     Violent
LA356890  Burglary            2018-12-21     Property

I want to create a new dataframe with the monthly counts (for each ID) of the Property and Violent types, plus a column with the total number of incidents for that ID during that month.

So I would want something like:

ID        Year_Month  Violent  Property  Total
XL123445  2018-08       19654       500  20154
TX56784   2011-07          17        15     32
CA45678   1992-06         100       100    200
LA356890  1993-05           0        50     50

I previously created a dataframe with a ‘Year_Month’ column that only held aggregated counts of crime incidents for each ID and ignored ‘Type’. I did this with:

df1 = (df.value_counts(['ID', df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month')])
     .rename('Count').reset_index())

Is there a way to carry over this same logic while creating the two additional columns I want?

Asked By: TheMaffGuy

||

Answers:

IIUC, you were very close. Add 'Type' to the keys passed to value_counts, then unstack it into columns:

df1 = df.value_counts([
    'ID', df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month'), 'Type',
]).unstack('Type', fill_value=0).rename_axis(None, axis=1)
df1 = df1.assign(Total=df1.sum(axis=1)).reset_index()

On your sample data:

>>> df1
         ID Year_Month  Property  Violent  Total
0   CA45678    1991-10         0        1      1
1  LA356890    2018-12         1        0      1
2   TX56784    2018-04         1        0      1
3  XL123445    2018-12         1        2      3
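
The question mentions a third type (non-index) that should count toward Total without getting its own column. Here is a minimal, self-contained sketch of one way to handle that. It assumes the sample rows from the question plus one hypothetical non-index row (the "Loitering" incident is invented purely for illustration), and it swaps in groupby(...).size() for value_counts, which should give the same counts here.

```python
import pandas as pd

# Sample rows from the question; INCIDENT_DATE is parsed to datetime.
df = pd.DataFrame({
    "ID": ["XL123445", "XL123445", "XL123445", "TX56784", "CA45678", "LA356890"],
    "Crime": ["Aggravated Assault", "Simple Assault", "Theft",
              "Theft", "Sexual Assault", "Burglary"],
    "INCIDENT_DATE": pd.to_datetime([
        "2018-12-29", "2018-12-29", "2018-12-30",
        "2018-04-28", "1991-10-23", "2018-12-21",
    ]),
    "Type": ["Violent", "Violent", "Property", "Property", "Violent", "Property"],
})

# Hypothetical extra row (NOT in the question) to exercise the third type.
extra = pd.DataFrame({
    "ID": ["TX56784"],
    "Crime": ["Loitering"],
    "INCIDENT_DATE": pd.to_datetime(["2018-04-30"]),
    "Type": ["non-index"],
})
df = pd.concat([df, extra], ignore_index=True)

# Count incidents per (ID, month, Type), then pivot Type into columns.
year_month = df["INCIDENT_DATE"].dt.to_period("M").rename("Year_Month")
counts = (
    df.groupby(["ID", year_month, "Type"])
    .size()
    .unstack("Type", fill_value=0)
    .rename_axis(None, axis=1)
)

# Compute Total before dropping columns so non-index incidents are included.
total = counts.sum(axis=1)

# Keep only the two requested type columns; a type absent from the data becomes 0.
result = (
    counts.reindex(columns=["Violent", "Property"], fill_value=0)
    .assign(Total=total)
    .reset_index()
)
print(result)
```

With the hypothetical row added, the TX56784 / 2018-04 line shows Property 1 and Total 2, since the non-index incident is counted in Total only. If Total should instead be just Violent + Property, compute the sum after the reindex step.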
Answered By: Pierre D