How To Create New Pandas Columns With Monthly Counts From DateTime
Question:
I have a large dataframe of crime incidents, df, with four columns. INCIDENT_DATE has datetime dtype, and Type takes one of three values (Violent, Property, and non-index).
ID | Crime | INCIDENT_DATE | Type |
---|---|---|---|
XL123445 | Aggravated Assault | 2018-12-29 | Violent |
XL123445 | Simple Assault | 2018-12-29 | Violent |
XL123445 | Theft | 2018-12-30 | Property |
TX56784 | Theft | 2018-04-28 | Property |
… | … | | |
CA45678 | Sexual Assault | 1991-10-23 | Violent |
LA356890 | Burglary | 2018-12-21 | Property |
I want to create a new dataframe with the monthly counts (for each ID) of Property and Violent incidents, plus a column with the total number of incidents for that ID during that month.
So I would want something like:
ID | Year_Month | Violent | Property | Total |
---|---|---|---|---|
XL123445 | 2018-08 | 19654 | 500 | 20154 |
TX56784 | 2011-07 | 17 | 15 | 32 |
… | … | … | | |
CA45678 | 1992-06 | 100 | 100 | 200 |
LA356890 | 1993-05 | Property | 50 | 50 |
I previously created a dataframe with a ‘Year_Month’ column that holds the aggregated count of crime incidents for each ID, but it ignores ‘Type’. I did this with:
df1 = (df.value_counts(['ID', df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month')])
         .rename('Count').reset_index())
Is there a way to carry over the same logic while creating the additional columns shown above?
Answers:
IIUC, you were very close:
df1 = df.value_counts([
    'ID', df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month'), 'Type',
]).unstack('Type', fill_value=0).rename_axis(None, axis=1)
df1 = df1.assign(Total=df1.sum(axis=1)).reset_index()
On your sample data:
>>> df1
         ID Year_Month  Property  Violent  Total
0   CA45678    1991-10         0        1      1
1  LA356890    2018-12         1        0      1
2   TX56784    2018-04         1        0      1
3  XL123445    2018-12         1        2      3
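The sample data has no non-index rows, so the output above does not show what happens to the third type. Here is a minimal, self-contained sketch of one way to handle it. It assumes you want only the Violent and Property columns, with Total still counting every incident. It uses `groupby(...).size()` in place of `value_counts`, which produces the same counts here. The extra `Trespassing` / `non-index` row is hypothetical, added only for illustration.

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical 'non-index' row
# (not in the original data) to show how a third type is handled.
df = pd.DataFrame({
    'ID': ['XL123445', 'XL123445', 'XL123445', 'TX56784',
           'CA45678', 'LA356890', 'LA356890'],
    'Crime': ['Aggravated Assault', 'Simple Assault', 'Theft', 'Theft',
              'Sexual Assault', 'Burglary', 'Trespassing'],
    'INCIDENT_DATE': pd.to_datetime([
        '2018-12-29', '2018-12-29', '2018-12-30', '2018-04-28',
        '1991-10-23', '2018-12-21', '2018-12-22',
    ]),
    'Type': ['Violent', 'Violent', 'Property', 'Property',
             'Violent', 'Property', 'non-index'],
})

# Count rows per (ID, month, Type), then pivot Type into columns.
year_month = df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month')
counts = (
    df.groupby(['ID', year_month, 'Type'])
      .size()
      .unstack('Type', fill_value=0)
      .rename_axis(None, axis=1)
)

# Keep only the two requested columns; reindex also adds a column of
# zeros if one of these types never occurs in the data.
result = counts.reindex(columns=['Violent', 'Property'], fill_value=0)

# Total over every type, including 'non-index'. Use result.sum(axis=1)
# instead if the total should cover only Violent and Property.
result = result.assign(Total=counts.sum(axis=1)).reset_index()
print(result)
```

With the hypothetical extra row, LA356890 in 2018-12 gets Property 1, Violent 0 and Total 2, so Total can be larger than Violent + Property whenever non-index incidents exist.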