Group two(2) or more categorical data by week(7days) in pandas python
Question:
This is how my data looks like:
I want to aggregate this by category, Issue and Date(weekly) to get count of record.
Date: Group, the result should be monday to sunday
Count: ADD, if two or more records have same Name and fall in a same week Date time (if falls on same interval 7 days week)
The desired output is given below:
Category
Issue
Date
Bakes
Back Brake failures
11/28/2022
Machines
Oiling of the machines
11/29/2022
Cars
windscreen broken
11/30/2022
Cars
steering wheel is shaking
11/30/2022
Bakes
The tyres are flat
12/01/2022
Machines
Normal wear and tear
12/02/2022
Machines
Normal wear and tear
12/02/2022
Cars
warning lights are on
12/03/2022
Bakes
Back Brake failures
12/04/2022
Machines
Oiling of the machines
12/05/2022
Cars
windscreen broken
12/06/2022
Bakes
excessive emissions
12/06/2022
Bakes
The tyres are flat
12/07/2022
Machines
Normal wear and tear
12/08/2022
Cars
warning lights are on
12/09/2022
Bakes
Brake pads worn
12/10/2022
Machines
Machine is consuming too much oil
12/11/2022
Cars
the tyres are wearing unevenly
12/11/2022
I want to have a result similar to this:
if possible the count should be sorted in descending order
Thanks in advance for your help.
Answers:
You can use a crosstab
with a weekly period:
pd.crosstab([df['Category'], df['Issue']],
pd.to_datetime(df['Date']).dt.to_period('W'),
margins=True, margins_name='Grand Total'
)
Output:
Date 2022-11-28/2022-12-04 2022-12-05/2022-12-11 Grand Total
Category Issue
Bakes Back Brake failures 2 0 2
Brake pads worn 0 1 1
The tyres are flat 1 1 2
excessive emissions 0 1 1
Cars steering wheel is shaking 1 0 1
the tyres are wearing unevenly 0 1 1
warning lights are on 1 1 2
windscreen broken 1 1 2
Machines Machine is consuming too much oil 0 1 1
Normal wear and tear 2 1 3
Oiling of the machines 1 1 2
Grand Total 9 9 18
This is how my data looks like:
I want to aggregate this by category, Issue and Date(weekly) to get count of record.
Date: Group, the result should be monday to sunday
Count: ADD, if two or more records have same Name and fall in a same week Date time (if falls on same interval 7 days week)
The desired output is given below:
Category | Issue | Date |
---|---|---|
Bakes | Back Brake failures | 11/28/2022 |
Machines | Oiling of the machines | 11/29/2022 |
Cars | windscreen broken | 11/30/2022 |
Cars | steering wheel is shaking | 11/30/2022 |
Bakes | The tyres are flat | 12/01/2022 |
Machines | Normal wear and tear | 12/02/2022 |
Machines | Normal wear and tear | 12/02/2022 |
Cars | warning lights are on | 12/03/2022 |
Bakes | Back Brake failures | 12/04/2022 |
Machines | Oiling of the machines | 12/05/2022 |
Cars | windscreen broken | 12/06/2022 |
Bakes | excessive emissions | 12/06/2022 |
Bakes | The tyres are flat | 12/07/2022 |
Machines | Normal wear and tear | 12/08/2022 |
Cars | warning lights are on | 12/09/2022 |
Bakes | Brake pads worn | 12/10/2022 |
Machines | Machine is consuming too much oil | 12/11/2022 |
Cars | the tyres are wearing unevenly | 12/11/2022 |
I want to have a result similar to this:
if possible the count should be sorted in descending order
Thanks in advance for your help.
You can use a crosstab
with a weekly period:
pd.crosstab([df['Category'], df['Issue']],
pd.to_datetime(df['Date']).dt.to_period('W'),
margins=True, margins_name='Grand Total'
)
Output:
Date 2022-11-28/2022-12-04 2022-12-05/2022-12-11 Grand Total
Category Issue
Bakes Back Brake failures 2 0 2
Brake pads worn 0 1 1
The tyres are flat 1 1 2
excessive emissions 0 1 1
Cars steering wheel is shaking 1 0 1
the tyres are wearing unevenly 0 1 1
warning lights are on 1 1 2
windscreen broken 1 1 2
Machines Machine is consuming too much oil 0 1 1
Normal wear and tear 2 1 3
Oiling of the machines 1 1 2
Grand Total 9 9 18