How to count the total sales by year, month

Question:

I have a big csv (17985 rows) with sales in different days.The csv looks like this:

Customer   Date        Sale
Larry      1/2/2018    20$
Mike       4/3/2020    40$
John       12/5/2017   10$
Sara       3/2/2020    90$
Charles    9/8/2022    75$

Below is how many times that exact day appears in my csv (how many sales were made that day):

occur = df.groupby(['Date']).size()
occur
2018-01-02     32
2018-01-03     31
2018-01-04     42
2018-01-05    192
2018-01-06     26

I used crosstab, groupby and several methods but the problem is that they don’t add up, or is NaN.

new_df['total_sales_that_month'] = df.groupby('Date')['Sale'].sum()
0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
         ..
17980   NaN
17981   NaN
17982   NaN
17983   NaN
17984   NaN

I want to group them by year and month in a dataframe, based on total sales. Using dt.year and dt.month I managed to do this:

    year
month   
1   2020
1   2020
7   2019
8   2019
2   2018
... ...
4   2020
4   2020
4   2020
4   2020
4   2020

What I want to have is: month/year/total_sales_that_month. What method should I apply? This is the expected output:

Month   Year    Total_sale_that_month
1       2018      420$
2       2018      521$
3       2018      124$
4       2018      412$
5       2018      745$
Asked By: Vladzav

||

Answers:

You can use groupby_sum but before you have to strip ‘$’ from Sale column and convert as numeric:

# Clean your dataframe first
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Sale'] = df['Sale'].str.strip('$').astype(float)

out = (df.groupby([df['Date'].dt.month.rename('Month'), 
                   df['Date'].dt.year.rename('Year')])
          ['Sale'].sum()
         .rename('Total_sale_that_month')
         # .astype(str).add('$')  # uncomment if '$' matters
         .reset_index())

Output:

>>> out
   Month  Year  Total_sale_that_month
0      2  2018                   20.0
1      2  2020                   90.0
2      3  2020                   40.0
3      5  2017                   10.0
4      8  2022                   75.0
Answered By: Corralien

i share you my code,
pivot_table, reset_index and sorting,

convert your col name:

df["Dt_Customer_Y"] = pd.DatetimeIndex(df['Dt_Customer']).year
df["Dt_Customer_M"] = pd.DatetimeIndex(df['Dt_Customer']).month

pvtt = df.pivot_table(index=['Dt_Customer_Y', 'Dt_Customer_M'], aggfunc={'Income':sum})
pvtt.reset_index().sort_values(['Dt_Customer_Y', 'Dt_Customer_M'])

Dt_Customer_Y   Dt_Customer_M   Income
0   2012        1              856039.0
1   2012        2               487497.0
2   2012        3               921940.0
3   2012        4               881203.0
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.