Calculating the total monthly cumulative number of orders

Question:

I need to find the cumulative number of orders per month. I have two columns, orderDate and OrderId. I can't use a list to compute the cumulative numbers because the data is too large. The result should be in year_month format, with the cumulative order total for each month.

orderDate                OrderId
2011-11-18 06:41:16      23
2011-11-18 04:41:16      2
2011-12-18 06:41:16      69
2012-03-12 07:32:15      235
2012-03-12 08:32:15      234
2012-03-12 09:32:15      235
2012-05-12 07:32:15      233

Desired result:

Date                     CumulativeOrder
2011-11                  2
2011-12                  3
2012-03                  6
2012-05                  7

I imported my Excel file in PyCharm and used pandas to read it.
I tried splitting the datetime column into year and month and then grouping, but I am not getting the correct result.

df1 = df1[['OrderId','orderDate']]
df1['year']  = pd.DatetimeIndex(df1['orderDate']).year
df1['month'] = pd.DatetimeIndex(df1['orderDate']).month
df1.groupby(['year','month']).sum().groupby('year','month').cumsum()
print (df1)
Asked By: Erhan


Answers:

Convert the column to datetimes, then to monthly periods with to_period. Add a running counter column with numpy.arange, then keep only the last row for each Date with DataFrame.drop_duplicates (keep='last'):

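import numpy as np
import pandas as pd

# parse the strings into datetimes, then truncate to a monthly period
df1['orderDate'] = pd.to_datetime(df1['orderDate'])
df1['Date'] = df1['orderDate'].dt.to_period('M')
# uncomment if the datetimes are not already sorted
#df1 = df1.sort_values('Date')
# running row count: 1, 2, ..., len(df1)
df1['CumulativeOrder'] = np.arange(1, len(df1) + 1)
print(df1)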
            orderDate  OrderId    Date  CumulativeOrder
0 2011-11-18 06:41:16       23 2011-11                1
1 2011-11-18 04:41:16        2 2011-11                2
2 2011-12-18 06:41:16       69 2011-12                3
3 2012-03-12 07:32:15      235 2012-03                4

df2 = df1.drop_duplicates('Date', keep='last')[['Date','CumulativeOrder']]
print (df2)
     Date  CumulativeOrder
1 2011-11                2
2 2011-12                3
3 2012-03                4

Another solution:

# count rows per monthly period, then accumulate the counts
df2 = (df1.groupby(df1['orderDate'].dt.to_period('M')).size()
          .cumsum()
          .rename_axis('Date')
          .reset_index(name='CumulativeOrder'))
print(df2)
     Date  CumulativeOrder
0 2011-11                2
1 2011-12                3
2 2012-03                6
3 2012-05                7
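
A minimal self-contained sketch of this second approach, using the seven sample rows from the question. The inline DataFrame construction and the final astype(str) step are assumptions added for illustration; in practice df1 would come from reading the Excel file.

```python
import pandas as pd

# Sample rows copied from the question (built inline here as an assumption;
# the real data would come from the Excel file).
df1 = pd.DataFrame({
    "orderDate": [
        "2011-11-18 06:41:16", "2011-11-18 04:41:16", "2011-12-18 06:41:16",
        "2012-03-12 07:32:15", "2012-03-12 08:32:15", "2012-03-12 09:32:15",
        "2012-05-12 07:32:15",
    ],
    "OrderId": [23, 2, 69, 235, 234, 235, 233],
})
df1["orderDate"] = pd.to_datetime(df1["orderDate"])

# Count orders per month, then take the running total of those counts.
df2 = (df1.groupby(df1["orderDate"].dt.to_period("M")).size()
          .cumsum()
          .rename_axis("Date")
          .reset_index(name="CumulativeOrder"))

# Optional: turn the Period values into plain 'YYYY-MM' strings.
df2["Date"] = df2["Date"].astype(str)
print(df2)
```

Because groupby sorts its keys, this version does not depend on the rows being in date order, and months with no orders (for example 2012-01) simply do not appear in the result.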
Answered By: jezrael

Excuse me, what is the data type of orderDate? Thank you.
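
On the data type: the answer's code calls pd.to_datetime on orderDate, so after that line the column holds datetimes regardless of how it was read. A small sketch of how one might check this, assuming the values arrive as strings (the two-row Series here is a hypothetical stand-in; date-formatted Excel cells are often parsed as datetimes already by read_excel):

```python
import pandas as pd

# Hypothetical stand-in for the orderDate column read as text.
raw = pd.Series(["2011-11-18 06:41:16", "2012-03-12 07:32:15"], name="orderDate")
is_dt_before = pd.api.types.is_datetime64_any_dtype(raw)

# pd.to_datetime parses the strings into a datetime64 column.
parsed = pd.to_datetime(raw)
is_dt_after = pd.api.types.is_datetime64_any_dtype(parsed)

print(raw.dtype, is_dt_before)
print(parsed.dtype, is_dt_after)

# The .dt accessor (and therefore .dt.to_period) only works on the parsed column.
months = parsed.dt.to_period("M").astype(str).tolist()
print(months)
```

If the check already returns True for the column as loaded, the pd.to_datetime call is harmless and can be left in place.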
