Calculating total monthly cumulative number of Order
Question:
I need to find the total monthly cumulative number of order. I have 2 columns OrderDate and OrderId.I cant use a list to find the cumulative numbers since data is so large. and result should be year_month format along with cumulative order total per each months.
orderDate OrderId
2011-11-18 06:41:16 23
2011-11-18 04:41:16 2
2011-12-18 06:41:16 69
2012-03-12 07:32:15 235
2012-03-12 08:32:15 234
2012-03-12 09:32:15 235
2012-05-12 07:32:15 233
desired Result
Date CumulativeOrder
2011-11 2
2011-12 3
2012-03 6
2012-05 7
I have imported my excel into pycharm and use pandas to read excel
I have tried to split the datetime column to year and month then grouped but not getting the correct result.
df1 = df1[['OrderId','orderDate']]
df1['year'] = pd.DatetimeIndex(df1['orderDate']).year
df1['month'] = pd.DatetimeIndex(df1['orderDate']).month
df1.groupby(['year','month']).sum().groupby('year','month').cumsum()
print (df1)
Answers:
Convert column to datetimes, then to months period by to_period
, add new column by numpy.arange
and last remove duplicates with keep last dupe by column Date
and DataFrame.drop_duplicates
:
import numpy as np
df1['orderDate'] = pd.to_datetime(df1['orderDate'])
df1['Date'] = df1['orderDate'].dt.to_period('m')
#use if not sorted datetimes
#df1 = df1.sort_values('Date')
df1['CumulativeOrder'] = np.arange(1, len(df1) + 1)
print (df1)
orderDate OrderId Date CumulativeOrder
0 2011-11-18 06:41:16 23 2011-11 1
1 2011-11-18 04:41:16 2 2011-11 2
2 2011-12-18 06:41:16 69 2011-12 3
3 2012-03-12 07:32:15 235 2012-03 4
df2 = df1.drop_duplicates('Date', keep='last')[['Date','CumulativeOrder']]
print (df2)
Date CumulativeOrder
1 2011-11 2
2 2011-12 3
3 2012-03 4
Another solution:
df2 = (df1.groupby(df1['orderDate'].dt.to_period('m')).size()
.cumsum()
.rename_axis('Date')
.reset_index(name='CumulativeOrder'))
print (df2)
Date CumulativeOrder
0 2011-11 2
1 2011-12 3
2 2012-03 6
3 2012-05 7
excuse sir, what is data type in orderDate? thank you sir.
I need to find the total monthly cumulative number of order. I have 2 columns OrderDate and OrderId.I cant use a list to find the cumulative numbers since data is so large. and result should be year_month format along with cumulative order total per each months.
orderDate OrderId
2011-11-18 06:41:16 23
2011-11-18 04:41:16 2
2011-12-18 06:41:16 69
2012-03-12 07:32:15 235
2012-03-12 08:32:15 234
2012-03-12 09:32:15 235
2012-05-12 07:32:15 233
desired Result
Date CumulativeOrder
2011-11 2
2011-12 3
2012-03 6
2012-05 7
I have imported my excel into pycharm and use pandas to read excel
I have tried to split the datetime column to year and month then grouped but not getting the correct result.
df1 = df1[['OrderId','orderDate']]
df1['year'] = pd.DatetimeIndex(df1['orderDate']).year
df1['month'] = pd.DatetimeIndex(df1['orderDate']).month
df1.groupby(['year','month']).sum().groupby('year','month').cumsum()
print (df1)
Convert column to datetimes, then to months period by to_period
, add new column by numpy.arange
and last remove duplicates with keep last dupe by column Date
and DataFrame.drop_duplicates
:
import numpy as np
df1['orderDate'] = pd.to_datetime(df1['orderDate'])
df1['Date'] = df1['orderDate'].dt.to_period('m')
#use if not sorted datetimes
#df1 = df1.sort_values('Date')
df1['CumulativeOrder'] = np.arange(1, len(df1) + 1)
print (df1)
orderDate OrderId Date CumulativeOrder
0 2011-11-18 06:41:16 23 2011-11 1
1 2011-11-18 04:41:16 2 2011-11 2
2 2011-12-18 06:41:16 69 2011-12 3
3 2012-03-12 07:32:15 235 2012-03 4
df2 = df1.drop_duplicates('Date', keep='last')[['Date','CumulativeOrder']]
print (df2)
Date CumulativeOrder
1 2011-11 2
2 2011-12 3
3 2012-03 4
Another solution:
df2 = (df1.groupby(df1['orderDate'].dt.to_period('m')).size()
.cumsum()
.rename_axis('Date')
.reset_index(name='CumulativeOrder'))
print (df2)
Date CumulativeOrder
0 2011-11 2
1 2011-12 3
2 2012-03 6
3 2012-05 7
excuse sir, what is data type in orderDate? thank you sir.