How to count the daily number of cases in fixed 2-month intervals?
Question:
I would like to count the daily number of cases in fixed 2-month intervals (e.g., Jan-Feb, Mar-Apr, May-Jun, Jul-Aug, etc.). For instance,
import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21",
                            "2010-08-31", "2010-08-30", "2010-09-01"]})
and the result that I would like to produce is as follows:
  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010-09_10      1
Do you have any ideas about how to do this? Calculating the monthly number of cases is rather straightforward, but this one is difficult for me. Thanks in advance!
Answers:
Use Grouper with a frequency of 2 months:
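d1['date'] = pd.to_datetime(d1['date'])
# '2m': 2-month bins ending at month end; pandas >= 2.2 deprecates this alias in favor of '2ME'
df = (d1.groupby(['ID', pd.Grouper(freq='2m', key='date')])
        .size()
        .reset_index(name='count'))
# each bin is labelled by its last month, so the first month is one less
m = df['date'].dt.month
df['date'] = (df['date'].dt.year.astype(str) + '-' +
              m.sub(1).astype(str).str.zfill(2) + '_' +
              m.astype(str).str.zfill(2))
print(df)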
  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010-09_10      1
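Because Grouper works dynamically (it uses the first datetime in the data to define the groups), the intervals are not guaranteed to stay fixed. To specify the groups explicitly, map the months to groups instead: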
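d1['date'] = pd.to_datetime(d1['date'])
N = 3  # months per group; for correct groups use 2, 3, 4 or 6 (N = 2 gives the intervals in the question)
df1 = pd.DataFrame({'month': range(1, 13)})
df1.index = df1.index // N  # months of the same group share one index value
# label each group by its first and last month, e.g. '01_03'
df1['group'] = (df1['month'].astype(str).str.zfill(2)
                .groupby(level=0)
                .transform(lambda x: x.iat[0] + '_' + x.iat[-1]))
d = df1.set_index('month')['group'].to_dict()
print(d)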
{1: '01_03', 2: '01_03', 3: '01_03', 4: '04_06',
5: '04_06', 6: '04_06', 7: '07_09', 8: '07_09',
9: '07_09', 10: '10_12', 11: '10_12', 12: '10_12'}
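The same month-to-label mapping can also be built without the helper DataFrame. This is only a minimal sketch; `month_labels` is a hypothetical helper name that does not appear in the answer, and it assumes the group size divides 12 evenly:

```python
def month_labels(n):
    """Map each month number 1..12 to a 'first_last' label for groups of n months."""
    if 12 % n != 0:
        raise ValueError("n must divide 12 evenly")
    labels = {}
    for month in range(1, 13):
        first = (month - 1) // n * n + 1  # first month of the group containing `month`
        labels[month] = f"{first:02d}_{first + n - 1:02d}"
    return labels

d = month_labels(3)
print(d)
```

Calling `month_labels(2)` instead gives the Jan-Feb, Mar-Apr, ... labels from the question.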
df = d1.groupby(['ID',
                 d1['date'].dt.strftime('%Y-').rename('Y'),
                 d1['date'].dt.month.map(d)]).size().reset_index(name="count")
df['date'] = df.pop('Y') + df['date']
print(df)
  ID        date  count
0  A  2010-01_03      2
1  A  2010-10_12      1
2  B  2012-01_03      2
3  C  2011-01_03      3
4  C  2014-01_03      1
5  D  2010-07_09      3
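For the 2-month intervals asked about in the question, the label can also be computed arithmetically from the month number, which keeps the groups fixed regardless of which dates are present. A minimal sketch, assuming N divides 12 evenly; the names `dates`, `start`, `end`, `label` and `result` are illustrative and not part of the answer above:

```python
import pandas as pd

d1 = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
    "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
             "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21",
             "2010-08-31", "2010-08-30", "2010-09-01"],
})
dates = pd.to_datetime(d1["date"])

N = 2  # months per group (assumption: N divides 12 evenly)
start = (dates.dt.month - 1) // N * N + 1  # first month of each row's group
end = start + N - 1                        # last month of each row's group

# Build labels such as '2010-01_02'
label = (dates.dt.year.astype(str) + "-"
         + start.astype(str).str.zfill(2) + "_"
         + end.astype(str).str.zfill(2))

# Count rows per ID and label
result = (d1[["ID"]].assign(date=label)
          .groupby(["ID", "date"])
          .size()
          .reset_index(name="count"))
print(result)
```

With N = 2 this should reproduce the table requested in the question, including the separate 2010-07_08 and 2010-09_10 rows for ID D.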
def solve(intervals):
    # count the intervals that are fully covered by another interval
    if not intervals:
        return 0
    # sort by start ascending, then by end descending
    intervals.sort(key=lambda x: (x[0], -x[1]))
    end_mx = float("-inf")
    ans = 0
    for start, end in intervals:
        if end <= end_mx:
            ans += 1
        end_mx = max(end_mx, end)
    return ans

intervals = [[2, 6], [3, 4], [4, 7], [5, 5]]
print(solve(intervals))