How to count the daily number of cases with the fixed 2 month intervals?

Question:

I would like to count the daily number of cases with the fixed 2 month inverval (e.g., Jan-Feb, Mar-Apr, May-Jun, Jul-Aug, etc.). For instance,

import pandas as pd

d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31", "2010-08-30", "2010-09-01"]})

and the result that I would like to produce is as follows:

  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010_09_10      1

Do you have any ideas about how to do this? Calculating the monthly number of cases is rather stratighforward, but this issue is difficult for me. Thanks in advance!

Asked By: krcoder

||

Answers:

Use Grouper by frequency 2 months:

d1['date'] = pd.to_datetime(d1['date'])

df = (d1.groupby(['ID', pd.Grouper(freq='2m', key='date')])
        .size()
        .reset_index(name='count'))

m = df['date'].dt.month
df['date'] = (df['date'].dt.year.astype(str) + '-' +
               m.sub(1).astype(str).str.zfill(2) + '_' + 
               m.astype(str).str.zfill(2))
print (df)
  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010-09_10      1

Because Grouper working dynamically – use first datetime per group for specify groups for mapping by months use:

d1['date'] = pd.to_datetime(d1['date'])

N = 3 # for correct groups possible use 2,3,4,6
df1 = pd.DataFrame({'month':range(1, 13)})
df1.index = df1.index // N

df1['group'] = (df1['month'].astype(str).str.zfill(2)
                  .groupby(level=0)
                  .transform(lambda x: x.iat[0] + '_' + x.iat[-1]))
d = df1.set_index('month')['group'].to_dict()
print (d)
{1: '01_03', 2: '01_03', 3: '01_03', 4: '04_06',
 5: '04_06', 6: '04_06', 7: '07_09', 8: '07_09',
 9: '07_09', 10: '10_12', 11: '10_12', 12: '10_12'}

df = d1.groupby(['ID', 
                 d1['date'].dt.strftime('%Y-').rename('Y'), 
                 d1['date'].dt.month.map(d)]).size().reset_index(name="count")

df['date'] = df.pop('Y') + df['date']

print (df)
  ID        date  count
0  A  2010-01_03      2
1  A  2010-10_12      1
2  B  2012-01_03      2
3  C  2011-01_03      3
4  C  2014-01_03      1
5  D  2010-07_09      3
Answered By: jezrael
def solve(intervals):
   if not intervals:
      return 0
   intervals.sort(key=lambda x: (x[0], -x[1]))
   end_mx = float("-inf")
   ans = 0
   for start, end in intervals:
      if end <= end_mx:
         ans += 1
      end_mx = max(end_mx, end)
   return ans

intervals = [[2, 6],[3, 4],[4, 7],[5, 5]]
print(solve(intervals))
Answered By: Kuldeep Singh
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.