How to show the number of records per day, from the first day to the last day, in pandas

Question:

I want to show the number of records from the first day to the last day in pandas.

I have a DataFrame like this:

          day category  value
0  2022-07-01        A      1
1  2022-07-01        B      2
2  2022-07-03        A      3
3  2022-07-05        A      4
4  2022-07-07        B      5
5  2022-07-07        B      6

I want the values of category to become columns and to show the number of records for each date.
(The result should cover every date from the first date to the last date, including dates with no records.)

The expected output is:

          day  A  B
0  2022-07-01  1  1
1  2022-07-02  0  0
2  2022-07-03  1  0
3  2022-07-04  0  0
4  2022-07-05  1  0
5  2022-07-06  0  0
6  2022-07-07  0  2

How can I do this?

Asked By: 김진원


Answers:

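You can use pd.crosstab and then resample to a daily frequency. resample needs a datetime index, so convert day first if it holds strings.

df.day = pd.to_datetime(df.day)  # skip if day is already datetime

out = pd.crosstab(df.day,df.category).resample('1D').first().fillna(0).reset_index()
out
Out[607]: 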
category        day    A    B
0        2022-07-01  1.0  1.0
1        2022-07-02  0.0  0.0
2        2022-07-03  1.0  0.0
3        2022-07-04  0.0  0.0
4        2022-07-05  1.0  0.0
5        2022-07-06  0.0  0.0
6        2022-07-07  0.0  2.0
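
The counts above come out as floats because of the fillna step. As a minimal sketch of one possible variation (not part of the original answer), asfreq with fill_value=0 inserts the missing days while keeping integer counts. The DataFrame below is rebuilt from the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])

# Count records per (day, category)
counts = pd.crosstab(df["day"], df["category"])

# asfreq("D") adds a row for every calendar day between the first and last
# date; fill_value=0 fills the new rows with 0 instead of NaN
out = counts.asfreq("D", fill_value=0).reset_index()
out.columns.name = None  # drop the leftover "category" label on the columns
print(out)
```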
Answered By: BENY

You’re looking for either pandas pivot_table() or groupby():

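import numpy as np
import pandas as pd
from datetime import date

# Random sample data: 10 rows of (category, value) indexed by date
rands = np.random.randint(0,3,10)
choice = np.random.choice(['A','B'],10)
dates = np.random.choice([date(2022,7,1),date(2022,7,2),date(2022,7,3),date(2022,7,4)],10)

df = pd.DataFrame(data=[choice,rands]).T
df.index = dates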

df
Out[3]: 
            0  1
2022-07-01  A  2
2022-07-04  B  2
2022-07-03  A  1
2022-07-02  A  0
2022-07-02  B  2
2022-07-03  B  0
2022-07-02  B  1
2022-07-04  A  2
2022-07-03  B  2
2022-07-03  B  1

pd.pivot_table(df, index=df.index, columns=df[0],aggfunc='count', fill_value=0)
Out[6]: 
            1   
0           A  B
2022-07-01  1  0
2022-07-02  1  2
2022-07-03  1  3
2022-07-04  1  1

df.groupby([df.index,0]).count()
Out[4]: 
              1
           0   
2022-07-01 A  1
2022-07-02 A  1
           B  2
2022-07-03 A  1
           B  3
2022-07-04 A  1
           B  1

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
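
pivot_table on its own only lists dates that actually occur in the data, so days without records would be missing from the result. A rough sketch of how the gap could be closed on the question's sample data, assuming the column names day, category and value from the question: build the full daily range with pd.date_range and reindex onto it. The name full_range is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])

# Count rows per day and category; combinations that never occur become 0
table = pd.pivot_table(df, index="day", columns="category",
                       values="value", aggfunc="count", fill_value=0)

# Every calendar day from the first to the last date in the data
full_range = pd.date_range(df["day"].min(), df["day"].max(), freq="D", name="day")

# Reindex onto the full range; days that were absent get a count of 0
out = table.reindex(full_range, fill_value=0).astype(int).reset_index()
out.columns.name = None
print(out)
```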

Answered By: J Beckford

This is one way to do it (here the date column is named Date and has already been converted with pd.to_datetime):

df1 = df.groupby('Date')['category'].agg('value_counts').unstack(level=-1).resample('1D').first().fillna(0).reset_index()

df1

Output:

category       Date    A    B
0        2022-07-01  1.0  1.0
1        2022-07-02  0.0  0.0
2        2022-07-03  1.0  0.0
3        2022-07-04  0.0  0.0
4        2022-07-05  1.0  0.0
5        2022-07-06  0.0  0.0
6        2022-07-07  0.0  2.0
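
A similar result can probably be reached with groupby, size and unstack, followed by resample('D').sum(), which yields 0 for days that have no rows. This is only a sketch under the assumption that the date column is called day, as in the question, rather than Date.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])

# size() counts rows per (day, category); unstack moves categories to columns
counts = df.groupby(["day", "category"]).size().unstack(fill_value=0)

# Daily resample: summing an empty day gives 0, so the gaps are filled
df1 = counts.resample("D").sum().reset_index()
df1.columns.name = None
print(df1)
```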
Answered By: Sanju Halder