How to show the number of records per day from the first day to the last day in pandas
Question:
I want to show the number of records from the first day to the last day in pandas.
I have a DataFrame like this:
day category value
0 2022-07-01 A 1
1 2022-07-01 B 2
2 2022-07-03 A 3
3 2022-07-05 A 4
4 2022-07-07 B 5
5 2022-07-07 B 6
I want the values of category to become columns, with each row showing the number of records for that date.
(The dates should run from the first date to the last date, including days that have no records.)
The expected output is:
day A B
0 2022-07-01 1 1
1 2022-07-02 0 0
2 2022-07-03 1 0
3 2022-07-04 0 0
4 2022-07-05 1 0
5 2022-07-06 0 0
6 2022-07-07 0 2
How can I do this?
Answers:
You can use pd.crosstab and then resample:
# resample needs a datetime index, so convert first if day is still a string:
# df.day = pd.to_datetime(df.day)
out = pd.crosstab(df.day, df.category).resample('1D').first().fillna(0).reset_index()
Out[607]:
category day A B
0 2022-07-01 1.0 1.0
1 2022-07-02 0.0 0.0
2 2022-07-03 1.0 0.0
3 2022-07-04 0.0 0.0
4 2022-07-05 1.0 0.0
5 2022-07-06 0.0 0.0
6 2022-07-07 0.0 2.0
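The counts above come back as floats because the missing days pass through NaN before fillna(0). A minimal sketch of one way to keep them as integers, assuming the sample data from the question and swapping resample(...).first().fillna(0) for asfreq with fill_value (a substitution, not what the answer above used):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])  # asfreq needs a DatetimeIndex

# Count rows per (day, category); only days present in the data appear here.
counts = pd.crosstab(df["day"], df["category"])

# asfreq inserts the missing calendar days and fills them with 0 directly,
# so the columns never hold NaN and keep their integer dtype.
out = counts.asfreq("D", fill_value=0).reset_index()
print(out)
```

This relies on the crosstab index being sorted and unique, which it is here because crosstab groups by day.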
You’re looking for either pandas pivot_table() or groupby():
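import numpy as np
import pandas as pd
from datetime import date

# Random sample data: 10 rows with a category (column 0) and a value (column 1), indexed by date
rands = np.random.randint(0, 3, 10)
choice = np.random.choice(['A', 'B'], 10)
dates = np.random.choice([date(2022, 7, 1), date(2022, 7, 2), date(2022, 7, 3), date(2022, 7, 4)], 10)
df = pd.DataFrame(data=[choice, rands]).T
df.index = dates
df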
Out[3]:
0 1
2022-07-01 A 2
2022-07-04 B 2
2022-07-03 A 1
2022-07-02 A 0
2022-07-02 B 2
2022-07-03 B 0
2022-07-02 B 1
2022-07-04 A 2
2022-07-03 B 2
2022-07-03 B 1
pd.pivot_table(df, index=df.index, columns=df[0],aggfunc='count', fill_value=0)
Out[6]:
1
0 A B
2022-07-01 1 0
2022-07-02 1 2
2022-07-03 1 3
2022-07-04 1 1
df.groupby([df.index,0]).count()
Out[4]:
1
0
2022-07-01 A 1
2022-07-02 A 1
B 2
2022-07-03 A 1
B 3
2022-07-04 A 1
B 1
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
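Neither pivot_table() nor groupby() on its own adds rows for days that have no records, which the question asks for. A rough sketch of how the groupby result could be extended to the full date range, assuming the question's sample data and column names (full_range is just an illustrative variable name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])

# Count rows per (day, category) and spread the categories into columns.
counts = df.groupby(["day", "category"]).size().unstack(fill_value=0)

# Every calendar day between the first and last date in the data.
full_range = pd.date_range(df["day"].min(), df["day"].max(), freq="D", name="day")

# Reindex onto the full range; days without records get 0 in every column.
out = counts.reindex(full_range, fill_value=0).reset_index()
print(out)
```

Building the range explicitly also makes it easy to widen it later, for example to a whole month, by changing the two endpoints.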
This is one way to do it:
# assumes df['day'] has already been converted with pd.to_datetime, which resample requires
df1 = df.groupby('day')['category'].value_counts().unstack(level=-1).resample('1D').first().fillna(0).reset_index()
df1
Output:
category day A B
0 2022-07-01 1.0 1.0
1 2022-07-02 0.0 0.0
2 2022-07-03 1.0 0.0
3 2022-07-04 0.0 0.0
4 2022-07-05 1.0 0.0
5 2022-07-06 0.0 0.0
6 2022-07-07 0.0 2.0
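If integer counts are preferred here as well, one possible variation is to fill the gaps at the unstack step and aggregate the daily bins with sum() instead of first().fillna(0). This is a sketch under the assumption that the frame is the question's sample data with a datetime day column:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["2022-07-01", "2022-07-01", "2022-07-03",
            "2022-07-05", "2022-07-07", "2022-07-07"],
    "category": ["A", "B", "A", "A", "B", "B"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["day"] = pd.to_datetime(df["day"])  # resample needs a DatetimeIndex

# value_counts per day, with absent (day, category) pairs filled as 0
# during unstack so the columns stay integer.
counts = df.groupby("day")["category"].value_counts().unstack(fill_value=0)

# Summing each daily bin yields 0 for empty days, so no NaN is introduced.
df1 = counts.resample("D").sum().reset_index()
print(df1)
```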