In Python Pandas, How do I concatenate rows of a df based on two columns? and in the order of a third one?

Question:

Python Pandas

I am trying to merge/concatenate rows together on a pandas data frame. I have a table with columns [‘Id’,’Date’,’Hour’,’Sales’].

If I groupby the df by ‘Hour’ count(), it will be like this:

‘ID’ ‘Date’ ‘Hour’
01 01/01/22 24
02 01/01/22 24
03 01/02/22 24
05 03/02/22 24

There are several Id’s, that sometimes have the same date, but each unique ID/Date has 24 unique rows of ‘Hour’. (There is 24 rows per date, and repeated dates by ID.)

I need help for some code that will take unique pair of ID’s and date, and concatenate the 24 rows of information, in order of the ‘hour’ column.

Therefore, the result would be every unique ID and Date, will have a vector, of the concatenated hour information.

The catch is that each concatenation has to be done based on two columns, ‘Id’ and ‘Date’, and in order of the ‘hour’ column.

Appreciate the help ppl.

Asked By: Eugenio5

||

Answers:

Setup:

Here is a short example and some code that moves the ‘Sales’ data into separate columns for each hour. You can change the value in the range from 3 to 24 for your case.

import pandas as pd
df = pd.DataFrame([['Dave', 1, 0, 10],['Dave', 1, 1, 20],['Dave', 1, 2, 30],
                   ['Dave', 2, 0, 40],['Dave', 2, 1, 50],['Dave', 2, 2, 60],
                   ['Carl', 1, 0, 15],['Carl', 1, 1, 25],['Carl', 1, 2, 35],
                   ['Carl', 2, 0, 45],['Carl', 2, 1, 55],['Carl', 2, 2, 65]],
                  columns=['ID', 'Date', 'Hour', 'Sales'])

Output (df):

      ID  Date  Hour  Sales
0   Dave     1     0     10
1   Dave     1     1     20
2   Dave     1     2     30
3   Dave     2     0     40
4   Dave     2     1     50
5   Dave     2     2     60
6   Carl     1     0     15
7   Carl     1     1     25
8   Carl     1     2     35
9   Carl     2     0     45
10  Carl     2     1     55
11  Carl     2     2     65

‘Where’ and ‘Merge’:

The key here is using the pandas.merge function with the on argument to choose which columns to use as an index for merging.

df.where, df.merge, and df.dropna, are very versitile pieces of Pandas that are good to learn.

new = pd.DataFrame(columns=['ID','Date'])
for hour in range(3):
    tmp = df.where(df.Hour == hour).dropna(axis=0, how='all')
    tmp[hour] = tmp['Sales']
    tmp.drop(['Hour','Sales'], axis=1, inplace=True)
    new = new.merge(tmp, how='outer', on=['ID','Date'])
new.set_index(['ID','Date'], inplace=True)

Output (new):

              0     1     2
ID   Date                  
Dave 1.0   10.0  20.0  30.0
     2.0   40.0  50.0  60.0
Carl 1.0   15.0  25.0  35.0
     2.0   45.0  55.0  65.0

Pivot Tables:

For this specific problem, you can use pivot tables to do all that work for you.

dfp = df.pivot(index=['ID','Date'], columns='Hour', values='Sales')

Output (dfp):

Hour        0   1   2
ID   Date            
Carl 1     15  25  35
     2     45  55  65
Dave 1     10  20  30
     2     40  50  60
Answered By: Alphy13