Python Pandas – How can I expand month-over-month goals for markets and channels into a day-over-day one?

Question

For a long time, I’ve maintained a report that shows progress to goals for various markets and channels. I’ve relied on some functions in Google Sheets (most notably split and flatten) to take a monthly budget and split it into daily goals, so that it can be combined with daily counts from another system and then aggregated in Tableau Desktop to whatever time period is needed (e.g. by week, month, or year). Adding new markets, channels, etc. is finicky, and the Google Sheets have gotten too big to connect with Tableau anyway, so I wanted to use Python to make things easier.

The solution I’ve been working on uses the Pandas library of Python to pull in an Excel file that has a market, channel, and KPI in each row, and has a column for each month, with the actual goal as the values. I can get it to unpivot into a more tabular view with pd.melt, but I haven’t found any solutions that allow me to expand each month into days, where each day has a fraction of the goal proportionate to the number of days in the month, while preserving the KPI, market, and channel.

import pandas as pd

df = pd.DataFrame([['New', 'Albuquerque', 'Marketing', 34, 34, 34, 35, 35, 36, 36, 36, 37, 40, 40, 40],
                   ['New', 'Boston', 'Marketing', 12, 12, 12, 12, 12, 13, 13, 14, 14, 15, 16, 17],
                   ['Converted', 'Albuquerque', 'Marketing', 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
                   ['Converted', 'Boston', 'Marketing', 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
                  columns=['KPI',
                           'Market',
                           'Channel',
                           '2022-01-01',
                           '2022-02-01',
                           '2022-03-01',
                           '2022-04-01',
                           '2022-05-01',
                           '2022-06-01',
                           '2022-07-01',
                           '2022-08-01',
                           '2022-09-01',
                           '2022-10-01',
                           '2022-11-01',
                           '2022-12-01'])

# Set up variables for the melt
index_vars = ['KPI','Market','Channel']
val_vars = df.set_index(index_vars).columns.tolist()

# Unpivot months
df = pd.melt(df,
             id_vars=index_vars,
             value_vars=val_vars,
             var_name='Date',
             value_name='Goal',
             ignore_index=False)

# Force dates to datetime, sort and reset index for a clean view
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df = df.sort_values(by=['KPI','Market','Channel','Date']).reset_index(drop=True)

print(df)

This gives me a view like this:

          KPI       Market    Channel       Date  Goal
0   Converted  Albuquerque  Marketing 2022-01-01     5
1   Converted  Albuquerque  Marketing 2022-02-01     5
2   Converted  Albuquerque  Marketing 2022-03-01     5
3   Converted  Albuquerque  Marketing 2022-04-01     5
4   Converted  Albuquerque  Marketing 2022-05-01     5
5   Converted  Albuquerque  Marketing 2022-06-01     5
6   Converted  Albuquerque  Marketing 2022-07-01     5
7   Converted  Albuquerque  Marketing 2022-08-01     5
8   Converted  Albuquerque  Marketing 2022-09-01     5
9   Converted  Albuquerque  Marketing 2022-10-01     5
10  Converted  Albuquerque  Marketing 2022-11-01     5
11  Converted  Albuquerque  Marketing 2022-12-01     5
12  Converted       Boston  Marketing 2022-01-01     2
13  Converted       Boston  Marketing 2022-02-01     2
...

I’m trying to take this and have it spit out something like this:

          KPI       Market    Channel       Date  Goal
0   Converted  Albuquerque  Marketing 2022-01-01     0.1612903226
1   Converted  Albuquerque  Marketing 2022-01-02     0.1612903226
2   Converted  Albuquerque  Marketing 2022-01-03     0.1612903226
3   Converted  Albuquerque  Marketing 2022-01-04     0.1612903226
4   Converted  Albuquerque  Marketing 2022-01-05     0.1612903226
5   Converted  Albuquerque  Marketing 2022-01-06     0.1612903226
6   Converted  Albuquerque  Marketing 2022-01-07     0.1612903226
7   Converted  Albuquerque  Marketing 2022-01-08     0.1612903226
8   Converted  Albuquerque  Marketing 2022-01-09     0.1612903226
9   Converted  Albuquerque  Marketing 2022-01-10     0.1612903226
10  Converted  Albuquerque  Marketing 2022-01-11     0.1612903226
11  Converted  Albuquerque  Marketing 2022-01-12     0.1612903226
12  Converted       Boston  Marketing 2022-01-13     0.064516129
13  Converted       Boston  Marketing 2022-01-14     0.064516129
...

Edit: To expand on where I’m stuck, I’ve seen other solutions to get the quotient by dividing the goal by pd.Period.days_in_month, so I don’t think that will end up being a problem. The problem I’m facing is that the solutions I’ve seen for expanding the months into their constituent days have only shown the solution applied to a DataFrame with a single column of datetime data as the index with non-repeating values, whereas the DataFrame I’m looking to build would have the dates repeated for each KPI/Market/Channel combination.

When I try solutions like this one:

start = '2022-01-01'
end = '2022-12-31'
dates = pd.date_range(start,end,freq='D')
df_daily = df.reindex(dates,method='ffill')
df_daily

I get a TypeError that it "Cannot compare dtypes int64 and datetime64[ns]"

When I try to convert the Date column with .dt.to_period('m').dt.to_timestamp() right after melting, i.e.:

df['Date'] = (pd.to_datetime(df['Date'], format='%Y/%m/%d')
                .dt.to_period('m')
                .dt.to_timestamp())

I get an error that it "Cannot compare dtypes int64 and datetime64[ns]"

I’m not sure what the error is in my approach, but I feel like I’m missing something glaringly obvious.

Asked By: a guy i kno

Answer 1

It can be easier to start from the original DataFrame, i.e. df as first constructed, before it is overwritten by pd.melt:

# Keep only date columns and set others to index
out = df.set_index(index_vars)
out.columns = pd.to_datetime(out.columns)

# Expand months to days
new_idx = pd.date_range(out.columns.min(), out.columns.max() + pd.offsets.MonthEnd(0), freq='D')

# Compute the daily goal by dividing each monthly goal by the number of days in that month
out /= out.columns.days_in_month

# Reindex with the days index and fill missing values
out = out.reindex(new_idx, axis=1).ffill(axis=1)

The intermediate output is:

>>> out
                                 2022-01-01  2022-01-02  2022-01-03  ...  2022-12-29  2022-12-30  2022-12-31
KPI       Market      Channel                                        ...                                    
New       Albuquerque Marketing    1.096774    1.096774    1.096774  ...    1.290323    1.290323    1.290323
          Boston      Marketing    0.387097    0.387097    0.387097  ...    0.548387    0.548387    0.548387
Converted Albuquerque Marketing    0.161290    0.161290    0.161290  ...    0.161290    0.161290    0.161290
          Boston      Marketing    0.064516    0.064516    0.064516  ...    0.064516    0.064516    0.064516

[4 rows x 365 columns]
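
As a rough check on the steps above, here is a minimal self-contained sketch on a made-up two-month sample. The names wide, monthly, daily_rate, daily and month_totals are hypothetical, and the goals are chosen so the daily rates come out as round numbers. It repeats the same divide, reindex and forward-fill sequence, then sums the days of each month to confirm that the monthly goals are recovered.

```python
import pandas as pd

# Hypothetical sample: two KPI/Market/Channel rows, two month-start columns.
wide = pd.DataFrame(
    [["New", "Albuquerque", "Marketing", 31, 28],
     ["Converted", "Boston", "Marketing", 62, 14]],
    columns=["KPI", "Market", "Channel", "2022-01-01", "2022-02-01"],
)

# Move the identifying columns into the index so only month columns remain.
monthly = wide.set_index(["KPI", "Market", "Channel"])
monthly.columns = pd.to_datetime(monthly.columns)

# Daily rate per month: monthly goal divided by the days in that month.
days_per_month = monthly.columns.days_in_month.to_numpy()
daily_rate = monthly.div(days_per_month, axis="columns")

# One column per calendar day, forward-filled from each month-start column.
days = pd.date_range(
    monthly.columns.min(),
    monthly.columns.max() + pd.offsets.MonthEnd(0),
    freq="D",
)
daily = daily_rate.reindex(days, axis=1).ffill(axis=1)

# Sanity check: summing the days within each month should give back
# the original monthly goals.
month_totals = daily.T.resample("MS").sum().T
print(month_totals)
```

If the totals printed here differ from the monthly goals, the usual culprits are column labels that are not month-start dates or a date range that stops before the end of the last month.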

However, if you want the expected long-format output, you can stack the date columns back into rows:

out = out.rename_axis(columns='Date').stack().to_frame('Goal').reset_index()

Final output:

>>> out
            KPI       Market    Channel       Date      Goal
0           New  Albuquerque  Marketing 2022-01-01  1.096774
1           New  Albuquerque  Marketing 2022-01-02  1.096774
2           New  Albuquerque  Marketing 2022-01-03  1.096774
3           New  Albuquerque  Marketing 2022-01-04  1.096774
4           New  Albuquerque  Marketing 2022-01-05  1.096774
...         ...          ...        ...        ...       ...
1455  Converted       Boston  Marketing 2022-12-27  0.064516
1456  Converted       Boston  Marketing 2022-12-28  0.064516
1457  Converted       Boston  Marketing 2022-12-29  0.064516
1458  Converted       Boston  Marketing 2022-12-30  0.064516
1459  Converted       Boston  Marketing 2022-12-31  0.064516

[1460 rows x 5 columns]
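
The same shape can probably also be reached from the melted, long-format frame the question started with. The following is only a sketch of that alternative, not part of the answer above. The names long_df and daily_long are hypothetical and the three-row sample is made up for illustration. Each monthly goal is divided by the days in its month, each month-start date is replaced by the list of days in that month, and explode then gives every day its own row while repeating KPI, Market and Channel.

```python
import pandas as pd

# Hypothetical melted sample: one row per KPI/Market/Channel/month.
long_df = pd.DataFrame({
    "KPI": ["Converted", "Converted", "New"],
    "Market": ["Boston", "Boston", "Boston"],
    "Channel": ["Marketing"] * 3,
    "Date": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-01"]),
    "Goal": [62, 14, 31],
})

# Divide each monthly goal by the number of days in its month.
long_df["Goal"] = long_df["Goal"] / long_df["Date"].dt.days_in_month

# Replace each month-start date with a list of every day in that month.
long_df["Date"] = [
    list(pd.date_range(start, start + pd.offsets.MonthEnd(0), freq="D"))
    for start in long_df["Date"]
]

# One row per day; the other columns are repeated for each exploded date.
daily_long = long_df.explode("Date", ignore_index=True)

# explode leaves an object column, so convert it back to datetime.
daily_long["Date"] = pd.to_datetime(daily_long["Date"])
print(daily_long.head())
```

This version builds a Python list per row, so it may be slower than the reindex approach on large inputs, but it does not depend on the dates being unique in any index.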

Answered By: Corralien
