Electricity Price Forecast Naive Benchmark
Question:
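I have been struggling to build a naive forecast in Python that follows the Standard Naive Forecast used as a benchmark in many electricity price forecasting (EPF) studies. For Monday, Saturday and Sunday, the forecast is the price at the same hour one week earlier; for Tuesday to Friday, it is the price at the same hour on the previous day.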
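Written as a formula, with $P_{d,h}$ the observed price on day $d$ at hour $h$ and $\hat{P}_{d,h}$ its forecast, the benchmark is usually stated as follows (this matches the day split used in the answer below):

$$
\hat{P}_{d,h} =
\begin{cases}
P_{d-7,h} & \text{if } d \text{ is a Monday, Saturday or Sunday},\\
P_{d-1,h} & \text{if } d \text{ is a Tuesday, Wednesday, Thursday or Friday}.
\end{cases}
$$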
The data below is hourly, with prices for 4 different price regions. Note that 'date' is initially a regular column, not the index.
date Price_REG1 Price_REG2 Price_REG3 Price_REG4
0 2020-01-01 00:00:00 30.83 30.83 30.83 30.83
1 2020-01-01 01:00:00 28.78 28.78 28.78 28.78
2 2020-01-01 02:00:00 28.45 28.45 28.45 28.45
3 2020-01-01 03:00:00 27.90 27.90 27.90 27.90
4 2020-01-01 04:00:00 27.52 27.52 27.52 27.52
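To try the snippets in this thread, the sample can be rebuilt as a DataFrame. This is a minimal sketch, assuming the dates arrive as strings (as they typically would from a CSV); the `.dt` accessor used in the answer only works once 'date' has been parsed into a datetime64 column.

```python
import pandas as pd

# The five sample rows from the question, with 'date' as an ordinary column.
sample = {
    "date": ["2020-01-01 00:00:00", "2020-01-01 01:00:00", "2020-01-01 02:00:00",
             "2020-01-01 03:00:00", "2020-01-01 04:00:00"],
}
prices = [30.83, 28.78, 28.45, 27.90, 27.52]
for region in range(1, 5):
    sample[f"Price_REG{region}"] = prices  # all four regions share these values
df = pd.DataFrame(sample)

# Parse the strings so that df['date'].dt.dayofweek / .dt.hour are available.
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes["date"])
```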
The goal is to apply this rule to each price series to obtain the benchmark forecast.
Answers:
First, a Day of Week column is needed. To add it, along with Hour of Week and Hour of Day columns, I ran:
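import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # .dt needs a datetime64 column
# pandas: dayofweek is 0 (Mon) to 6 (Sun), hour is 0 to 23; make both 1-based
df['Hour of Week'] = df['date'].dt.dayofweek * 24 + df['date'].dt.hour + 1
df['Day of Week'] = df['date'].dt.dayofweek + 1
df['Hour of Day'] = df['date'].dt.hour + 1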
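The Hour of Week expression reduces algebraically to `dayofweek * 24 + hour + 1`, so it runs from 1 (Monday 00:00) to 168 (Sunday 23:00). A quick check on two timestamps chosen for illustration (2020-01-01 was a Wednesday, 2020-01-05 a Sunday):

```python
import pandas as pd

# Two example timestamps: a Wednesday at 00:00 and a Sunday at 23:00.
dates = pd.Series(pd.to_datetime(["2020-01-01 00:00", "2020-01-05 23:00"]))

# The expression from the answer and its simplified form give the same result.
original = ((dates.dt.dayofweek) * 24 + 24) - (24 - dates.dt.hour) + 1
simplified = dates.dt.dayofweek * 24 + dates.dt.hour + 1

print(original.tolist(), simplified.tolist())  # [49, 168] [49, 168]
print((dates.dt.dayofweek + 1).tolist(), (dates.dt.hour + 1).tolist())  # [3, 7] [1, 24]
```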
Then, to shift the price series, I ran the following code:
# Create separate df for day 1, 6, 7
dow_1 = df.loc[df['Day of Week'] ==1]
dow_6 = df.loc[df['Day of Week'] ==6]
dow_7 = df.loc[df['Day of Week'] ==7]
# Combine the three df's
dow_167 = pd.concat([dow_1, dow_6, dow_7], axis=0)
# Sort them by date
dow_167 = dow_167.sort_values(by='date')
# Make date the index
dow_167.set_index('date', inplace = True)
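# Shift by 72 rows: this subset holds 3 days x 24 h = 72 rows per week, so
# each row receives the price from the same weekday and hour one week
# earlier (assumes no missing or duplicated hours)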
dow_167 = dow_167.shift(72)
# Make a copy of df
dow_2345 = df.copy()
# Make date to index
dow_2345.set_index('date', inplace = True)
# Shift all observations 24 steps so hour x from today will be at the
# spot for hour x tomorrow
dow_2345 = dow_2345.shift(24)
# Reset index
dow_2345 = dow_2345.reset_index()
# Recompute the helper columns from the unshifted date, since shift(24)
# also shifted the old Hour of Week / Day of Week / Hour of Day columns
dow_2345['Hour of Week'] = dow_2345['date'].dt.dayofweek * 24 + dow_2345['date'].dt.hour + 1
dow_2345['Day of Week'] = dow_2345['date'].dt.dayofweek +1
dow_2345['Hour of Day'] = dow_2345['date'].dt.hour +1
# Make 4 separate df's for day 2, 3, 4, 5.
dow_2 = dow_2345.loc[dow_2345['Day of Week'] ==2]
dow_3 = dow_2345.loc[dow_2345['Day of Week'] ==3]
dow_4 = dow_2345.loc[dow_2345['Day of Week'] ==4]
dow_5 = dow_2345.loc[dow_2345['Day of Week'] ==5]
# Combine the df's
dow_2345 = pd.concat([dow_2, dow_3, dow_4, dow_5], axis=0)
# Sort by date
dow_2345 = dow_2345.sort_values(by='date')
# Make date index
dow_2345.set_index('date', inplace = True)
# Combine day 1, 6, 7 with day 2, 3, 4, 5.
forecast_standard = pd.concat([dow_167, dow_2345], axis=0)
# Sort by date
forecast_standard = forecast_standard.sort_index()
# Clean up
del dow_1
del dow_6
del dow_7
del dow_167
del dow_2
del dow_3
del dow_4
del dow_5
del dow_2345
del forecast_standard['Hour of Week']
del forecast_standard['Day of Week']
del forecast_standard['Hour of Day']
The first part selects Monday, Saturday and Sunday, combines them and shifts them by one week (72 rows within that subset). The second part shifts the whole series by one day (24 rows) and keeps only Tuesday to Friday. The last lines combine the two parts, sort them by date and drop the helper columns.
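The 72-row and 24-row shifts correspond to "one week earlier" and "one day earlier" only if every hour is present exactly once. A minimal check on a hypothetical, complete three-week series (each price is simply its hour count, so lags are easy to read) compares the positional shifts with lags taken on the timestamps, then drops one hour to show how a gap breaks the positional version:

```python
import numpy as np
import pandas as pd

# Hypothetical complete hourly series: three weeks starting Wednesday 2020-01-01.
idx = pd.date_range("2020-01-01", periods=21 * 24, freq="h")
price = pd.Series(np.arange(len(idx), dtype=float), index=idx)
dow = idx.dayofweek + 1  # 1 = Monday ... 7 = Sunday, as in the answer

# Row-based shifts, as in the answer.
weekly = price[dow.isin([1, 6, 7])].shift(72)
daily = price.shift(24)[dow.isin([2, 3, 4, 5])]

# Calendar-based references: the value observed exactly 168 h / 24 h earlier.
weekly_ref = price.shift(freq=pd.Timedelta(hours=168)).reindex(weekly.index)
daily_ref = price.shift(freq=pd.Timedelta(hours=24)).reindex(daily.index)
print(weekly.equals(weekly_ref), daily.equals(daily_ref))  # True True

# With one hour missing, 24 rows back is no longer 24 hours back.
gappy = price.drop(pd.Timestamp("2020-01-08 12:00"))
t = pd.Timestamp("2020-01-09 11:00")
print(gappy.shift(24)[t], price[t - pd.Timedelta(hours=24)])  # 178.0 179.0
```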
This worked well for me and I wanted to share my way of solving the problem. There may be a more efficient way to achieve the same result though.
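One possibly more compact alternative avoids splitting and recombining data frames: compute the 24 h and 168 h lags on the timestamps and pick between them per row with `np.where`. This is a sketch, not the approach from the answer; it assumes tz-naive (or UTC) hourly timestamps without duplicates and price columns whose names start with `Price_`, and `naive_benchmark` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def naive_benchmark(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Sketch of the standard naive EPF benchmark for every Price_* column."""
    # Index by timestamp and keep only the price columns (assumed Price_* names).
    prices = df.set_index(date_col).sort_index().filter(regex=r"^Price_")

    # Price observed exactly 24 h / 168 h earlier, aligned back onto the
    # original timestamps; NaN where that earlier hour is not in the data.
    lag_day = prices.shift(freq=pd.Timedelta(hours=24)).reindex(prices.index)
    lag_week = prices.shift(freq=pd.Timedelta(hours=168)).reindex(prices.index)

    # pandas numbers weekdays Monday = 0 ... Sunday = 6.
    use_week = np.isin(prices.index.dayofweek, [0, 5, 6])

    # Monday/Saturday/Sunday take last week's price, Tuesday-Friday yesterday's.
    values = np.where(use_week[:, None], lag_week.to_numpy(), lag_day.to_numpy())
    return pd.DataFrame(values, index=prices.index, columns=prices.columns)


# Hypothetical demo data: three weeks of hourly prices starting Wednesday
# 2020-01-01, where each price equals its hour count since the start.
idx = pd.date_range("2020-01-01", periods=21 * 24, freq="h")
demo = pd.DataFrame({"date": idx, "Price_REG1": np.arange(len(idx), dtype=float)})
forecast = naive_benchmark(demo)

# Tuesday 05:00 uses Monday 05:00; the next Monday 05:00 uses the previous
# Monday 05:00, so both point at 2020-01-06 05:00.
print(forecast.loc[pd.Timestamp("2020-01-07 05:00"), "Price_REG1"])  # 125.0
print(forecast.loc[pd.Timestamp("2020-01-13 05:00"), "Price_REG1"])  # 125.0
```

Because the lags are looked up by time rather than by row position, a missing hour yields NaN for the forecasts that depend on it instead of silently pairing them with the wrong hour. Duplicated timestamps (for example a repeated hour around a DST change in local time) would make `reindex` fail, so those would need handling first.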