Darts Time Series Modelling with missing data

Question:

I am using Darts (https://unit8co.github.io/darts/). I want to train a model to predict how many products we will sell per day based on historical information.

  1. We don’t sell any product on the weekend, so there is no data in the training set for those days.
  2. We also know that advertising spend on the days leading up to a given day has a big impact on how much we sell, so I want to use daily advertising spend as a covariate.

Input data looks like:

date,y,advertising_spend
2022-07-20 00:00:00,456,10
2022-07-21 00:00:00,514,10
2022-07-22 00:00:00,353,6
2022-07-25 00:00:00,511,28
2022-07-26 00:00:00,419,13
2022-07-27 00:00:00,439,16

Code:

from darts import TimeSeries
from darts.models import TFTModel

all_data = TimeSeries.from_csv("csvfile.csv", time_col="date", freq="D")
model = TFTModel(input_chunk_length=7, output_chunk_length=1)
model.fit(series=all_data["y"], future_covariates=all_data["advertising_spend"])

However, during training the loss cannot be computed.

I looked into why, and it is because the weekends are being treated as NaN.
When I set fillna_value=0 while creating the TimeSeries, the model is able to train, but the resulting model is not a good approximation.
What is the best way to handle this?

Asked By: Funzo


Answers:

I’m experimenting with Darts for similar modeling problems currently. What exactly do you mean by the model produced is not a good approximation?

Maybe you already did this, but generally speaking I would suggest starting with simpler models (Exponential Smoothing, ARIMA, Prophet) to get a solid baseline forecast that more complex models like TFT would have to outperform.

My experience so far has been that most NN models don’t quite beat the simpler statistical models on univariate time series. Intuitively, I believe the NN models need more signal to really leverage their strengths, such as multiple similar target series to train on and/or bigger sets of covariates. I read some of Prof. Rob Hyndman’s research on that topic for reference.

If you purely want to improve the TFT model on your data, the first step would probably be to tune the hyperparameters, which can have quite an effect. Darts offers the gridsearch method for this (see the Darts documentation). Another option shown in the Darts examples is Ray Tune.

About the advertising covariate: do you have data on (planned) advertising spend for a certain number of days into the future, or only up to the present? In the latter case, you can also consider the other NN models in Darts that only use past_covariates, without losing any signal in your model.

Answered By: Internetmann

Creator of Darts here. I would make a few suggestions:

  • Start with simpler models. Not TFT, but rather linear regression or ARIMA, which both support future covariates.
  • Use business day frequency ("B"), not daily.
  • Make sure you don’t have any NaN values in your time series. If you do, consider using e.g. darts.utils.missing_values.fill_missing_values().
  • If you use deep learning (later on), scale the values using Scaler.

Answered By: Julien Herzen