Multiple-series training input gives NaN loss while the same data as a single-series training input does not

Question:

I want to train an N-BEATS time series model using Darts. I have a time series DataFrame for each user, so I want to use multiple-series training, but when I feed the model the list of TimeSeries I immediately get NaN losses during training. If I concatenate all users' TimeSeries into one, I get a normal loss. In both cases the data is scaled, filled, and cast to float32:

data = scaler.transform(filler.transform(data)).astype(np.float32)
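For reference, a minimal sketch of what that line assumes: filler and scaler are taken to be Darts' MissingValuesFiller and Scaler (an assumption, since their construction is not shown in the question), with the scaler fitted beforehand on a training split (train_data is a hypothetical name):

import numpy as np
from darts.dataprocessing.transformers import MissingValuesFiller, Scaler

filler = MissingValuesFiller()   # default: fill via pandas interpolation
scaler = Scaler()                # default: wraps sklearn's MinMaxScaler

# fit the scaler once on the (filled) training data, then reuse it
scaler.fit(filler.transform(train_data))  # train_data: hypothetical training split
data = scaler.transform(filler.transform(data)).astype(np.float32)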

Here is the code that I use to combine the list of TimeSeries into a single TimeSeries. I also have pure Darts code for this, but it is much slower for the same result.

import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.models import NBEATSModel

SPLIT = 0.8  # fraction of samples (or series) used for training

if concatenate_to_one_ts:
    all_dfs = []
    all_dfs_cov = []

    # collect each user's target and covariate series as pandas objects
    for target_ts, cov_ts in zip(list_of_target_ts, list_of_cov_ts):
        all_dfs.append(target_ts.pd_series())
        all_dfs_cov.append(cov_ts.pd_dataframe())

    all_dfs = pd.concat(all_dfs)
    all_dfs_cov = pd.concat(all_dfs_cov)
    
    nbr_train_sample = int(len(all_dfs) * SPLIT)

    all_dfs_train = all_dfs[:nbr_train_sample]
    all_dfs_test = all_dfs[nbr_train_sample:]
    
    list_of_target_ts_train = TimeSeries.from_series(all_dfs_train.reset_index(drop=True))
    list_of_target_ts_test = TimeSeries.from_series(all_dfs_test.reset_index(drop=True))
    
    all_dfs_cov_train = all_dfs_cov[:nbr_train_sample]
    all_dfs_cov_test = all_dfs_cov[nbr_train_sample:]
    
    list_of_cov_ts_train = TimeSeries.from_dataframe(all_dfs_cov_train.reset_index(drop=True))
    list_of_cov_ts_test = TimeSeries.from_dataframe(all_dfs_cov_test.reset_index(drop=True))
else:
    nbr_train_sample = int(len(list_of_target_ts) * SPLIT)
    list_of_target_ts_train = list_of_target_ts[:nbr_train_sample]
    list_of_target_ts_test = list_of_target_ts[nbr_train_sample:]

    list_of_cov_ts_train = list_of_cov_ts[:nbr_train_sample]
    list_of_cov_ts_test = list_of_cov_ts[nbr_train_sample:]

model = NBEATSModel(input_chunk_length=4,
                    output_chunk_length=1,
                    batch_size=512,
                    n_epochs=5,
                    nr_epochs_val_period=1, 
                    model_name="NBEATS_test",
                    generic_architecture=True,
                    force_reset=True,
                    save_checkpoints=True,
                    show_warnings=True,
                    log_tensorboard=True, 
                    torch_device_str='cuda:0'
                   )

model.fit(series=list_of_target_ts_train,
          past_covariates=list_of_cov_ts_train,
          val_series=list_of_target_ts_test,
          val_past_covariates=list_of_cov_ts_test,
          verbose=True,
          num_loader_workers=20)

With multiple-series training I get:
Epoch 0: 8%|██████████▉ | 2250/27807 [03:00<34:11, 12.46it/s, loss=nan, v_num=logs, train_loss=nan.0

With single-series training I get:
Epoch 0: 24%|█████████████████████████▋ | 669/2783 [01:04<03:24, 10.33it/s, loss=0.00758, v_num=logs, train_loss=0.00875]

I am also confused by the number of samples per epoch for the same batch size: from what I read here: https://unit8.com/resources/training-forecasting-models/, I expected the single series to have more samples, since the window-size cut is not happening for each of the multiple series.

Asked By: phoenire


Answers:

  • Regarding the NaNs, I would try reducing the learning rate if I were you. Also double-check that there's no NaN remaining in your data (see the corresponding entry here); a sketch of both suggestions follows this list.
  • Regarding the number of samples, each of the separate time series is split into several (input, output) slices. For the single series, this split is done once overall, whereas for the multiple series it is done once per series and all the resulting samples are then regrouped into a common training set. So it is expected to have more training samples with multiple series (and each training sample will have fewer dimensions compared to the single-multivariate-series case).
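A minimal sketch of both suggestions, assuming the variable names from the question (the learning-rate value is illustrative, not a recommendation):

import numpy as np
from darts.models import NBEATSModel

# 1. Verify that no NaNs survived preprocessing, series by series:
for ts in list_of_target_ts + list_of_cov_ts:
    assert not np.isnan(ts.values()).any(), "NaNs remain after filling"

# 2. Lower the learning rate of the default Adam optimizer:
model = NBEATSModel(input_chunk_length=4,
                    output_chunk_length=1,
                    optimizer_kwargs={"lr": 1e-4})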
Answered By: Julien Herzen

Thanks Julien Herzen, your answer helped me a lot in finding the issue. I want to add more detail on what was happening.

  • Regarding the NaNs: the filler from Darts uses pandas interpolation by default. That interpolation was not possible for some of the multiple series, because certain series had only NaN in those columns, so there was nothing to interpolate from and the filler returned series that still contained NaN values. This did not happen for the concatenated single series, because once all the series were concatenated there were values to interpolate from. If you do not need interpolation, just pass fill=0.0, i.e. MissingValuesFiller(fill=0.0) (see the sketch at the end of this answer).
  • Regarding the number of samples: after digging into the Darts code, I found out that the N-BEATS model uses GenericShiftedDataset, which for multiple series computes the length of the dataset by taking the length of the longest sub-series and multiplying it by the number of series:

self.max_samples_per_ts = (max(len(ts) for ts in self.target_series) - self.size_of_both_chunks + 1)

Then, when __getitem__ is called:

target_idx = idx // self.max_samples_per_ts
target_series = self.target_series[target_idx]

It selects a series by integer-dividing the idx by the max number of samples, so shorter series get oversampled: they have less data but the same chance of being selected as longer ones.

Here is my smallest example, with input_chunk_length = 4 and output_chunk_length = 1:
Multiple series with lengths [71, 19] -> number of samples: (71 * 2) - (2 * input_chunk_length) = 134

Concatenated into a single series of length 90 -> number of samples: 90 - input_chunk_length = 86
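A toy reproduction of this arithmetic and of the index mapping above (plain Python, not Darts code):

input_chunk_length, output_chunk_length = 4, 1
size_of_both_chunks = input_chunk_length + output_chunk_length

# multiple series: every series contributes max_samples_per_ts samples
lengths = [71, 19]
max_samples_per_ts = max(lengths) - size_of_both_chunks + 1   # 67
print(max_samples_per_ts * len(lengths))                      # 134

# single concatenated series of length 90
print(90 - size_of_both_chunks + 1)                           # 86

# __getitem__ mapping: both series are drawn equally often, so the
# windows of the 19-step series are reused far more than those of
# the 71-step one
for idx in (0, 66, 67, 133):
    print(idx, "->", idx // max_samples_per_ts)  # 0, 66 -> series 0; 67, 133 -> series 1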

In the multiple-series case, the samples in the short sub-series will likely be drawn more often.
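Returning to the first point, here is a minimal sketch of the all-NaN-column behaviour described above (the toy DataFrame is illustrative):

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.dataprocessing.transformers import MissingValuesFiller

# one covariate column is entirely NaN, like the failing sub-series
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, np.nan]})
ts = TimeSeries.from_dataframe(df)

# the default filler interpolates; an all-NaN column has nothing to
# interpolate from, so NaNs survive and later poison the loss
filled = MissingValuesFiller().transform(ts)
print(np.isnan(filled.values()).any())   # True

# a constant fill removes them unconditionally
filled = MissingValuesFiller(fill=0.0).transform(ts)
print(np.isnan(filled.values()).any())   # False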

Answered By: phoenire