How to use windows created by the Dataset.window() method in TensorFlow 2.0?

Question:

I’m trying to create a dataset that will return random windows from a time series, along with the next value as the target, using TensorFlow 2.0.

I’m using Dataset.window(), which looks promising:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(5, shift=1, drop_remainder=True)
for window in dataset:
    print([elem.numpy() for elem in window])

Outputs:

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
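Each window yielded this way is not a tensor. It is a small nested dataset of scalar tensors, which is why it can be iterated but not indexed. A minimal check, assuming TensorFlow 2.x with eager execution (first_window, is_dataset and values are illustrative names):

```python
import tensorflow as tf

# Same windowed pipeline as in the question (assumes TF 2.x eager mode).
windows = tf.data.Dataset.from_tensor_slices(tf.range(10))
windows = windows.window(5, shift=1, drop_remainder=True)

# Take the first element produced by window().
first_window = next(iter(windows))

# The element is a nested Dataset of scalar tensors rather than a tensor:
# it can be iterated (or transformed with Dataset methods) but not indexed.
is_dataset = isinstance(first_window, tf.data.Dataset)
values = [int(elem.numpy()) for elem in first_window]
print(is_dataset, values)
```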

However, I would like to use the last value of each window as the target. If each window were a tensor, I would split it like this:

dataset = dataset.map(lambda window: (window[:-1], window[-1:]))

But when I try this, I get an exception:

TypeError: '_VariantDataset' object is not subscriptable
Asked By: MiniQuark


Answers:

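Each window produced by window() is itself a small dataset (the _VariantDataset in the error message), not a tensor, which is why it cannot be subscripted. The solution is to call flat_map() and batch each window into a single tensor: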

dataset = dataset.flat_map(lambda window: window.batch(5))

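Since batch(5) gathers all 5 elements of a window into one batch, each item in the resulting dataset is now a single 1-D tensor holding one window, so you can split it like this: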

dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
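A possible refinement (my suggestion, not part of the original answer): passing drop_remainder=True to the inner batch() lets TensorFlow infer a static window length. Because window(..., drop_remainder=True) already guarantees that every window has exactly 5 elements, this discards nothing, but the inputs and target get fixed shapes (4,) and (1,) instead of an unknown length. A sketch comparing the two (window_size, loose and strict are illustrative names):

```python
import tensorflow as tf

window_size = 5  # 4 input steps + 1 target step

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(window_size, shift=1, drop_remainder=True)

# Without drop_remainder, batch() must allow a short final batch,
# so the static window length is unknown (None).
loose = dataset.flat_map(lambda window: window.batch(window_size))

# Every window already has exactly window_size elements, so
# drop_remainder=True discards nothing but pins the static shape.
strict = dataset.flat_map(
    lambda window: window.batch(window_size, drop_remainder=True))
strict = strict.map(lambda window: (window[:-1], window[-1:]))

print(loose.element_spec)
print(strict.element_spec)
```

Static shapes are mostly a convenience, for example when the dataset feeds a Keras model that expects a known input length.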

So the full code is:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))

for X, y in dataset:
    print("Input:", X.numpy(), "Target:", y.numpy())

Which outputs:

Input: [0 1 2 3] Target: [4]
Input: [1 2 3 4] Target: [5]
Input: [2 3 4 5] Target: [6]
Input: [3 4 5 6] Target: [7]
Input: [4 5 6 7] Target: [8]
Input: [5 6 7 8] Target: [9]
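The question asks for random windows, but the pipeline above yields them in order. A minimal sketch, assuming you want shuffled (input, target) pairs grouped into mini-batches for training. The buffer_size, batch_size and seed values are arbitrary illustration choices, and window[-1] (rather than window[-1:]) is used here so that each target is a scalar:

```python
import tensorflow as tf

window_size = 5    # 4 inputs + 1 target
buffer_size = 100  # illustrative; >= number of windows gives a full shuffle
batch_size = 2     # illustrative

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(window_size, shift=1, drop_remainder=True)
dataset = dataset.flat_map(
    lambda window: window.batch(window_size, drop_remainder=True))
# Shuffle whole windows (not individual values), so each window stays contiguous.
dataset = dataset.shuffle(buffer_size, seed=42)
# Scalar target: window[-1] instead of window[-1:].
dataset = dataset.map(lambda window: (window[:-1], window[-1]))
dataset = dataset.batch(batch_size).prefetch(1)

pairs = []
for X, y in dataset:
    print("Inputs:", X.numpy().tolist(), "Targets:", y.numpy().tolist())
    for inputs, target in zip(X.numpy().tolist(), y.numpy().tolist()):
        pairs.append((inputs, target))
```

Shuffling after flat_map() reorders complete windows, so every input sequence is still a contiguous slice of the series followed by its true next value.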
Answered By: MiniQuark