How would one use an RNN when predicting temperature?

Question:

Let’s assume I have a dataframe with several features, like humidity, pressure, and so on. One of these columns would be temperature.

At each row, I have the data for one day. I would like to predict the temperature for the next day, with past data only.

How would I shape the dataframe so that it could be used in an RNN with Keras?

Answers:

Let’s assume you have the following data structure and we want to predict the temperature given 1 day in the past:

import tensorflow as tf
import pandas as pd
import numpy as np

# 20 days of random toy data
df = pd.DataFrame(data={
    'temperature': np.random.random((1, 20)).ravel(),
    'pressure': np.random.random((1, 20)).ravel(),
    'humidity': np.random.random((1, 20)).ravel(),
    'wind': np.random.random((1, 20)).ravel()
})

print(df.to_markdown())
|    | temperature | pressure | humidity | wind |
|---:|------------:|---------:|---------:|-----:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |

The first thing we have to do is separate the data into features and labels:

features = df.iloc[::2, :]  # even-indexed rows (days 0, 2, 4, ...): the "past" day
labels = df.iloc[1::2, :]   # odd-indexed rows: the day that follows each feature row, whose temperature we want to predict
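
As a side note, this even/odd split uses each day only once. A sketch of an alternative (variable names here are illustrative, not part of the split above) that pairs every day with the day that follows it via shift, keeping all consecutive day pairs:

# Hypothetical variant: pair each day t with day t+1 by shifting the temperature column
df_pairs = df.copy()
df_pairs['next_temperature'] = df_pairs['temperature'].shift(-1)  # temperature of the following day
df_pairs = df_pairs.dropna()  # the last day has no following day, so drop it
features_alt = df_pairs[['temperature', 'pressure', 'humidity', 'wind']].to_numpy()  # shape (19, 4)
labels_alt = df_pairs['next_temperature'].to_numpy()  # shape (19,)

The rest of this answer continues with the even/odd split above.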

Features:

|    | temperature | pressure | humidity | wind |
|---:|------------:|---------:|---------:|-----:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |

Labels:

|    | temperature | pressure | humidity | wind |
|---:|------------:|---------:|---------:|-----:|
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |

Since you are only interested in predicting the temperature, we can drop the other columns from the labels and convert both to NumPy arrays:

features = features.to_numpy() # shape (10, 4)
labels = labels['temperature'].to_numpy() # shape (10,)
features = np.expand_dims(features, axis=1) # shape (10, 1, 4)

Note that a time dimension is added to features: each sample now consists of a single timestep (one day), and each timestep carries 4 features (temperature, pressure, humidity, wind).
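
If you later want to feed more than 1 day of history, the same reshaping idea generalizes to a sliding window. A rough sketch, where the window size and variable names are my own assumptions:

lookback = 3  # hypothetical window: the 3 previous days predict the next day's temperature
data = df.to_numpy()  # shape (20, 4), columns: temperature, pressure, humidity, wind
window_features = np.array([data[i:i + lookback] for i in range(len(data) - lookback)])  # shape (17, 3, 4)
window_labels = data[lookback:, 0]  # temperature of the day after each window, shape (17,)

With a window like this, the Input layer below would take shape=(lookback, 4) instead of (1, 4); the rest of the answer sticks with the single-timestep shape (10, 1, 4).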

Building and running an RNN model:

inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))  # (timesteps, features) = (1, 4)
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out) # one output = temperature

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 1, 4)]            0         
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                1184      
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354

Make predictions like this:

samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# array([[-1.610171]], dtype=float32)
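
To predict from the last day of the actual data instead of random noise, you could also reuse the last sample of the features array, for example:

last_day = features[-1:]  # the most recent feature day in this toy data (day 18), shape (1, 1, 4)
print(model.predict(last_day))  # the model's estimate of the following day's temperature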

You can also consider normalizing your data before training like this:

# Standardize each column to zero mean and unit variance
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std
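
If the temperature column is normalized as well, the model will output temperatures in normalized units; a sketch of mapping a prediction back to the original scale, using the mean and std computed above:

pred = model.predict(tf.random.normal((1, 1, 4)))  # a prediction in normalized units
pred_unscaled = pred * std['temperature'] + mean['temperature']  # back to the original temperature scale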

And that’s about it.

Answered By: AloneTogether