How would one use an RNN when predicting temperature?
Question:
Let’s assume I have a dataframe with several features, like humidity, pressure, and so on. One of these columns is temperature.
Each row holds the data for one day. I would like to predict the temperature for the next day, using past data only.
How would I shape the dataframe so that it could be used in an RNN with Keras?
Answers:
Let’s assume you have the following data structure and want to predict the temperature from the previous day's measurements:
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.DataFrame(data={
'temperature': np.random.random((1, 20)).ravel(),
'pressure': np.random.random((1, 20)).ravel(),
'humidity': np.random.random((1, 20)).ravel(),
'wind': np.random.random((1, 20)).ravel()
})
print(df.to_markdown())
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
The first thing we have to do is separate the data into features and labels:
features = df.iloc[::2, :]  # even-indexed rows (days 0, 2, 4, ...) serve as inputs
labels = df.iloc[1::2, :]  # odd-indexed rows (days 1, 3, 5, ...): each is the day after the matching input row
Features:
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
Labels:
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
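Splitting with a step of 2 uses each day only once, so 20 rows yield 10 input/label pairs. If you would rather use every consecutive pair of days, one possible alternative is to offset the frame by one row. This is only a sketch and not part of the split shown above; the names `X_all` and `y_all` and the seeded random data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Stand-in data with the same columns as the example above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

# Day t is the input, day t + 1 supplies the target temperature.
X_all = df.iloc[:-1].to_numpy()                # days 0..18, shape (19, 4)
y_all = df['temperature'].iloc[1:].to_numpy()  # days 1..19, shape (19,)

# Add the time axis expected by Keras RNN layers: (samples, timesteps, features).
X_all = np.expand_dims(X_all, axis=1)          # shape (19, 1, 4)
```

With this variant the last day is never an input (its successor is unknown) and the first day is never a label.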
Since you are only interested in predicting the temperature, we can remove the other features from the labels and convert both to arrays:
features = features.to_numpy() # shape (10, 4)
labels = labels['temperature'].to_numpy() # shape (10,)
features = np.expand_dims(features, axis=1) # shape (10, 1, 4)
Note that a time dimension is added to features, because Keras RNN layers expect input of shape (samples, timesteps, features). Here each sample consists of a single timestep (one day), and each timestep has 4 features (temperature, pressure, humidity, wind).
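With a single timestep per sample the recurrent layer has no history to carry forward. If you want each sample to cover several past days, a sliding window is one common way to build the 3-D array. The following is a minimal sketch under that assumption; `make_windows`, `window` and `target_col` are hypothetical names, not from the answer above:

```python
import numpy as np

def make_windows(data, target_col, window):
    """Build (X, y): X[i] holds `window` consecutive days,
    y[i] is the target column on the day right after that window."""
    X, y = [], []
    for start in range(len(data) - window):
        X.append(data[start:start + window])
        y.append(data[start + window, target_col])
    return np.stack(X), np.array(y)

# Stand-in data: 20 days, 4 features, temperature in column 0.
data = np.random.default_rng(0).random((20, 4))
X, y = make_windows(data, target_col=0, window=3)
print(X.shape, y.shape)  # (17, 3, 4) (17,)
```

The resulting `X` can be passed to the same kind of model; only the timesteps entry of the input shape changes from 1 to the window length.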
Building and running an RNN model:
inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out) # one output = temperature
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 1, 4)] 0
simple_rnn (SimpleRNN) (None, 32) 1184
dense_1 (Dense) (None, 1) 33
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354
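The parameter counts in the summary can be checked by hand. A SimpleRNN layer has one weight per input feature and per recurrent unit for each of its units, plus a bias; a Dense layer has one weight per input plus a bias for each output. The helper names below are made up for this check:

```python
def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + biases
    return units * input_dim + units * units + units

def dense_params(input_dim, units):
    # weights + biases
    return input_dim * units + units

rnn = simple_rnn_params(input_dim=4, units=32)  # 128 + 1024 + 32
dense = dense_params(input_dim=32, units=1)     # 32 + 1
print(rnn, dense, rnn + dense)  # 1184 33 1217
```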
Make predictions like this:
samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# array([[-1.610171]], dtype=float32)
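The random tensor above only demonstrates the expected input shape. To forecast from actual measurements, the most recent day has to be reshaped the same way as the training features. A small sketch, with made-up values standing in for a real row:

```python
import numpy as np

# Stand-in for the latest day's measurements, in the training column order:
# temperature, pressure, humidity, wind.
latest_day = np.array([0.92, 0.81, 0.02, 0.95], dtype=np.float32)

# Reshape to (samples, timesteps, features) = (1, 1, 4).
x_new = latest_day.reshape(1, 1, -1)
print(x_new.shape)  # (1, 1, 4)
```

Passing `x_new` to `model.predict` would then return an array of shape (1, 1) holding the predicted temperature. If the training data was normalized, the same normalization has to be applied to `latest_day` first.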
You can also consider normalizing your data before training like this:
# Standardize each column to zero mean and unit variance
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std
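When the labels are normalized too, the model predicts in normalized units, so predictions have to be converted back to real temperatures. It is also common to compute the statistics on the training portion only, so that later days do not leak into them. The sketch below assumes a hypothetical split after 16 days and uses NumPy arrays in place of the dataframe; `n_train` and `to_original_units` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((20, 4)) * 30.0  # stand-in raw values; column 0 = temperature

n_train = 16                       # hypothetical split: first 16 days for training
mean = data[:n_train].mean(axis=0)
std = data[:n_train].std(axis=0)

normalized = (data - mean) / std   # same statistics applied to every row

def to_original_units(pred, col=0):
    """Undo the normalization for one column (here: temperature)."""
    return pred * std[col] + mean[col]

restored = to_original_units(normalized[:, 0])
```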
And that’s about it.