pandas overlay on top of a plot
Question:
I have some time series data in a pandas dataframe that I can visualise as follows:
import datetime
import pandas as pd
d = {'end_time': [datetime.datetime(2020, 3, 17, 0, 0), datetime.datetime(2020, 3, 17, 0, 5), datetime.datetime(2020, 3, 17, 0, 10), datetime.datetime(2020, 3, 17, 0, 15), datetime.datetime(2020, 3, 17, 0, 20), datetime.datetime(2020, 3, 17, 0, 25), datetime.datetime(2020, 3, 17, 0, 30), datetime.datetime(2020, 3, 17, 0, 35), datetime.datetime(2020, 3, 17, 0, 40), datetime.datetime(2020, 3, 17, 0, 45), datetime.datetime(2020, 3, 17, 0, 50), datetime.datetime(2020, 3, 17, 0, 55)], "measurement": [2000, 1500, 800, 900, 400, 4000, 300, 900, 1000, 1250, 1100, 1300], "reliability": [99, 81, 84, 85, 99, 86, 96, 97, 98, 99, 98, 97]}
# build the dataframe from the dictionary
subset_df = pd.DataFrame.from_dict(d)
# plot measurements over time
subset_df.plot('end_time', 'measurement')
Now the reliability column is a number between 0 and 100. What I want to do is highlight areas where this reliability score is below 95: something where I can overlay a transparent box around those areas to visually highlight where the measurements may not be very reliable.
Answers:
Consider this random dataframe, where
- end_time: dates from 2020-03-17 00:00:00 to 2020-03-17 00:55:00 at a 5-minute interval
- measurement: random integers from 300 to 3999 (the upper bound of np.random.randint is exclusive)
- reliability: random integers from 0 to 99 (same exclusive upper bound)
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'end_time': pd.date_range(start='2020-03-17 00:00:00', end='2020-03-17 00:55:00', freq='5min'),
    'measurement': np.random.randint(300, 4000, size=12),
    'reliability': np.random.randint(0, 100, size=12),
})
[Out]:
               end_time  measurement  reliability
0   2020-03-17 00:00:00         3905            7
1   2020-03-17 00:05:00         1143           93
2   2020-03-17 00:10:00         2672           55
3   2020-03-17 00:15:00          416           29
4   2020-03-17 00:20:00         1246           21
5   2020-03-17 00:25:00         2743           32
6   2020-03-17 00:30:00         2798           49
7   2020-03-17 00:35:00         1012           21
8   2020-03-17 00:40:00         3894           64
9   2020-03-17 00:45:00         1218           18
10  2020-03-17 00:50:00         1600           97
11  2020-03-17 00:55:00          729           76
If the goal is to plot all measurements with a reliability lower than 95 in red and the rest in blue, let us first create a few variables that will be useful:
- measurement values with a reliability lower than 95:
measures = df[df.reliability < 95].measurement
- end_time of the measurements with a reliability lower than 95:
dates = df[df.reliability < 95].end_time
- measurement values with a reliability of 95 or higher:
measures2 = df[df.reliability >= 95].measurement
- end_time of the measurements with a reliability of 95 or higher:
dates2 = df[df.reliability >= 95].end_time
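The four filters above repeat the same comparison. As a minimal sketch, the comparison can be computed once as a boolean mask and then negated for the other group. The names example_df, threshold, is_low, low_rows and high_rows are illustrative and not from the original answer; the values are copied from the question so the result is reproducible.

```python
import pandas as pd

# Measurement and reliability values copied from the question.
example_df = pd.DataFrame({
    "end_time": pd.date_range("2020-03-17 00:00", periods=12, freq="5min"),
    "measurement": [2000, 1500, 800, 900, 400, 4000, 300, 900, 1000, 1250, 1100, 1300],
    "reliability": [99, 81, 84, 85, 99, 86, 96, 97, 98, 99, 98, 97],
})

threshold = 95  # cutoff taken from the question

# Boolean mask, computed once: True where the reading is unreliable.
is_low = example_df["reliability"] < threshold

low_rows = example_df[is_low]     # reliability below the threshold
high_rows = example_df[~is_low]   # reliability at or above the threshold

print(len(low_rows), len(high_rows))  # prints: 4 8
```

Keeping whole rows (rather than separate date and measurement Series) means the columns stay aligned, for example low_rows["end_time"] and low_rows["measurement"].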
Now let us create the plot:
import matplotlib.pyplot as plt
# Create the plot:
plt.plot(dates, measures, 'ro', dates2, measures2, 'bo')
# Set the title:
plt.title('Measures over time')
# Set the x label:
plt.xlabel('Date')
# Set the y label:
plt.ylabel('Measure')
# Set the x ticks:
plt.xticks(rotation=45)
# Show the plot:
plt.show()
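Now, as per the requirement (use fill_between to paint a transparent box from the x-axis to the top of the y-axis), one can add the following before plt.show():
plt.fill_between(dates, 0, measures, color='red', alpha=0.2)
Note that this shades the region between 0 and the measurement values of the low-reliability points, not the full height of the axes.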
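If the aim is a box that covers the full height of the plot for each stretch of unreliable readings, one possible alternative is matplotlib's axvspan, called once per contiguous run of low-reliability rows. The sketch below is not the answer's method. It uses the question's data and assumes that each reading covers the 5 minutes before its end_time; that assumption, the interval parameter, and the helper name low_reliability_spans are all hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Measurement and reliability values copied from the question.
subset_df = pd.DataFrame({
    "end_time": pd.date_range("2020-03-17 00:00", periods=12, freq="5min"),
    "measurement": [2000, 1500, 800, 900, 400, 4000, 300, 900, 1000, 1250, 1100, 1300],
    "reliability": [99, 81, 84, 85, 99, 86, 96, 97, 98, 99, 98, 97],
})


def low_reliability_spans(df, threshold=95, interval=pd.Timedelta(minutes=5)):
    """Return (start, end) pairs, one per contiguous run of low-reliability rows.

    Hypothetical helper. Assumes each row covers the `interval` that ends
    at its end_time, which is a guess based on the column name.
    """
    low = df["reliability"] < threshold
    # The run id increases by one every time `low` flips between True and False.
    run_id = low.ne(low.shift(fill_value=False)).cumsum()
    spans = []
    for _, run in df[low].groupby(run_id[low]):
        spans.append((run["end_time"].iloc[0] - interval, run["end_time"].iloc[-1]))
    return spans


spans = low_reliability_spans(subset_df)

fig, ax = plt.subplots()
ax.plot(subset_df["end_time"], subset_df["measurement"])
for start, end in spans:
    # axvspan covers the full height of the axes between the two x values.
    ax.axvspan(start, end, color="red", alpha=0.2)
ax.set_xlabel("Date")
ax.set_ylabel("Measure")
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. With the question's data this gives two shaded boxes, one for the readings at 00:05 to 00:15 and one for the single reading at 00:25. The line is drawn with ax.plot rather than subset_df.plot because pandas may use its own x-axis units for regularly spaced time series, which can misalign shapes added later with plain matplotlib calls.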