How to create time_mask with two conditions in Python
Question:
I need to plot data that starts and ends at a certain time. In addition, I need to exclude a weekend period that falls within that time range.
How can I create a time_mask for my data that has two rules?
I already wrote the code for the "Start" and "End" period, but I am not able to add the rule for excluding the "Weekend period".
#create a time_mask
start_date = '2022-06-30 15:26:00'
end_date = '2022-07-11 15:30:00'
weekend_end = '2022-07-08 14:30:00'
weekend_start = '2022-07-11 09:50:00'
time_mask = (df['Time'] > start_date) & (df['Time'] <= end_date)
# use only this part of the dataframe as training data
df1_train = df1.loc[time_mask]
I tried to exclude the "Weekend period" with the code below, but this is not working…
time_mask = ((df['Time'] > start_date) & (df['Time'] <= end_date) & ((df['Time'] < weekend_start) or (df['Time'] > weekend_end)))
UPDATE 22-08-22
I have solved one part of the problem, but the excluded period still shows up in my plot:
#%% Plot data
fig, ax = plt.subplots()
ax.plot(df['Time'], df1[Temp])
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y-%m-%d %H:%M:%S'))
fig.autofmt_xdate()
plt.show()
#%% Plot the data without empty values
N = len(df['Time'])
ind = np.arange(N)
def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df['Time'][thisind].strftime('%Y-%m-%d %H:%M:%S')
fig, ax = plt.subplots()
ax.plot(ind, df[Temp])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.set_title("Without empty values")
fig.autofmt_xdate()
plt.show()
Answers:
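Use ‘|’ instead of ‘or’. Python's ‘or’ needs a single True/False value from each operand, and a pandas Series does not have one, so the expression raises a ValueError; ‘|’ combines the two conditions element by element.
It also looks like weekend_start and weekend_end are swapped in your code: the value named weekend_start is the later date, and the value named weekend_end is the earlier one.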
After filtering by the start and end condition:
(df['Time'] > start_date) & (df['Time'] <= end_date)
the data is filtered again, keeping rows whose time is later than weekend_start:
(df['Time'] > weekend_start)
or earlier than weekend_end:
(df['Time'] < weekend_end)
That is, with the names as you assigned them, the period from 2022-07-08 14:30:00 to 2022-07-11 09:50:00 is excluded.
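The rule described above can be tried on a small synthetic frame. This is only a minimal sketch, not the asker's data: the ‘Time’ and ‘Temp’ columns are filled with made-up six-hourly values, and weekend_start and weekend_end are named by their real order here (start earlier than end), unlike in the question.

```python
import pandas as pd

# Synthetic data (an assumption for illustration): one reading every 6 hours.
df = pd.DataFrame({
    'Time': pd.date_range('2022-06-30', '2022-07-12', freq=pd.Timedelta(hours=6)),
})
df['Temp'] = range(len(df))

start_date = pd.Timestamp('2022-06-30 15:26:00')
end_date = pd.Timestamp('2022-07-11 15:30:00')
weekend_start = pd.Timestamp('2022-07-08 14:30:00')  # excluded period begins
weekend_end = pd.Timestamp('2022-07-11 09:50:00')    # excluded period ends

# Python's `or` asks each Series for one truth value, which is ambiguous.
try:
    (df['Time'] < weekend_start) or (df['Time'] > weekend_end)
    or_failed = False
except ValueError:
    or_failed = True

# Element-wise operators: & (and), | (or), ~ (not).
in_range = (df['Time'] > start_date) & (df['Time'] <= end_date)
in_weekend = (df['Time'] >= weekend_start) & (df['Time'] <= weekend_end)
time_mask = in_range & ~in_weekend

df_train = df.loc[time_mask]
```

Writing the exclusion as ‘~in_weekend’ is equivalent to ‘(Time < weekend_start) | (Time > weekend_end)’, but it keeps the two rules separate, which makes it harder to mix up the comparison directions.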
Now about the plot. A date and time axis is continuous, even when there is no data in a certain period, so the excluded period still appears as a gap. In the code below, the left plot uses the default date axis and keeps this gap; the right plot plots against the row number and uses the ‘format_date’ function to label the ticks, which removes the gap.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
import matplotlib.ticker as ticker
df = pd.read_csv('Data.csv', sep=',', header=0)
start_date = '2022-06-30 15:26:00'
end_date = '2022-07-11 15:30:00'
weekend_end = '2022-07-08 14:30:00'
weekend_start = '2022-07-11 09:50:00'
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
time_mask = ((df['Timestamp'] > start_date) & (df['Timestamp'] <= end_date) & (
(df['Timestamp'] > weekend_start) | (df['Timestamp'] < weekend_end)))
df1 = df[time_mask].copy()
df1 = df1.set_index('Timestamp')
fig, axes = plt.subplots(ncols=2)
ax = axes[0]
ax.plot(df1.index, df1['Data'])
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y-%m-%d %H:%M:%S'))
ax.set_title("Default")
fig.autofmt_xdate()
N = len(df1['Data'])
ind = np.arange(N)
def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df1.index[thisind].strftime('%Y-%m-%d %H:%M:%S')
ax = axes[1]
ax.plot(ind, df1['Data'])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.set_title("Without empty values")
fig.autofmt_xdate()
plt.show()
Note that the ‘Timestamp’ column is converted to an index.
df1 = df1.set_index('Timestamp')
Below is the plotting code with a simple moving average (SMA). I have not worked out an exponential moving average (EMA) here; you can use a library such as TA-Lib for that.
df1['sma'] = df1['Data'].rolling(window=33).mean()
N = len(df1.index)
ind = np.arange(N)
def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df1.index[thisind].strftime('%Y-%m-%d %H:%M:%S')
fig, ax = plt.subplots()
ax.plot(ind, df1['Data'])
ax.plot(ind, df1['sma'])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
plt.show()
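If installing TA-Lib is not an option, pandas itself can compute an exponential moving average with ‘ewm’. The sketch below is an assumption about what is wanted, not part of the original answer: it uses a made-up series in place of df1['Data'] and reuses the window of 33 from the simple moving average.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1['Data'] so the sketch runs on its own.
data = pd.Series(np.arange(100, dtype=float), name='Data')

window = 33  # same window as the simple moving average

sma = data.rolling(window=window).mean()

# span=window sets the smoothing factor alpha = 2 / (window + 1).
# adjust=False uses the recursive form:
#   ema[t] = (1 - alpha) * ema[t-1] + alpha * data[t]
ema = data.ewm(span=window, adjust=False).mean()
```

Unlike the rolling mean, the EMA has a value from the first row on, so the plotted line does not start with a run of missing values.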
It also seems right to me to convert the date strings to datetime objects, written in the same format as in the file:
start_date = pd.to_datetime('2022-06-30T15:26:00+02:00', errors='coerce')
end_date = pd.to_datetime('2022-07-11T15:30:00+02:00', errors='coerce')
weekend_end = pd.to_datetime('2022-07-08T14:30:00+02:00', errors='coerce')
weekend_start = pd.to_datetime('2022-07-11T09:50:00+02:00', errors='coerce')
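As a small check of why matching the file's format matters, the sketch below compares offset-aware timestamps with offset-aware bounds. The three sample strings are hypothetical, chosen only to mimic the ISO 8601 form with a +02:00 offset, and weekend_start and weekend_end are named by their real order here.

```python
import pandas as pd

# Hypothetical timestamps in the same ISO 8601 form, with a +02:00 offset.
raw = pd.Series([
    '2022-07-08T14:00:00+02:00',  # before the excluded period
    '2022-07-09T12:00:00+02:00',  # inside the excluded period
    '2022-07-11T10:00:00+02:00',  # after the excluded period
])
ts = pd.to_datetime(raw)  # offset-aware datetimes

# Bounds carry the same offset, so both sides of the comparison are aware.
weekend_start = pd.to_datetime('2022-07-08T14:30:00+02:00')
weekend_end = pd.to_datetime('2022-07-11T09:50:00+02:00')

keep = (ts < weekend_start) | (ts > weekend_end)
```

Keeping the column and the bounds both offset-aware avoids relying on how a plain string would be interpreted when compared with an aware column.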
Update 12/09/2022.
This version makes it more convenient to plot without gaps. A string column is created from the index, and the data is plotted against it. The principle is the same as in the previous version, but everything is done in one step without a formatter function. MaxNLocator is also applied to limit how many ticks are displayed.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
df = pd.read_csv('Data.csv', sep=',', header=0)
start_date = '2022-06-30 15:26:00'
end_date = '2022-07-11 15:30:00'
weekend_end = '2022-07-08 14:30:00'
weekend_start = '2022-07-11 09:50:00'
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
time_mask = ((df['Timestamp'] > start_date) & (df['Timestamp'] <= end_date) & (
(df['Timestamp'] > weekend_start) | (df['Timestamp'] < weekend_end)))
df1 = df[time_mask].copy()
df1 = df1.set_index('Timestamp')
df1['string'] = df1.index.astype(str)
df1['sma'] = df1['Data'].rolling(window=33).mean()
fig, ax = plt.subplots()
ax.plot(df1['string'], df1['Data'])
ax.plot(df1['string'], df1['sma'])
ax.xaxis.set_major_locator(MaxNLocator(nbins=5))
fig.autofmt_xdate()
plt.show()