How to create time_mask with two conditions in Python

Question

I need to plot data that starts and ends at certain times, and I also need to exclude a weekend period that falls within that range.

How can I create a time_mask of my data that has two rules?

I already wrote code for the "Start" and "End" period, but I am not able to add the rule for excluding the "Weekend period".

#create a time_mask
start_date       = '2022-06-30 15:26:00'
end_date         = '2022-07-11 15:30:00'
weekend_end      = '2022-07-08 14:30:00'
weekend_start    = '2022-07-11 09:50:00'

time_mask        = (df['Time'] > start_date) & (df['Time'] <= end_date)


# use only this part of the dataframe as training data

df1_train        = df1.loc[time_mask]

I tried to exclude the "Weekend period" with the code below, but this is not working…

time_mask        = ((df['Time'] > start_date) & (df['Time'] <= end_date) & ((df['Time'] < weekend_start) or (df['Time'] > weekend_end)))

I have now solved one part of the problem, but the period is still not excluded in my plot:

[Image: Plot]

[Image: Plot in operating hours]

UPDATE 22-08-22

#%% Plot data

fig, ax = plt.subplots()
ax.plot(df['Time'], df[Temp])
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y-%m-%d %H:%M:%S'))
fig.autofmt_xdate()

plt.show()

#%% Plot the data without empty values

N = len(df['Time'])
ind = np.arange(N)

def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df['Time'][thisind].strftime('%Y-%m-%d %H:%M:%S')

fig, ax = plt.subplots()
ax.plot(ind, df[Temp])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.set_title("Without empty values")
fig.autofmt_xdate()

plt.show()

Asked By: ecc98

Answer 1

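Use ‘|’ instead of ‘or’. Python's ‘or’ needs a single True/False value, so applying it to two boolean Series raises a ValueError (the truth value of a Series is ambiguous), while ‘|’ combines them element by element.
I also think weekend_end and weekend_start are swapped: the value named weekend_start is the later date and the value named weekend_end is the earlier one.
After filtering by the condition: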

(df['Time'] > start_date) & (df['Time'] <= end_date)

the rows are kept only if the time is greater than weekend_start:

(df['Time'] > weekend_start)

or time less than weekend_end:

(df['Time'] < weekend_end)

That is, the period from 2022-07-08 14:30:00 to 2022-07-11 09:50:00 is excluded.
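The combined mask can be checked on synthetic data. This is only a minimal sketch: the hourly sample data and the names gap_begin, gap_end, in_period, outside_gap and alt_mask are assumptions made for illustration (gap_begin corresponds to weekend_end in the question, gap_end to weekend_start). It also shows that ‘or’ fails on Series, and that negating Series.between should give the same result.

```python
import pandas as pd

# Hypothetical data: one sample per hour covering the whole period.
df = pd.DataFrame({
    "Time": pd.date_range("2022-06-30", "2022-07-12", freq=pd.Timedelta(hours=1))
})

start_date = "2022-06-30 15:26:00"
end_date = "2022-07-11 15:30:00"
gap_begin = "2022-07-08 14:30:00"   # called weekend_end in the question
gap_end = "2022-07-11 09:50:00"     # called weekend_start in the question

# 'or' cannot combine two boolean Series; it raises ValueError.
try:
    (df["Time"] < gap_begin) or (df["Time"] > gap_end)
    or_failed = False
except ValueError:
    or_failed = True

# Element-wise version: inside the overall period AND outside the gap.
in_period = (df["Time"] > start_date) & (df["Time"] <= end_date)
outside_gap = (df["Time"] < gap_begin) | (df["Time"] > gap_end)
time_mask = in_period & outside_gap
kept = df.loc[time_mask]

# Alternative spelling: negate a closed-interval membership test.
alt_mask = in_period & ~df["Time"].between(gap_begin, gap_end)

print(kept["Time"].min(), kept["Time"].max(), len(kept))
```

Naming the two halves of the mask separately makes it easier to see which condition removes which rows.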

Now about the plot. A date-time axis is continuous, so the excluded period still takes up space even though it contains no data. In the figure produced by the code below, the left panel keeps this gap; the right panel plots against the row position and uses the ‘format_date’ function to label the ticks, which removes the gap.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
import matplotlib.ticker as ticker

df = pd.read_csv('Data.csv', sep=',', header=0)

start_date = '2022-06-30 15:26:00'
end_date = '2022-07-11 15:30:00'
weekend_end = '2022-07-08 14:30:00'
weekend_start = '2022-07-11 09:50:00'

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

time_mask = ((df['Timestamp'] > start_date) & (df['Timestamp'] <= end_date) & (
                (df['Timestamp'] > weekend_start) | (df['Timestamp'] < weekend_end)))

df1 = df[time_mask].copy()
df1 = df1.set_index('Timestamp')

fig, axes = plt.subplots(ncols=2)
ax = axes[0]
ax.plot(df1.index, df1['Data'])
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y-%m-%d %H:%M:%S'))
ax.set_title("Default")
fig.autofmt_xdate()

N = len(df1['Data'])
ind = np.arange(N)

def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df1.index[thisind].strftime('%Y-%m-%d %H:%M:%S')

ax = axes[1]
ax.plot(ind, df1['Data'])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.set_title("Without empty values")
fig.autofmt_xdate()

plt.show()

Note that the ‘Timestamp’ column is converted to an index.

df1 = df1.set_index('Timestamp')

Below is the drawing code with a simple moving average. I did not work out the EMA myself; a library such as TA-Lib can compute it.

df1['sma'] = df1['Data'].rolling(window=33).mean()

N = len(df1.index)
ind = np.arange(N)

def format_date(x, pos=None):
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return df1.index[thisind].strftime('%Y-%m-%d %H:%M:%S')

fig, ax = plt.subplots()
ax.plot(ind, df1['Data'])
ax.plot(ind, df1['sma'])
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()

plt.show()
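If TA-Lib is not available, pandas' own ewm method may be enough for an exponential moving average. The sketch below is an assumption-laden illustration, not the answerer's method: the sine-wave data stands in for df1, and span=33 with adjust=False is chosen only to mirror the window of the SMA above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: a 'Data' column on a datetime index.
idx = pd.date_range("2022-07-01", periods=200, freq=pd.Timedelta(minutes=5))
df1 = pd.DataFrame({"Data": np.sin(np.linspace(0, 12, 200))}, index=idx)

# Simple moving average, as in the answer (first 32 values are NaN).
df1["sma"] = df1["Data"].rolling(window=33).mean()

# Exponential moving average with the recursive definition:
# ema[t] = (1 - alpha) * ema[t-1] + alpha * x[t], alpha = 2 / (span + 1)
df1["ema"] = df1["Data"].ewm(span=33, adjust=False).mean()

alpha = 2 / (33 + 1)
print(df1[["Data", "sma", "ema"]].tail())
```

The resulting ‘ema’ column can be drawn with ax.plot(ind, df1['ema']) in the same way as the ‘sma’ column.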

It also seems right to me to convert the boundary strings to datetime objects, written in the same format as in the file:

start_date = pd.to_datetime('2022-06-30T15:26:00+02:00', errors='coerce')
end_date = pd.to_datetime('2022-07-11T15:30:00+02:00', errors='coerce')
weekend_end = pd.to_datetime('2022-07-08T14:30:00+02:00', errors='coerce')
weekend_start = pd.to_datetime('2022-07-11T09:50:00+02:00', errors='coerce')
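To illustrate how such offset-aware boundaries behave, here is a small sketch. The four sample rows and the names raw, ts, gap_begin, gap_end and keep are hypothetical; it assumes the file stores timestamps as ISO strings that all carry the same +02:00 offset.

```python
import pandas as pd

# Hypothetical rows in the ISO format assumed for the file.
raw = pd.Series([
    "2022-07-08T14:00:00+02:00",
    "2022-07-08T15:00:00+02:00",
    "2022-07-11T09:00:00+02:00",
    "2022-07-11T10:00:00+02:00",
])

# With a uniform offset, the parsed column is timezone-aware.
ts = pd.to_datetime(raw)

# Boundaries parsed with the same offset, so both sides are timezone-aware.
gap_begin = pd.to_datetime("2022-07-08T14:30:00+02:00")
gap_end = pd.to_datetime("2022-07-11T09:50:00+02:00")

# Keep only rows outside the excluded window.
keep = (ts < gap_begin) | (ts > gap_end)
print(keep.tolist())
```

Only the first and last rows fall outside the window, so they are the ones kept.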

Update 12/09/2022.

I made drawing without gaps more convenient. A string column is created from the index, so the x-axis is categorical and empty periods are not drawn. The principle is the same as in the previous version, but here everything is done at once, without a formatter function. MaxNLocator is also applied to limit how many tick labels are displayed.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

df = pd.read_csv('Data.csv', sep=',', header=0)

start_date = '2022-06-30 15:26:00'
end_date = '2022-07-11 15:30:00'
weekend_end = '2022-07-08 14:30:00'
weekend_start = '2022-07-11 09:50:00'

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

time_mask = ((df['Timestamp'] > start_date) & (df['Timestamp'] <= end_date) & (
                (df['Timestamp'] > weekend_start) | (df['Timestamp'] < weekend_end)))

df1 = df[time_mask].copy()
df1 = df1.set_index('Timestamp')
df1['string'] = df1.index.astype(str)

df1['sma'] = df1['Data'].rolling(window=33).mean()

fig, ax = plt.subplots()
ax.plot(df1['string'], df1['Data'])
ax.plot(df1['string'], df1['sma'])
ax.xaxis.set_major_locator(MaxNLocator(nbins=5))
fig.autofmt_xdate()

plt.show()

Answered By: inquirer