Use systemd watchdog with Python multiprocessing

Question

How do I reset the systemd watchdog from Python? I'm implementing a watchdog for a multi-process picture detection application with many dependencies. Previously, the service started a shell script; now it starts the Python file directly. However, the watchdog implementation does not work correctly. Is there a better alternative? The goal is to restart the "Picturedetection Main application" service if the program gets stuck in a loop for 30 seconds or more.

Here is the service unit file from the systemd folder:

[Unit]
Description=Picturedetection Main application
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=user
WorkingDirectory=/home/user/detection/
ExecStart=/usr/bin/python3 /home/user/detection/picturedetection.py
Environment=TF_CUDNN_USE_AUTOTUNE=0
WatchdogSec=30
Restart=always

[Install]
WantedBy=multi-user.target
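
With WatchdogSec=30 set, systemd passes the timeout to the service in the WATCHDOG_USEC environment variable (in microseconds), and the systemd documentation recommends pinging at roughly half that interval. A minimal sketch, assuming the Python process is the one started directly by ExecStart=; watchdog_ping_interval is a hypothetical helper name:

```python
import os
from typing import Mapping, Optional


def watchdog_ping_interval(environ: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Return a ping interval in seconds, or None if the watchdog is disabled.

    systemd exports WATCHDOG_USEC (and usually WATCHDOG_PID) to the service
    when WatchdogSec= is set. Pinging at half the timeout leaves headroom.
    """
    env = os.environ if environ is None else environ
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None  # not running under a systemd watchdog
    pid = env.get("WATCHDOG_PID")
    # If WATCHDOG_PID is set, the watchdog applies only to that process.
    if pid and int(pid) != os.getpid():
        return None
    return int(usec) / 1_000_000 / 2
```

For WatchdogSec=30 this yields a 15-second interval; outside systemd it returns None, so the caller can skip pinging.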

Here is the Python main script I currently use:

import sys
import syslog
from multiprocessing import Queue
from DetectionDefines import Detection_Version as OV
import time

print("OPTICONTROL START")
syslog.syslog(syslog.LOG_NOTICE, "PICTUREDETECTION START --- Version " + OV.major + "." + OV.minor)

from config.Config import Config as conf
from prediction.ImageFeed import ImageFeed

from prediction.ResultHandler import ResultHandler
from dataflow.CommServer import CommServer
from dataflow.FTLComm import FTLComm
from dataflow.MiniHTTPServer import MiniHTTPServer
from dataflow.GraphDownloader import GraphDownloader
from tools.Logger import Logger
from dataflow.FTPHandler import FTPHandler
from tools.FileJanitor import FileJanitor
from prediction.PredictionPipeline import PredictionPipeline

# Watchdog test
import os
import systemd.daemon  # "import systemd" alone does not load the daemon submodule

# Communication
CommServer().start()
FTLComm()

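# Experimental, not working right now. Probably delete.
# Note: /dev/watchdog is the kernel (hardware) watchdog device, usually root-only;
# it is not the per-service systemd watchdog configured by WatchdogSec=.
test = Logger("<WATCHDOGWATCHDOG> ")
def WatchdogReset():
    test.notice("WATCHDOG has been reset")
    with open("/dev/watchdog", "w") as f:
        f.write("1")
# End of experimental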

# Other subprocesses
MiniHTTPServer().start()
FileJanitor().start()
FTPHandler().start()
GraphDownloader().start()


# Detection subprocesses
img_queue = Queue(maxsize = 1)
rst_queue = Queue(maxsize = conf.result_buffer)
ImageFeed(img_queue).start()
ResultHandler(rst_queue).start()

while True:
    # CUDA / TensorFlow need to be in the main process
    PredictionPipeline(img_queue, rst_queue).predict()
    systemd.daemon.notify("WATCHDOG=1")
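
Note that in this loop systemd.daemon.notify("WATCHDOG=1") only runs after predict() returns. If predict() itself loops forever, as a prediction pipeline typically would, the ping is never sent and the service is killed and restarted each time the 30-second timeout expires, which would match the symptom described below. Also, because WatchdogSec= implies NotifyAccess=main unless configured otherwise, the ping has to come from the main process started by ExecStart=, not from a child. For reference, a notification is just a datagram sent to the Unix socket named in $NOTIFY_SOCKET. The sketch below shows this with the standard library only, assuming Linux; sd_notify here is a hypothetical helper, not the python-systemd or sdnotify API:

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" or "WATCHDOG=1" to systemd.

    Returns False when NOTIFY_SOCKET is unset (e.g. not started by systemd).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace socket: the leading "@" stands for a NUL byte.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

Outside systemd the function simply returns False, so it is safe to call during local testing.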

Additionally, I want to ensure that the program restarts if it gets stuck in an infinite loop. However, this is a multi-process program. Will the service still restart cleanly while the other processes are running?

I tried to reset the watchdog with the method above, but it seems to have no effect: the script restarts every 30 seconds. I suspected an error in my implementation, but an "os" query didn't resolve the issue.
I also tried a custom "FileWatchdog" that sends error messages and restarts the service by executing a shell script. However, this requires superuser rights, and I don't want to distribute software with a hardcoded password. I also think this solution would be hard to maintain in the long run.
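
On the multi-process question: with the default KillMode=control-group, a watchdog-triggered restart should stop every process in the service's control group, multiprocessing children included, before starting the service again. To catch a loop that is stuck rather than crashed, one option is to ping only while the main loop keeps reporting progress. A minimal sketch, assuming you can call a hypothetical beat() from inside the prediction loop (for example once per processed image in PredictionPipeline.predict()); ProgressWatchdog, max_stall and interval are illustrative names, and notify can be any callable taking the state string, such as systemd.daemon.notify:

```python
import threading
import time
from typing import Callable, Optional


class ProgressWatchdog:
    """Ping the systemd watchdog only while the main loop keeps making progress.

    The worker code calls beat() once per iteration. A background thread sends
    WATCHDOG=1 only if the last beat is at most max_stall seconds old, so a loop
    that hangs stops the pings and systemd restarts the unit.
    """

    def __init__(self, notify: Callable[[str], object], max_stall: float,
                 interval: float, clock: Callable[[], float] = time.monotonic):
        self._notify = notify
        self._max_stall = max_stall
        self._interval = interval
        self._clock = clock
        self._last_beat = clock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def beat(self) -> None:
        """Record that the main loop has just made progress."""
        self._last_beat = self._clock()

    def check_once(self) -> bool:
        """Ping if progress is recent enough; return whether a ping was sent."""
        if self._clock() - self._last_beat <= self._max_stall:
            self._notify("WATCHDOG=1")
            return True
        return False

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.check_once()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

With WatchdogSec=30, values such as max_stall=20 and interval=5 keep pings flowing in normal operation; once beat() stops being called, systemd restarts the unit at most about max_stall + WatchdogSec seconds after the last beat. Pinging from a thread without such a progress check would defeat the watchdog, because the thread keeps running while the main loop hangs.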

Asked By: ArtHax

Answer 1

I found a solution.

Instead of the systemd module, I used the sdnotify library, which you can install via pip (pip install sdnotify). Then I checked whether the current processes are still alive.

Like this:

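import time
import sdnotify
from tools.Logger import Logger
from tools import Watchdog

# ImageFeed, ResultHandler, img_queue and rst_queue are set up as in the question
test = Logger("<WATCHDOGWATCHDOG> ")
n = sdnotify.SystemdNotifier()
n.notify("READY=1")

imdfg = ImageFeed(img_queue)
rslt = ResultHandler(rst_queue)
imdfg.start()
rslt.start()

while True:
    # Ping the systemd watchdog only while both worker processes are alive
    if Watchdog.check(imdfg) and Watchdog.check(rslt):
        n.notify("WATCHDOG=1")
        test.notice("OPTICONTROL_WATCHDOG Reset")
    time.sleep(2)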

# Watchdog module (tools/Watchdog.py)
def check(prc):
    # prc is a multiprocessing.Process; is_alive() is True while the child runs
    return prc.is_alive()
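
The check above can be generalized to any number of children. A minimal sketch, assuming the workers are multiprocessing.Process objects (which provide is_alive()); supervise_step and supervise are hypothetical names, and notify can be n.notify from the sdnotify snippet:

```python
import time
from typing import Callable, Iterable, Protocol


class Supervised(Protocol):
    """Anything with is_alive(), e.g. multiprocessing.Process."""

    def is_alive(self) -> bool: ...


def supervise_step(procs: Iterable[Supervised], notify: Callable[[str], object]) -> bool:
    """Ping the watchdog once if every child is alive; return whether it did."""
    if all(p.is_alive() for p in procs):
        notify("WATCHDOG=1")
        return True
    return False


def supervise(procs: Iterable[Supervised], notify: Callable[[str], object],
              interval: float = 2.0) -> None:
    """Keep pinging while all children live; return as soon as one has died."""
    procs = list(procs)
    while supervise_step(procs, notify):
        time.sleep(interval)
```

For example, supervise([imdfg, rslt], n.notify) followed by sys.exit(1): once a child dies, the pings stop and the process exits, and Restart=always lets systemd restart the whole unit without root privileges or a shell script. Keep in mind that is_alive() only detects children that have exited; a child stuck in a loop is still alive and would need a progress check of its own.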

Answered By: ArtHax
