Can't pause Python process using the debugger

Question:

I have a Python script which starts multiple sub processes using these lines:

for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Sometimes, for some reason, the thread halts and the script never finishes.
I’m trying to use the VS Code debugger to find the problem and check where in the thread itself it is stuck, but I’m having issues pausing these sub processes when I click pause in the debugger window.

It will pause the main thread and some other threads that are running properly but it won’t pause the stuck sub process.
Even when I try to pause the threads manually one by one using the Call Stack window, I can still pause only the working threads and not the stuck one.

Please help me figure this out. It’s especially hard to debug because whatever makes the process stuck doesn’t happen every time.

Asked By: user2396640


Answers:

First, those are subprocesses, not threads. It’s important to understand
the difference, although it doesn’t answer your question.

Second, a pause (manual break) in the Python debugger will break in Python code.
It won’t break in the machine code below that executes the Python, or in the machine
code below that performs the OS services the Python code is asking for.

If you execute a pause, the pause will occur in the Python code above
the machine code when (and if) the machine code returns to the Python interpreter loop.

Given a complete example:

import multiprocessing
import time

elements = ["one", "two", "three"]

def sub_process(gs, elm):
    gs.acquire()
    print("sleep", elm)
    time.sleep(60)
    print("awake", elm)
    gs.release()

def test():
    gs = multiprocessing.Semaphore()  # default count of 1: only one holder at a time

    subprocs = []

    for elm in elements:
        p = multiprocessing.Process(target=sub_process, args=[gs, elm])
        subprocs.append(p)
        p.start()

    for p in subprocs:
        p.join()

if __name__ == '__main__':
    test()

The first subprocess will grab the semaphore and sleep for a minute,
and the second and third subprocesses will wait inside gs.acquire() until they
can move forward. A pause will not break into the debugger until the
subprocess returns from the acquire, because acquire is below the Python code.

It sounds like you have an idea where the process is getting stuck,
but you don’t know why. You need to determine what questions
you are trying to answer. For example:

(Assuming) one of the processes is stuck in acquire. That means one of the other
processes didn’t release the semaphore. What code in which process is
acquiring a semaphore and not releasing it?
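
One low-tech way to get at that question, sketched here against the example above (the with block and the log lines are additions, not part of the original code): have each subprocess report when it acquires and releases the semaphore, so in a hung run the last "acquired" line with no matching "releasing" line points at the holder.

import multiprocessing
import os
import time

def sub_process(gs, elm):
    # Drop-in replacement for sub_process in the example above.
    print(f"[pid {os.getpid()}] {elm}: waiting for semaphore", flush=True)
    with gs:  # the context manager guarantees the release, even if the body raises
        print(f"[pid {os.getpid()}] {elm}: acquired", flush=True)
        time.sleep(60)
        print(f"[pid {os.getpid()}] {elm}: releasing", flush=True)

This won’t tell you why the holder is stuck, but it narrows the search to one process and one stretch of code.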

Looking at the semaphore object itself might tell you which subprocess is holding it,
but this is a tangent: can you use the debugger to inspect the semaphore
and determine who is holding it? For example, using a machine-level debugger on Windows,
if these were threads and a critical section, it would be possible to look at the critical section
and see which thread is still holding it. I don’t know whether the same could be
done with processes and semaphores on your chosen platform.
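
A plain multiprocessing.Semaphore doesn’t record who holds it, but you can approximate that yourself. A minimal sketch, assuming you’re free to wrap the semaphore (the TrackedSemaphore name and its owner_pid field are made up for illustration, not a standard API): keep a shared value next to the semaphore that stores the pid of the current holder, so while the program is hung you can read it from the parent, or from a debugger attached to the parent.

import multiprocessing
import os

class TrackedSemaphore:
    """Semaphore plus a shared 'owner pid' that can be inspected while hung."""
    def __init__(self):
        self.sem = multiprocessing.Semaphore()
        self.owner_pid = multiprocessing.Value("i", 0)  # 0 means "not held"

    def acquire(self):
        self.sem.acquire()
        self.owner_pid.value = os.getpid()

    def release(self):
        self.owner_pid.value = 0
        self.sem.release()

While the script is stuck, gs.owner_pid.value in the parent (printed periodically, or read in the debugger’s watch window) names the subprocess that acquired the semaphore and never released it.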

Which debuggers you have access to depends on the platform you’re running on.

In summary:

  • You can’t break the Python debugger in machine code
  • You can run the Python interpreter in a machine code debugger, but this
    won’t show you the Python code at all, which makes life interesting.
    This can be helpful if you have an idea what you’re looking for –
    for example, you might be able to tell that you’re stuck waiting for a semaphore.
  • Running a machine code debugger becomes more difficult when you’re running
    sub-processes, because you need to know which sub-process you’re interested
    in, and attach to that one. This becomes simpler if you’re using a single
    process and multiple threads instead, since there’s only one process to deal with.

"You can’t get there from here, you have to go someplace else first."

You’ll need to take a closer look at your code and figure out how
to answer the questions you need to answer using other means.
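
One such "other means", sketched here under the assumption of a Unix-like platform (faulthandler.register is not available on Windows): have every subprocess install a signal handler that dumps its current Python traceback, so when a run hangs you can send the stuck process a signal from a shell and see which Python line it is blocked on, without attaching any debugger.

import faulthandler
import os
import signal
import time

def sub_process(gs, elm):
    # On SIGUSR1, dump this process's Python traceback to stderr.
    faulthandler.register(signal.SIGUSR1, all_threads=True)
    print(elm, "pid", os.getpid(), flush=True)
    gs.acquire()
    time.sleep(60)
    gs.release()

While the program is hung, running kill -USR1 <pid> against the pid that the stuck subprocess printed makes that process write out the stack it is blocked in (for example, a frame sitting at gs.acquire()).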

Answered By: dirck

It’s just an idea, but why not set a timeout on your subprocesses and terminate them?

TIMEOUT = 60

threads = []
for elm in elements:
    t = multiprocessing.Process(target=sub_process, args=[elm])
    t.daemon = True          # daemonic children are killed if the parent exits
    threads.append(t)
    t.start()

for t in threads:
    t.join(TIMEOUT)          # wait at most TIMEOUT seconds for this process
    if t.is_alive():         # still running after the timeout: assume it's stuck
        t.terminate()
        t.join()
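
One caveat: terminate() kills the child without any cleanup, and the multiprocessing documentation warns that terminating a process while it holds a lock or semaphore can leave other processes waiting on it deadlocked. So this is a mitigation for the hang rather than a diagnosis of it.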
Answered By: realSamy