Behavior of Python's time.sleep(0) under Linux – Does it cause a context switch?

Question:

This pattern comes up a lot but I can’t find a straight answer.

A non-critical, unfriendly program might do:

while True:
    pass  # do some work

On other technologies and platforms, if you want to let this program run hot (use as many CPU cycles as possible) but be polite, so that other programs that are running hot can effectively slow it down, you’d frequently write:

import time

while True:
    # do some work
    time.sleep(0)

I’ve read conflicting information about whether the latter approach does what I’d hope in Python running on a Linux box. Does it cause a context switch, resulting in the behavior I described above?

EDIT: For what it’s worth, we tried a little experiment on Apple OS X (we didn’t have a Linux box handy). This box has 4 cores plus hyperthreading, so we spun up 8 programs, each running just:

i = 0
while True:
    i += 1

As expected, the Activity Monitor showed each of the 8 processes consuming over 95% CPU (apparently with 4 cores and hyperthreading you get 800% total). We then spun up a ninth such program, and all 9 ran at around 85%. Next we killed the ninth one and spun up a program with:

import time

i = 0
while True:
    i += 1
    time.sleep(0)

I was hoping that this process would use close to 0% and the other 8 would run at 95%. Instead, all nine ran at around 85%. So on Apple OS X, sleep(0) appears to have no effect.

Asked By: Matthew Lund


Answers:

I’d never thought about this, so I wrote this script:

import time

while True:
    print "loop"
    time.sleep(0.5)

Just as a test, running this with strace -o isacontextswitch.strace -s512 python test.py gives you this output for the loop:

write(1, "loop\n", 5)                   = 5
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
write(1, "loop\n", 5)                   = 5
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
write(1, "loop\n", 5)                   = 5
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
write(1, "loop\n", 5)                   = 5
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
write(1, "loop\n", 5)

select() is a system call, so yes, you are entering the kernel in order to perform this. Technically, entering kernel space does not by itself require a context switch to another process, but when other processes are runnable, a blocking call like this tells the scheduler that they can run until your timeout expires. Interestingly, the delay is implemented with select() rather than sleep(). The first argument (0) is the number of file descriptors to check and all three sets are NULL, so nothing is actually being watched, stdin included; select() is simply acting as a sub-second timer. A signal such as the one from ctrl+c still interrupts the call, so Python can react without having to wait for the timeout, which I think is quite neat.

I should note that the same applies to time.sleep(0), except that the timeout passed in is {0, 0}. Also, spinning like this is not really ideal for anything but very short delays; multiprocessing and threading provide event objects that you can wait on instead.
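To illustrate the event-object alternative mentioned above, here is a minimal sketch using threading.Event. The helper names (wait_for_signal, run_demo) and the Timer standing in for a producer are made up for the example; the point is only that Event.wait() blocks in the kernel instead of spinning.

```python
import threading


def wait_for_signal(event, timeout=None):
    """Block until `event` is set; return True if set, False on timeout."""
    # Event.wait() puts the thread to sleep rather than polling in a loop,
    # so the waiting thread uses essentially no CPU.
    return event.wait(timeout)


def run_demo():
    """Hypothetical producer/consumer: a timer sets the event after 0.1 s."""
    ready = threading.Event()
    timer = threading.Timer(0.1, ready.set)  # stands in for real work finishing
    timer.start()
    signalled = wait_for_signal(ready, timeout=2.0)
    timer.join()
    return signalled
```

Calling run_demo() should come back as soon as the timer fires, without the caller burning cycles in the meantime.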

Edit: So I had a look to see exactly what Linux does. The implementation in do_select (fs/select.c) makes this check:

if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
    wait = NULL;
    timed_out = 1;
}

if (end_time && !timed_out)
    slack = select_estimate_accuracy(end_time);

In other words, if an end time is provided and both of its fields are zero (!0 is 1, which evaluates to true in C), then wait is set to NULL and the select is considered timed out. However, that doesn’t mean the function returns to you immediately; it still loops over all the file descriptors you passed and calls cond_resched, thereby potentially allowing another process to run. What happens next is entirely up to the scheduler: if your process has been hogging CPU time compared to other processes, chances are a context switch will take place. If not, the task you are in (the kernel do_select function) might just carry on until it completes.

I would re-iterate, however, that the best way to be nicer to other processes generally involves using other mechanisms than a spin lock.
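If you want to see what the scheduler actually decides on your own box rather than take my word for it, something like the sketch below should help. It reads the voluntary and involuntary context-switch counters that getrusage reports for the process before and after a loop. The function names and the iteration count are mine, chosen just for illustration, and the numbers you get will depend on the kernel, the Python version, and how loaded the machine is.

```python
import resource
import time


def context_switches_during(func, iterations=10000):
    """Return (voluntary, involuntary) context switches charged to this
    process while calling func() `iterations` times."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    for _ in range(iterations):
        func()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def yield_with_sleep():
    # The call under discussion: a zero-length sleep.
    time.sleep(0)


def do_nothing():
    # Baseline: same loop overhead, no system call.
    pass
```

Comparing context_switches_during(yield_with_sleep) against context_switches_during(do_nothing), once on an idle machine and once with other CPU-bound processes running, gives a rough picture of whether the zero-length sleep is really giving up the CPU.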

Answered By: user257111

I think you already have the answer from @Ninefingers, but in this answer I will try to dive into the Python source code.

First, the Python time module is implemented in C; to see the time.sleep implementation, take a look at Modules/timemodule.c. As you can see (without getting into all the platform-specific details), this function delegates the call to the floatsleep function.

Now, floatsleep is designed to work on different platforms, with the behavior kept as similar as possible across them. Since we are interested only in Unix-like platforms, let’s check just that part:

...
Py_BEGIN_ALLOW_THREADS
sleep((int)secs);
Py_END_ALLOW_THREADS

As you can see, floatsleep calls the C sleep function, and from the sleep man page:

The sleep() function shall cause the calling thread to be suspended
from execution until either the number of realtime seconds specified
by the argument seconds has elapsed or …

But wait a minute, didn’t we forget about the GIL?

Well, this is where the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros come into play (check Include/ceval.h if you are interested in the definitions of these two macros). With these two macros, the C code above translates to:

Save the thread state in a local variable.
Release the global interpreter lock.
... Do some blocking I/O operation ... (call sleep in our case)
Reacquire the global interpreter lock.
Restore the thread state from the local variable.

More information about these two macros can be found in the C API documentation.
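If you would rather see the effect of that GIL release from Python itself, here is a small sketch. It is not taken from the CPython source; the function name and the 0.2 second default are just my choices. A worker thread counts in a loop while the calling thread sits in time.sleep. If sleep held on to the GIL, the worker could barely advance during that time.

```python
import threading
import time


def count_while_main_sleeps(sleep_seconds=0.2):
    """Return how many iterations a worker thread completes while the
    calling thread is inside time.sleep()."""
    counter = [0]             # a list, so the nested function can mutate it
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            counter[0] += 1

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(sleep_seconds)  # the GIL is released for the duration of this call
    stop.set()
    thread.join()
    return counter[0]
```

The exact count means nothing on its own; what matters is that it grows roughly in proportion to sleep_seconds, which suggests the worker was running the whole time the caller was asleep.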

Hope this was helpful.

Answered By: mouad

You are basically attempting to usurp the job of the OS CPU scheduler. It would likely be much better to simply call os.nice(100) to inform the scheduler that you’re very low priority so it can do its job properly.
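A minimal sketch of what that looks like in practice. The lower_priority wrapper and its default of 19 are my own choices for the example. os.nice adds the increment to the current niceness and returns the new value; as far as I know Linux caps niceness at 19, so a larger increment such as 100 simply ends up at the cap.

```python
import os


def lower_priority(increment=19):
    """Raise this process's niceness by `increment` and return the new value."""
    # A higher niceness means a lower scheduling priority. An unprivileged
    # process may raise its niceness but cannot lower it again afterwards.
    return os.nice(increment)
```

You would call lower_priority() once at startup and then run the hot loop as usual, leaving it to the scheduler to favor other busy processes.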

Answered By: Omnifarious