Multiprocessing: More processes than cpu_count

Question:

Note: I “forayed” into the land of multiprocessing 2 days ago. So my understanding is very basic.

I am writing an application for uploads to Amazon S3 buckets. If the file is large (100 MB), I've implemented parallel uploads using Pool from the multiprocessing module. I am using a machine with a Core i7, and I had a cpu_count of 8. I was under the impression that if I do pool = Pool(processes=6), I use 6 cores, the file begins to upload in parts, and the uploads for the first 6 parts begin simultaneously. To see what happens when the number of processes is greater than the cpu_count, I entered 20 (implying that I want to use 20 cores). To my surprise, instead of getting a block of errors, the program began to upload 20 parts simultaneously (I used a smaller chunk size to make sure there are plenty of parts).
I don't understand this behavior. I have only 8 cores, so how can the program accept an input of 20? When I say processes=6, does it actually use 6 threads? That would be the only explanation for 20 being a valid input, as there can be thousands of threads. Can someone please explain this to me?

Edit:

I ‘borrowed’ the code from here. I have changed it only slightly: I ask the user how many processes to use instead of setting parallel_processes to 4.

Asked By: letsc

||

Answers:

The number of processes running concurrently on your computer is not limited by the number of cores. In fact, you probably have hundreds of programs running on your computer right now, each with its own process. To make this work, the OS assigns one of your 8 processors to each process or thread only temporarily: at some point the process may be paused and another one will take its place. See "What is the difference between concurrent programming and parallel programming?" if you want to find out more.
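To see this for yourself, here is a minimal sketch (not the asker's upload code) that starts a pool with more workers than the machine has cores and counts the live worker processes. The names report_pid and run_with_pool, and the optional ctx parameter for choosing a start method, are made up for this illustration.

```python
import multiprocessing as mp
import os
import time


def report_pid(_task):
    """Pretend to work briefly, then report which process handled the task."""
    time.sleep(0.05)
    return os.getpid()


def run_with_pool(num_processes, num_tasks, ctx=None):
    """Start a pool of num_processes workers and run num_tasks small tasks.

    Returns the number of live child processes seen while the pool is open
    and the set of worker PIDs that actually handled a task.
    """
    ctx = ctx or mp.get_context()  # ctx is a hypothetical hook for this sketch
    with ctx.Pool(processes=num_processes) as pool:
        live_workers = len(mp.active_children())
        pids = pool.map(report_pid, range(num_tasks), chunksize=1)
    return live_workers, set(pids)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    requested = cores + 12  # deliberately more workers than cores
    live, pids = run_with_pool(requested, requested * 2)
    print(f"cores reported by the OS: {cores}")
    print(f"worker processes requested: {requested}, live: {live}")
    print(f"distinct workers that handled a task: {len(pids)}")
```

On a machine like the asker's, the pool should come up with all the requested workers and no error, because Pool(processes=N) only says how many worker processes to create. Which core each one runs on, and when, is left to the OS scheduler.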

Edit: Assigning more processes in your uploading example may or may not make sense. Reading from disk and sending over the network are normally blocking operations in Python. A process that is waiting for its chunk of data to be read or sent can be halted so that another process may start its I/O. On the other hand, with too many processes either file I/O or network I/O will become a bottleneck, and your program will slow down because of the additional overhead needed for process switching.
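A rough way to see the I/O effect is to replace the upload with a sleep, which blocks without using the CPU in much the same way as waiting on a disk or socket. This is only a sketch under that assumption: fake_upload, timed_run and the ctx parameter are hypothetical names, and a real upload would eventually hit the bandwidth and switching limits mentioned above, which a sleep does not model.

```python
import multiprocessing as mp
import time


def fake_upload(delay):
    """Stand-in for a blocking upload: sleeping uses no CPU while it waits."""
    time.sleep(delay)
    return delay


def timed_run(num_processes, num_parts, delay, ctx=None):
    """Return the seconds taken to 'upload' num_parts parts with a pool."""
    ctx = ctx or mp.get_context()  # ctx is a hypothetical hook for this sketch
    with ctx.Pool(processes=num_processes) as pool:
        start = time.perf_counter()
        pool.map(fake_upload, [delay] * num_parts, chunksize=1)
        return time.perf_counter() - start


if __name__ == "__main__":
    # 16 parts of 0.1 s each: more workers should shorten the wall time,
    # even when the worker count exceeds the number of cores.
    for workers in (1, 4, 16):
        elapsed = timed_run(workers, 16, 0.1)
        print(f"{workers:2d} workers: {elapsed:.2f} s")
```

With real uploads it is worth timing a few pool sizes in the same way and keeping the smallest one that stops giving a speedup.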

Answered By: Pyetras