Split a list into N lists, and assign each list to a worker thread

Question:

I’m writing a script that takes N records from a table and processes them via multithreading.

Previously I simply used ORDER BY RAND() in the SQL statement inside each worker, and hoped there would be no duplicates.

This sort of works (deduping is done later); however, I would like to make the script more efficient by:

1) querying the table once, extracting N records, and assigning them to a list

2) splitting the big list into Y roughly equal sublists, which can be accomplished via:

number_of_workers = 2
first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']

def chunkify(lst, n):
    # stride slicing: element i goes to sublist i % n
    return [lst[i::n] for i in range(n)]

list1 = chunkify(first_names, number_of_workers)
print(list1)
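With two workers this prints [['Steve', 'Sara', 'Jack'], ['Jane', 'Mary']]; note that the stride slicing interleaves elements rather than cutting contiguous chunks, which is fine here since only the grouping matters.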

3) when defining the worker function, passing a different sublist to each worker. Note: the number of workers (and the number of parts I want to split the query result into) is defined at the beginning of the script.
However, as I’m fairly new to Python, I have no idea how to pass each sublist to a separate worker (or whether it is even doable?)
Any help, other suggestions, etc. would be much appreciated!

An example of my multithreading code is below. How would I assign each sublist to a separate worker here?

import threading

def worker():
    # assign sublist N to worker N -- this is the part I don't know how to do
    # each worker would then simply print its sublist
    pass

threads = []
for i in range(number_of_workers):
    print(i)
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

Thank you in advance!

Asked By: FlyingZebra1


Answers:

Two things:

First, take a look at the Queue object. With it you don’t even need to split the list apart yourself: it is designed for distributing a collection of items between multiple threads (there is also a multiprocess variant, which I’ll get to). The docs contain very good examples that fit your requirements.
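That said, the manual splitting you describe is also doable: each thread can receive its own sublist through Thread’s args parameter. A minimal sketch of that route, reusing the chunkify idea from the question (worker here is just a stand-in that prints its slice):

import threading

def worker(sublist):
    # each worker only ever sees its own slice of the records
    print(sublist)

first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
number_of_workers = 2
sublists = [first_names[i::number_of_workers] for i in range(number_of_workers)]

threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in sublists]
for t in threads:
    t.start()
for t in threads:
    t.join()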

Second, unless your workers spend their time waiting on I/O, network requests, etc., threading in Python is no quicker (and probably slower) than processing sequentially: because of the Global Interpreter Lock, only one thread executes Python bytecode at a time. If your work is CPU-bound, you’ll probably want multiprocessing instead, which spins up a whole new Python process for each worker; it offers similar tools, including queues.
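A minimal sketch of that route, assuming the per-record work is CPU-bound (process_record is a placeholder for the real processing):

from multiprocessing import Pool

def process_record(name):
    # stand-in for the real per-record work
    return name.upper()

if __name__ == '__main__':
    first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack']
    # Pool.map splits the list across worker processes for us
    with Pool(processes=2) as pool:
        results = pool.map(process_record, first_names)
    print(results)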

Answered By: SCB

As SCB mentioned, this was solved by using a Queue.
Here is a quick example that takes a list of names, feeds one name at a time to the workers (2 of them), and has each worker simply print the name it was given.

from queue import Queue  # Queue in Python 2
from threading import Thread
from time import sleep

first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack', 'tara', 'bobby']

q = Queue()  # Queue() takes an optional maxsize, not the items themselves
num_threads = 2

def do_stuff(q):
    while True:
        print(q.get())
        sleep(1)
        q.task_done()

for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.daemon = True  # let the script exit once q.join() returns
    worker.start()

for x in first_names:
    q.put(x)

q.join()  # blocks until every name has been processed
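Note that the daemon flag matters here: q.join() returns once every name has been processed, but the workers are still blocked inside q.get(), so non-daemon threads would keep the script alive forever. The sentinel approach in the next answer is the other common way to shut the workers down.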

Code adapted from here.

Answered By: FlyingZebra1

Much needed fixes to @FlyingZebra1’s answer above:

from queue import Queue
from threading import Thread
from time import sleep

first_names = ['Steve', 'Jane', 'Sara', 'Mary', 'Jack', 'tara', 'bobby']

q = Queue()  # starts empty; names are added below
num_threads = 2

def do_stuff():
    while True:
        item = q.get()
        if item is None:  # sentinel: without this check the workers never exit
            q.task_done()
            break
        print(item)
        sleep(1)
        q.task_done()

for i in range(num_threads):
    worker = Thread(target=do_stuff)
    worker.start()

for x in first_names:
    q.put(x)

for i in range(num_threads):
    q.put(None)  # one sentinel per worker

q.join()
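With one sentinel per worker, every thread first drains real names (the queue is FIFO), then consumes exactly one None and exits; q.join() returns once every put() has been matched by a task_done().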

Just a Fix.
