Python multithreading/multiprocessing for a loop with 3+ arguments

Question:

Hello, I have a CSV with about 2.5k lines of Outlook email addresses and passwords.

The CSV looks like

header:

username, password

content:

[email protected],123password1

[email protected],123password2

[email protected],123password3

[email protected],123password4

[email protected],123password5

The code lets me go into the accounts and delete every mail from them, but it is taking too long for all 2.5k accounts to pass through the script, so I wanted to make it faster with multithreading.

This is my code:

from csv import DictReader
import imap_tools
from datetime import datetime


def IMAPDumper(accountList, IMAP_SERVER, search_criteria, row):
    accountcounter = 0
    with open(accountList, 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)

        for row in csv_dict_reader:
            # TIMESTAMP FOR FURTHER DEBUGGING TO CHECK IF THE SCRIPT IS STOPPING AT A POINT
            TIMESTAMP = datetime.now().strftime("[%H:%M:%S]")
            # adds a counter for the amount of accounts processed by the script
            accountcounter = accountcounter + 1
            print("_____________________________________________")
            print(TIMESTAMP, "Account", accountcounter)
            print("_____________________________________________")
            # resetting emailcounter each time
            emailcounter = 0
Asked By: user16200703

Answers:

This is not necessarily the best way to do it, but it is the quickest to write. I don't know if you are familiar with Python generators, but we will have to use one. The generator will act as a work dispatcher, handing out one CSV row at a time.

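from csv import DictReader

def generator():
    with open("t.csv", 'r', newline='') as read_obj:
        csv_dict_reader = DictReader(read_obj)

        # Iterate the DictReader (not the raw file) so each row is a dict
        for row in csv_dict_reader:
            yield row

gen = generator()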

Next, you will have your main function where you do your IMAP stuff

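import threading

# Generators are not thread-safe, so guard next() with a lock
gen_lock = threading.Lock()

def main():
    while True:
        # The try lets the thread exit cleanly once the whole file has been processed
        try:
            # Returns the next row of the csv
            with gen_lock:
                working_set = next(gen)
        except StopIteration:
            break

        # do_some_stuff
        # -
        # do_other_stuff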

Then you just have to split the work across multiple threads!

import threading

# You can change the number of threads
number_of_threads = 5

thread_list = []

# Creates the thread objects
for _ in range(number_of_threads):
    thread_list.append(threading.Thread(target=main))

# Starts all thread objects
for thread in thread_list:
    thread.start()

# Waits for every thread to finish
for thread in thread_list:
    thread.join()
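
One caveat with this approach: a Python generator is not thread-safe, and calling next() on it from two threads at the same time can raise "ValueError: generator already executing". A minimal, self-contained sketch that guards the generator with a lock is shown here. The in-memory CSV text, the example.com addresses and the process_row helper are hypothetical stand-ins for the real file and the IMAP work.

```python
import io
import threading
from csv import DictReader

# Hypothetical in-memory stand-in for the accounts CSV
CSV_TEXT = "username,password\n" + "".join(
    f"user{i}@example.com,pw{i}\n" for i in range(20)
)

def generator(text):
    """Yield one dict per CSV row."""
    yield from DictReader(io.StringIO(text))

gen = generator(CSV_TEXT)
gen_lock = threading.Lock()        # generators are not thread-safe
processed = []
processed_lock = threading.Lock()  # protects the shared result list

def process_row(row):
    """Hypothetical placeholder for the per-account IMAP work."""
    with processed_lock:
        processed.append(row["username"])

def main():
    while True:
        with gen_lock:
            row = next(gen, None)  # None signals that the rows are exhausted
        if row is None:
            break
        process_row(row)

def run(number_of_threads=5):
    threads = [threading.Thread(target=main) for _ in range(number_of_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()              # wait until every row has been handled
    return processed

if __name__ == "__main__":
    print(len(run()), "accounts processed")
```

Only the call to next() is held under the lock, so the slow part (the IMAP session) still runs in parallel across the threads.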

I hope this helped you!

Answered By: Fredericka

This is a job that is best accomplished using a thread pool whose optimum size will need to be experimented with. I have set the size below to 100, which may be overly ambitious (or not). You can try decreasing or increasing NUM_THREADS to see what effect it has.
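
To experiment with the pool size, something like the following sketch could be used. It times a simulated workload for a few values of the thread count. The fake_account_job function and its 10 ms sleep are assumptions standing in for a real IMAP session, so the absolute numbers mean nothing; only the comparison between sizes is of interest.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_account_job(_):
    """Hypothetical stand-in for one IMAP session; sleeps to mimic network wait."""
    time.sleep(0.01)
    return 1

def time_pool(num_threads, num_jobs=40):
    """Return (elapsed seconds, jobs completed) for one pool size."""
    start = time.perf_counter()
    with ThreadPool(num_threads) as pool:
        done = sum(pool.imap_unordered(fake_account_job, range(num_jobs)))
    return time.perf_counter() - start, done

if __name__ == "__main__":
    for size in (1, 5, 20):
        elapsed, done = time_pool(size)
        print(f"{size:>3} threads: {done} jobs in {elapsed:.2f}s")
```

With the real workload, the mail server may throttle or reject too many simultaneous connections, so the fastest setting in practice may be well below what a simulation suggests.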

The important thing is to modify function IMAPDumper so that it is passed a single row from the CSV file to process and therefore does not need to open and read the file itself.
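
As a rough illustration of that single-row signature, here is a sketch with a stub in place of the real function. The stub body, the sample data and the example.com address are hypothetical. It also assumes the header really contains a blank after the comma, as shown in the question; in that case passing skipinitialspace=True to DictReader keeps the second key as "password" instead of " password".

```python
import io
from csv import DictReader
from functools import partial

def IMAPDumper(IMAP_SERVER, search_criteria, row):
    """Hypothetical stub: the real function would log in and delete mail."""
    return (IMAP_SERVER, search_criteria, row["username"], row["password"])

# Hypothetical sample data using the header layout shown in the question
sample = io.StringIO("username, password\nuser1@example.com,pw1\n")

# skipinitialspace=True drops the blank after the comma, so the second
# key is "password" rather than " password"
rows = list(DictReader(sample, skipinitialspace=True))

# Bind the arguments shared by every account; only `row` is left open
worker = partial(IMAPDumper, "outlook.office365.com", "ALL")
result = worker(rows[0])
print(result)
```

Because partial fixes the first two positional arguments, the resulting worker takes exactly one argument, which is what the pool methods expect.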

There are various methods you can use with class ThreadPool in module multiprocessing.pool (this class is not well documented; it is the multithreading analog of the multiprocessing class Pool in the same module and has exactly the same interface). The advantages of imap_unordered are that (1) the passed iterable argument can be a generator that will not be converted to a list, which saves memory and time if that list would be very large, and (2) the results (the return values from the worker function, IMAPDumper in this case) are delivered in arbitrary order, so it might run slightly faster than imap or map. Since your worker function does not explicitly return a value (it defaults to None), the ordering does not matter here.
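
The difference in result ordering can be seen with a small sketch. The slow_square worker and its sleep times are hypothetical and chosen so that later inputs finish first.

```python
import time
from multiprocessing.pool import ThreadPool

def slow_square(n):
    """Hypothetical worker: larger inputs finish sooner."""
    time.sleep((5 - n) * 0.02)
    return n * n

def numbers():
    """Generator input; it is not turned into a list up front."""
    yield from range(5)

with ThreadPool(5) as pool:
    ordered = list(pool.imap(slow_square, numbers()))
    unordered = list(pool.imap_unordered(slow_square, numbers()))

print(ordered)            # results follow the input order
print(sorted(unordered))  # same values; arrival order may differ
```

imap has to hold back a finished result until all earlier ones are ready, while imap_unordered hands each result over as soon as it completes.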

from csv import DictReader
import imap_tools
from datetime import datetime
from multiprocessing.pool import ThreadPool
from functools import partial
import threading

accountcounter = 0
counter_lock = threading.Lock()  # protects accountcounter across threads

def IMAPDumper(IMAP_SERVER, search_criteria, row):
    """ process a single row """
    global accountcounter
    # TIMESTAMP FOR FURTHER DEBUGGING TO CHECK IF THE SCRIPT IS STOPPING AT A POINT
    TIMESTAMP = datetime.now().strftime("[%H:%M:%S]")
    # adds a counter for the amount of accounts processed by the script
    with counter_lock:
        accountcounter = accountcounter + 1
        account_number = accountcounter
    print("_____________________________________________")
    print(TIMESTAMP, "Account", account_number)
    ... # etc

def generate_rows():
    """ generator function to yield rows """
    with open('outlookAccounts.csv', newline='') as f:
        dict_reader = DictReader(f)
        for row in dict_reader:
            yield row

NUM_THREADS = 100
worker = partial(IMAPDumper, "outlook.office365.com", "ALL")
# the with block shuts the pool down once all results have been consumed
with ThreadPool(NUM_THREADS) as pool:
    for return_value in pool.imap_unordered(worker, generate_rows()):
        # must iterate the iterator returned by imap_unordered to ensure all tasks are run and completed
        pass # return values are None
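
One thing to be aware of: if the worker raises an exception, it surfaces in the loop that consumes the results and stops that loop at that point. With 2.5k accounts, a single bad login should probably not abort the run. A possible sketch is to catch errors inside the worker and return them instead. The safe_worker function, the sample rows and the "empty password" failure are all hypothetical.

```python
from multiprocessing.pool import ThreadPool

def safe_worker(row):
    """Hypothetical wrapper: never raises, returns (username, error or None)."""
    try:
        if not row["password"]:  # stand-in for a failed login
            raise ValueError("empty password")
        return row["username"], None
    except Exception as exc:
        return row["username"], str(exc)

rows = [
    {"username": "user1@example.com", "password": "pw1"},
    {"username": "user2@example.com", "password": ""},
]

with ThreadPool(2) as pool:
    results = dict(pool.imap_unordered(safe_worker, rows))

# Accounts that failed, mapped to the error message
failed = {user: err for user, err in results.items() if err}
print(failed)
```

The failed accounts can then be written to a file and retried later without rerunning the whole list.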
Answered By: Booboo