python multiprocessing dataframe rows
Question:
def main():
    df_master = read_bb_csv(file)
    p = Pool(2)
    if len(df_master.index) >= 1:
        for row in df_master.itertuples(index=True, name='Pandas'):
            p.map((partial(check_option, arg1=row), df_master))

def check_option(row):
    get_price(row)
I am using Pandas to read a CSV and loop through the rows to process the information. Since get_price() needs to make several HTTP calls per row, I want to use multiprocessing to process the rows in parallel (depending on the number of CPU cores) to speed up the function.
The issue I am having is that I am new to multiprocessing and don't know how to use
p.map((check_option, arg1=row), df_master) to process all rows in the dataframe.
There is no need to return a value from the function; I just need the rows handed out to the worker processes.
Thank you for your help.
Answers:
You can use the following Python 3 version, which I use everywhere and which works like a charm! There is also a Python 3 package, mpire,
which I found really useful; its usage is similar to that of the standard multiprocessing package.
from multiprocessing import Pool

import pandas as pd

def get_price(idx, row):
    # logic to fetch price
    return idx, price

def main():
    df = pd.read_csv("path to file")
    NUM_OF_WORKERS = 2
    with Pool(NUM_OF_WORKERS) as pool:
        # Dispatch one task per row, then collect the results as they finish.
        results = [pool.apply_async(get_price, [idx, row]) for idx, row in df.iterrows()]
        for result in results:
            idx, price = result.get()
            df.loc[idx, 'Price'] = price
    # do whatever you want to do with df, e.g. save it back to the same file

if __name__ == "__main__":
    # Don't forget to call main() under this guard: it is required on Windows
    # when using multiple processes/threads, and is good practice everywhere.
    # More info: https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
    main()
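Here is a minimal runnable sketch of the same apply_async pattern, with a stand-in get_price() that just doubles a column value instead of making HTTP calls (the column name "base" and the fill_prices() helper are assumptions for illustration):

```python
from multiprocessing import Pool

import pandas as pd


def get_price(idx, row):
    # Stand-in for the real HTTP lookup: derive a fake price from the row.
    return idx, row["base"] * 2


def fill_prices(df, workers=2):
    # Dispatch one task per row; each worker returns (index, price),
    # so the results can be written back to the right rows.
    with Pool(workers) as pool:
        results = [pool.apply_async(get_price, (idx, row))
                   for idx, row in df.iterrows()]
        for result in results:
            idx, price = result.get()
            df.loc[idx, "Price"] = price
    return df


if __name__ == "__main__":
    df = pd.DataFrame({"base": [10, 20, 30]})
    print(fill_prices(df))
```

Returning the index alongside the price is what makes the write-back safe: apply_async results can be collected in any order, and the index ties each price back to its row.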