Multiprocessing handling of files in Python

Question:

I am referring to this answer in order to handle multiple files at once using multiprocessing, but it stalls and doesn’t work.

This is my attempt:

import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding = 'utf-8') as inp, open(file.replace('.json','.txt'), 'a', encoding = 'utf-8', newline = '') as out:
        length = json.load(inp).get('len','') #Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

p = multiprocessing.Pool(4)
for f, file in enumerate(glob.glob("Folder\*.json")):
    p.apply_async(handle_json, file)
    print(f)

p.close()
p.join() # Wait for all child processes to close.

Where exactly is the problem? I thought it might be because I have 3000 JSON files, so I copied just 50 into another folder and tried with those, but I got the same problem.

ADDED:
Debugging with VS Code gives:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: <module>)

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
  File "C:UsersadminDesktopF_Newstacko.py", line 10, in <module>
    p = multiprocessing.Pool(4)
  File "<string>", line 1, in <module> (Current frame)

Another ADD:
Here is a zip file containing the sample files and the code:
https://drive.google.com/file/d/1fulHddGI5Ji5DC1Xe6Lq0wUeMk7-_J5f/view?usp=share_link

[Task Manager screenshot]

Asked By: Khaled


Answers:

The apply_async function in multiprocessing expects the positional arguments for the called function to be passed as an iterable (typically a tuple or list), so you need to do e.g.:

p.apply_async(handle_json, [file])
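
For illustration, a minimal sketch of the difference (the show function here is hypothetical): a bare string is itself an iterable, so apply_async unpacks it character by character, while a one-element list passes it as a single argument.

import multiprocessing

def show(arg):
    return arg

if __name__ == '__main__':
    with multiprocessing.Pool(1) as pool:
        # Correct: a one-element list means show('data.json')
        print(pool.apply_async(show, ['data.json']).get())

        # Wrong: a bare string is iterated character by character,
        # so this would call show('d', 'a', 't', ...) and the worker
        # would raise a TypeError, surfaced only when .get() is called:
        # pool.apply_async(show, 'data.json').get()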
Answered By: match

On Windows you have to guard your multiprocessing code with if __name__ == "__main__": (see: Compulsory usage of if __name__=="__main__" in windows while using multiprocessing).

You also need to call get on the AsyncResult objects returned by apply_async in order to wait for the tasks to finish; get also re-raises any exception that occurred inside a worker, which would otherwise go unnoticed. So you should store the results in a list and call get on each one.

After these fixes, your code would look as follows:

import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding = 'utf-8') as inp, open(file.replace('.json','.txt'), 'a', encoding = 'utf-8', newline = '') as out:
        length = json.load(inp).get('len','') #Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    tasks = []
    for f, file in enumerate(glob.glob("Folder\*.json")):
        task = p.apply_async(handle_json, [file])
        tasks.append(task)
        print(f)

    for task in tasks:
        task.get()
    p.close()
    p.join() # Wait for all child processes to close.
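
As a side note (a sketch, not part of the original answer): since every task is launched up front and we only need to wait for all of them, the same pipeline can be written with Pool.map, which blocks until every file is processed and re-raises the first worker exception by itself, so no bookkeeping of AsyncResult objects is needed:

import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding='utf-8') as inp, open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len', '')
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    # The with-block terminates and joins the pool automatically;
    # map blocks until all results are in.
    with multiprocessing.Pool(4) as p:
        p.map(handle_json, glob.glob("Folder\*.json"))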
Answered By: Ahmed AEK