Astroscrappy not working with multiprocessing in Jupyter

Question:

I have some multiprocessing code that tries to run several astroscrappy calls at once. However, everything stops as soon as astroscrappy is actually called. I am running this in a Jupyter notebook.

from astropy.io import fits

def a_test(i, q):
    import astroscrappy
    print(1)
    path = i
    s = fits.getdata(path)
    print(2)

    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3)
    q.put([a, i])
bundle = []
import multiprocessing as mp
queue = mp.Manager().Queue()

processes = [] 
for k, item in enumerate(paths):
    processes.append(mp.Process(target=a_test, args=(item, queue)))
    
# Run processes
for p in processes:
    p.start()
for p in processes:
    bundle.append(queue.get())

It will only print out 1, 2, and 2.5, but not 3, which comes after the astroscrappy call. Any ideas why it won’t work?

Asked By: James Huang


Answers:

The code spawns multiple processes, and each process must be given time to finish before the parent moves on. Calling join() on each process does exactly that. I tested the code below with 3 files and was able to observe the processes running concurrently. The execution time is 7-9 seconds on my machine.

import time
import astroscrappy
from astropy.io import fits
import multiprocessing as mp

bundle = []
processes = []
paths = [r"C:\Users\xxxx\FITS\sample.fits", r"C:\Users\xxxx\FITS\sample1.fits", r"C:\Users\xxxx\FITS\sample2.fits"]

def a_test(i,q):    
    print(1)
    path = i
    s = fits.getdata(path)
    print(2)

    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3)    
    q.put([a, path])

if __name__ == '__main__':  
    start = time.time()          
    queue = mp.Manager().Queue()    
    for item in paths:
        processes.append(mp.Process(target=a_test, args=(item,queue)))

    # Run processes
    for p in processes:
        p.start()
    for p in processes:        
        bundle.append(queue.get())

    #wait for child processes to finish
    for p in processes:      
        p.join()   
    
    end = time.time()

    print(f'Execution time: {end - start} seconds')

Output:

1
2
2.5
1
2
2.5
1
2
2.5
3
3
3
Execution time: 9.224327564239502 seconds

This code does not run in a Jupyter notebook out of the box. To read about why multiprocessing does not work as expected in Jupyter notebooks, you can refer to this discussion.
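In short, the notebook problem comes down to multiprocessing’s process start method (a hedged explanation; details vary by platform). You can check which one your interpreter uses:

```python
import multiprocessing as mp

# On Windows (and macOS) child processes are created with the "spawn" start
# method, which re-imports the target function from its module inside the
# child. A function defined interactively in a notebook belongs to no
# importable module, so the children fail -- hence the need to put a_test
# in a separate .py file.
print(mp.get_start_method())  # "fork" on Linux, "spawn" on Windows/macOS
```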

But there is a workaround. To make this code work in a Jupyter notebook, you need to invoke a_test from a separate Python file. For example, I created a Python file (in the same directory where your notebook is running) called functest.py with the following code:

import astroscrappy
from astropy.io import fits

def a_test(*args):        
    path = args[0]   
    s = fits.getdata(path)
    a = astroscrappy.detect_cosmics(s)
    return [a,path]

Now, run the code below in your notebook. Note that I’ve used a Pool instead of a Process, and the output will not contain the print statements from a_test such as 1, 2, 2.5, etc. I removed them from a_test deliberately because they will not print to the Jupyter notebook output. Instead, I’ve printed out bundle to verify the processing.

import time
import multiprocessing as mp
import functest as f

paths = [r"C:\Users\xxxx\FITS\sample.fits", r"C:\Users\xxxx\FITS\sample1.fits", r"C:\Users\xxxx\FITS\sample2.fits"]

def main():
    t1 = time.time()
    pool_size = len(paths)
    with mp.Pool(processes=pool_size) as pool:
        bundle = pool.map(f.a_test, paths)
    print(bundle)
    print(f"Execution time: {time.time() - t1} seconds")

if __name__ == '__main__':
    main()

Output:

[[(array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]]), array([[4.8158903, 4.808729 , 4.7792015, ..., 4.6927767, 4.7188225,
        4.706318 ],
       [4.765932 , 4.75....
Execution time: 8.333278179168701 seconds

Another alternative to multiprocessing is the concurrent.futures module, which runs in Jupyter notebooks without any issues. The code below can be run in a Jupyter notebook. I was able to bring the execution time down to 5-6 seconds.

import time
import astroscrappy
from astropy.io import fits
from concurrent.futures import ThreadPoolExecutor

paths = [r"C:\Users\xxxx\FITS\sample.fits", r"C:\Users\xxxx\FITS\sample1.fits", r"C:\Users\xxxx\FITS\sample2.fits"]

def a_test1(*args):    
    print(1)
    path = args[0]
    s = fits.getdata(path)
    print(2)
    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3)
    return [a, path]


def main():
    t1 = time.time()
    n_threads = len(paths)    
    with ThreadPoolExecutor(n_threads) as executor:        
        futures = [executor.submit(a_test1, item) for item in paths]
        bundle = [future.result() for future in futures]
    print(bundle)  
    print(f"Execution time: {time.time() - t1}")

if __name__ == '__main__':  
    main()

Output:

1
1
1
2
2.5
2
2.5
2
2.5
3
3
3
[[(array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]]), array([[4.8158903, 4.808729 , 4.7792015, ..., 4.6927767, 4.7188225,
        4.706318 ]....
Execution time: 5.936185836791992

Another option is to use the threading module. Here’s a simple example that can be run in Jupyter. I get execution times on the order of 5-6 seconds.

import time
import astroscrappy
from astropy.io import fits
from threading import Thread

paths = [r"C:\Users\xxxx\FITS\sample.fits", r"C:\Users\xxxx\FITS\sample1.fits", r"C:\Users\xxxx\FITS\sample2.fits"]
thread_objs = []

def a_test(i):    
    print(1)
    path = i   
    s = fits.getdata(path)
    print(2)
    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3)    


def main():
    t1 = time.time()
   
    for item in paths:
        thread_objs.append(Thread(target=a_test, args=(item,)))

    # run each thread
    for thread in thread_objs:
        thread.start()
        
    # wait for each thread to finish
    for thread in thread_objs:
        thread.join() 
    print(f"Execution time: {time.time() - t1}")

main()

Output:

1
1
1
2
2.5
2
2.5
2
2.5
3
3
3
Execution time: 6.320343971252441

Note that if you don’t need to process the result of a_test outside of that function, then you don’t need to return anything from it, which saves a little further time.
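Conversely, if you do want the results back, the concurrent.futures approach above can hand them to you as soon as each one finishes. A minimal sketch with a stand-in function (slow_square is hypothetical, standing in for a_test, since I don’t have your FITS files):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(x):
    # stand-in for a_test; replace with your fits/astroscrappy work
    time.sleep(0.1)
    return x * x

with ThreadPoolExecutor(max_workers=3) as executor:
    # as_completed yields each future as soon as its work is done,
    # so results can be processed without waiting for the slowest task
    futures = {executor.submit(slow_square, n): n for n in [1, 2, 3]}
    for future in as_completed(futures):
        print(futures[future], "->", future.result())
```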

I also ran some tests with joblib. I’ve refactored your code and shared the results below:

import time
from joblib import Parallel, delayed
import astroscrappy
from astropy.io import fits

paths = [r"C:\Users\xxxxx\FITS\sample.fits", r"C:\Users\xxxxx\FITS\sample1.fits", r"C:\Users\xxxx\FITS\sample2.fits"]

def a_test(i):    
    print(1)
    path = i   
    s = fits.getdata(path)
    print(2)
    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3) 
    return [a, i]
    
def main():
    t1 = time.time()
    a = Parallel(n_jobs=len(paths))(delayed(a_test)(i) for i in paths)    
    print(f"Execution time: {time.time() - t1}")

main()

Output:

Execution time: 6.360047101974487

The print statement outputs didn’t appear in the Jupyter notebook output, but the execution time was in the range of 6-7 seconds.

In conclusion, I didn’t observe a significant difference in execution times between any of the methods. This could be because I tried them on a small dataset (just 3 files). However, concurrent.futures consistently showed slightly better results. You can try all these methods and compare which one works best for your use case.

Answered By: amanb

Using joblib’s Parallel, I was able to make this code run much faster without it getting stuck.

from joblib import Parallel, delayed
from astropy.io import fits

def a_test_parallel(i):
    import astroscrappy
    print(1)
    path = i
    s = fits.getdata(path)
    print(2)

    print(2.5)
    a = astroscrappy.detect_cosmics(s)
    print(3)

    return [a, i]

a = Parallel(n_jobs=15)(delayed(a_test_parallel)(i) for i in paths[:100])

I ran it a couple of times in comparison with code that doesn’t use Parallel, and this runs almost twice as fast. I’m still not sure why multiprocessing doesn’t work, but at least this does.

Edit: After running it on a larger dataset with some extra adjustments, this code doesn’t actually run twice as fast; it is in fact a bit slower than the synchronous code.

Answered By: James Huang

If you look at astroscrappy’s GitHub repository here: https://github.com/astropy/astroscrappy, you will see that it already parallelizes its work internally using OpenMP (multithreading at the C level). Trying to further parallelize this kind of function will yield marginal gains, or none at all. Your best bet is to stick with the speed you already have, which should be fairly quick.
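If you want to experiment with how many threads astroscrappy’s OpenMP layer uses, the standard OpenMP environment variable can be set before the library is imported. This is a sketch; the exact effect depends on how your astroscrappy build was compiled:

```python
import os

# OMP_NUM_THREADS is the standard OpenMP knob for the worker-thread count.
# It must be set before the OpenMP runtime starts, i.e. before the
# astroscrappy C extension is first imported in this process.
os.environ["OMP_NUM_THREADS"] = "4"

# import astroscrappy  # uncomment in your environment; the extension reads
#                      # the variable when it is loaded
```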

Answered By: John Henry 5