multiprocessing ImportError: No module named <input>

Question:

I am using a Windows machine and have code written for Python 2.7 that solves a statistical model. Since the model depends on the value of a parameter, I created a parallelized version that solves one model for each value of the parameter.

Consider, for instance, a first file called zz_main_function that contains the following code (it is included for the sake of reproducibility but is not what the question is about):

import numpy as np
import cvxpy

def lm_lasso(x, y, lambda1=None):
    n = x.shape[0]
    m = x.shape[1]
    lambda_param = cvxpy.Parameter(sign="positive")
    # Define the objective function
    beta_var = cvxpy.Variable(m)
    lasso_penalization = lambda_param * cvxpy.norm(beta_var, 1)
    lm_penalization = (1.0 / n) * cvxpy.sum_squares(y - x * beta_var)
    objective = cvxpy.Minimize(lm_penalization + lasso_penalization)
    problem = cvxpy.Problem(objective)
    beta_sol_list = []
    for l in lambda1:
        lambda_param.value = l
        problem.solve(solver=cvxpy.ECOS)
        beta_sol = np.asarray(np.row_stack([b.value for b in beta_var])).flatten()
        beta_sol_list.append(beta_sol)
    return beta_sol_list

And a second file called zz_parallel_function that contains the following code:

import multiprocessing as mp
import numpy as np
import functools
import zz_main_function as mf

def lm_lasso_parallel(x, y, lambda1):
    chunks = np.array_split(lambda1, mp.cpu_count())
    pool = mp.Pool(processes=mp.cpu_count())
    results = pool.map(functools.partial(mf.lm_lasso, x, y), chunks)
    pool.close()
    pool.join()
    return results

The reason I split the functions into two files is that this way everything seemed to work without adding the usual if __name__ == '__main__': guard required when dealing with multiprocessing.

This code was written some months ago and worked perfectly, either from the Python console or by running a Python file like:

import zz_parallel_function as pf
from sklearn.datasets import load_boston

boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]

r_parallel = pf.lm_lasso_parallel(x, y, lambda1)

Recently I had to format my computer, and when I reinstalled Python 2.7 and tried to run the code described above, I ran into the following errors:

  1. If I try to run it directly from the Python console:

    import zz_parallel_function as pf
    from sklearn.datasets import load_boston
    
    boston = load_boston()
    x = boston.data
    y = boston.target
    lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
    
    r_parallel = pf.lm_lasso_parallel(x, y, lambda1)
    

[image: screenshot of the error output]

  2. If I run it as an independent file:

[image: screenshot of the error output]

So my questions are:

  1. Why did this code work before and not now? The only thing that (possibly) changed is the version of some of the installed modules, but I don't think that is relevant.

  2. Any guess on how to get it working again?

EDIT 1

By adding if __name__ == '__main__': to the code and running it as an independent file, it executes with no problem. However, when I try to execute it in a Python console, it raises the same error as before.

Based on the comments received, this was possibly because the code needs to be frozen. The code in the Python console is not frozen, and this would be the cause of the issue. I then tried running the following example from the multiprocessing documentation for Windows:

from multiprocessing import Process, freeze_support

def foo():
    print 'hello'

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()

This code supposedly freezes the code, but when I run it in the Python console, I get the same error as before.

[image: screenshot of the error output]

Answers:

You cannot spawn new child processes using multiprocessing directly from the interactive Python interpreter.

From the docs,

Note: Functionality within this package requires that the main
module be importable by the children. This is covered in Programming
guidelines however it is worth pointing out here. This means that some
examples, such as the Pool examples will not work in the interactive
interpreter.
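To see whether a given session meets this requirement, one option is to inspect the main module's __file__ attribute. The sketch below rests on an assumption: that on Windows the child process locates the main module through that attribute, which would be consistent with the '<input>' in the error message. The helper name main_file_status is hypothetical and is not part of multiprocessing.

```python
import os
import sys


def main_file_status(main_module=None):
    """Return (path, exists) for the main module's __file__ attribute.

    Hypothetical helper for diagnosis only; not part of multiprocessing.
    """
    if main_module is None:
        main_module = sys.modules['__main__']
    # A plain interactive session has no __file__ at all. Assumption: a
    # console that reports '<input>' has set a placeholder that is not a
    # real file on disk.
    path = getattr(main_module, '__file__', None)
    exists = path is not None and os.path.isfile(path)
    return path, exists


path, exists = main_file_status()
if exists:
    print('main module file found: %s' % path)
else:
    print('main module has no usable file (__file__ = %r)' % (path,))
```

If the second message is printed in the console but not when the same code is run as a script, that points at the console session rather than at the installed packages.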

And the guideline says that

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Calling freeze_support() has no effect when invoked on any operating
system other than Windows. In addition, if the module is being run
normally by the Python interpreter on Windows (the program has not
been frozen), then freeze_support() has no effect.
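Whether a program counts as frozen can be checked directly. This is a minimal sketch that assumes the freezing tool sets the sys.frozen attribute, as py2exe and PyInstaller do; the function name is_frozen is made up for illustration.

```python
import sys


def is_frozen():
    """Return True when running inside a frozen executable.

    Assumes the freezing tool sets sys.frozen (py2exe and PyInstaller do).
    """
    return bool(getattr(sys, 'frozen', False))


print('frozen executable: %s' % is_frozen())
```

In a normal interpreter or console session this reports False, which is the case where freeze_support() does nothing.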

Also, one should protect the "entry point" of the program by using if __name__ == '__main__': as follows:

from multiprocessing import Process, freeze_support

def f():
    print 'hello world!'

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()

If the freeze_support() line is omitted, then trying to run the frozen executable (e.g. one created using pyinstaller or py2exe) will raise RuntimeError.
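The same guard can be applied to a Pool-based script like the one in the question. The following is only a sketch: scale_chunk stands in for the real solver, split_into_chunks stands in for np.array_split, and all three function names are invented for this example. It is meant to be saved to a file and run as a script, not pasted into a console.

```python
import functools
import multiprocessing as mp


def scale_chunk(factor, chunk):
    """Stand-in worker: multiply every value in the chunk by factor."""
    return [factor * value for value in chunk]


def split_into_chunks(values, n_chunks):
    """Split values into n_chunks contiguous, nearly equal lists."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def run_parallel(values, factor, n_workers=2):
    """Map scale_chunk over the chunks using a process pool."""
    chunks = split_into_chunks(values, n_workers)
    pool = mp.Pool(processes=n_workers)
    try:
        return pool.map(functools.partial(scale_chunk, factor), chunks)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Only the process started as a script runs this block; child
    # processes that re-import the module on Windows skip it.
    mp.freeze_support()
    print(run_parallel([1, 2, 3, 4, 5], factor=10))
```

Keeping the worker at module level matters here, because the children need to import it by name.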

Answered By: han solo

ModuleNotFoundError: No module named 'input'

The above happens when I use import input as well.