Task Scheduling Across a Network?

Question:

Can you recommend a Python tool/module that allows scheduling tasks on remote machines in a network?

Note that the solution must be able not only to run certain jobs/commands on remote machines, but also to verify that those jobs are still running (for example, consider the case where a machine dies after a task has been assigned to it).

Asked By: user3262424


Answers:

You should be able to use Python WMI for Windows machines; for *NIX-based systems, the equivalent is a wrapper around SSH and cron.
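
For the SSH route, here is a minimal sketch using the third-party paramiko library (the hostname, job script, and key-based authentication are placeholder assumptions):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("remote-host")  # hypothetical host; assumes SSH keys are set up

# schedule the job by appending a cron entry on the remote machine
stdin, stdout, stderr = client.exec_command(
    '(crontab -l; echo "0 * * * * /usr/local/bin/my_job.sh") | crontab -'
)
stdout.channel.recv_exit_status()  # block until the command finishes

# later: verify that the job's process is still alive
stdin, stdout, stderr = client.exec_command("pgrep -f my_job.sh")
alive = stdout.channel.recv_exit_status() == 0  # pgrep exits 0 if a match is found
print("job running" if alive else "job not running (machine or job died?)")

client.close()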

Answered By: Daniel Protopopov

RPyC, or Remote Python Call, is a transparent and symmetrical Python library for remote procedure calls, clustering, and distributed computing. Here is an example from Wikipedia:

import rpyc
conn = rpyc.classic.connect("hostname")  # assuming a classic server is running on 'hostname'

print(conn.modules.sys.path)
conn.modules.sys.path.append("lucy")
print(conn.modules.sys.path[-1])

# a version of 'ls' that runs remotely
def remote_ls(path):
    ros = conn.modules.os
    for filename in ros.listdir(path):
        stats = ros.stat(ros.path.join(path, filename))
        print("%d\t%d\t%s" % (stats.st_size, stats.st_uid, filename))

remote_ls("/usr/bin")

# and exceptions...
try:
    f = conn.builtins.open("/non/existent/file/name")
except IOError:
    pass

To check whether the remote server has died after you assigned it a job, you can use the ping method of the Connection class. The complete API is described in the RPyC documentation.
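
A minimal sketch of such a liveness check (assuming conn is the connection from the example above; the exact exception raised depends on how the link failed):

def machine_alive(conn, timeout=10):
    """Return True if the remote side still answers within `timeout` seconds."""
    try:
        conn.ping(timeout=timeout)  # round-trips a payload and verifies the echo
        return True
    except Exception:  # PingError, a timeout, or EOFError, depending on the failure
        return False

if not machine_alive(conn):
    print("remote machine appears to be down; reassign its task")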

Answered By: Ralph

Fabric (http://docs.fabfile.org/en/1.0.1/index.html) is a pretty good toolkit for various sysadmin and deployment tasks. It comes with a few predefined tasks but also gives you the flexibility to add what you need.
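
A minimal sketch of a fabfile.py against the Fabric 1.x API linked above (the host names and the job command are placeholders):

from fabric.api import env, run, settings

env.hosts = ["worker1.example.com", "worker2.example.com"]

def start_job():
    # nohup + background so the job outlives the SSH session
    run("nohup /usr/local/bin/my_job.sh > /tmp/my_job.log 2>&1 &", pty=False)

def check_job():
    # warn_only: a missing process should not abort the whole run
    with settings(warn_only=True):
        result = run("pgrep -f my_job.sh")
        status = "running" if result.succeeded else "not running"
        print("%s: %s" % (env.host_string, status))

Run fab start_job and later fab check_job to execute each task on every host in env.hosts.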

I highly recommend it.

Answered By: Tim O

I developed a tool which can be used for scheduling jobs across multiple computers:
https://pypi.org/project/pyremto/

The pipeline is explained in this example:
https://github.com/MatthiasKi/pyremto/tree/master/examples/JobScheduling/GeneralJobScheduling

You first set up your jobs (as Python dicts with instructions for your workers) and send them to the server (you can then shut down the client you used to set up the jobs). After that, your workers request new jobs until all jobs are done (you can watch the progress of the job execution in the pyremto app).

Regarding your concern about machines dying: you can specify the "max_redistribution_attempts" variable when setting up the jobs. With that, jobs are redistributed if all other jobs are already done and the server is only waiting for certain jobs' results, so you do not have to worry about dying machines.

Answered By: Matthias Kissel