SLURM Array Job BASH scripting within python subprocess

Question:

Update: I was able to get a variable assignment from SLURM_JOB_ID with the line JOBID=`echo ${SLURM_JOB_ID}`. However, I haven't yet gotten SLURM_ARRAY_JOB_ID to assign itself to JOBID.


Because I need to support existing HPC workflows, I have to pass a bash script within a Python subprocess. This worked great with OpenPBS, and now I need to convert it to SLURM. I have it largely working in SLURM hosted on Ubuntu 20.04, except that the job array is not being populated. Below is a code snippet greatly stripped down to what's relevant.

The specific question I have is: why are the lines JOBID=${SLURM_JOB_ID} and JOBID=${SLURM_ARRAY_JOB_ID} not getting their assignments? I've tried using a heredoc and various bashisms without success.

The code could certainly be cleaner; it's the result of multiple people working without a common standard.

These questions are relevant:

Accessing task id for array jobs

Handling bash system variables and slurm environmental variables in a wrapper script

    sbatch_arguments = "#SBATCH --array=1-{}".format(self.get_instance_count())

    proc = Popen('ssh ${USER}@server_hostname /apps/workflows/slurm_wrapper.sh sbatch',
                 shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
    job_string = """#!/bin/bash -x
    #SBATCH --job-name=%(name)s
    #SBATCH -t %(walltime)s
    #SBATCH --cpus-per-task %(processors)s
    #SBATCH --mem=%(memory)s
    %(sbatch_args)s

    # Assign JOBID
    if [ %(num_jobs)s -eq 1 ]; then
        JOBID=${SLURM_JOB_ID}
    else
        JOBID=${SLURM_ARRAY_JOB_ID}
    fi

    exit ${returnCode}

    """ % ({"walltime": walltime
            ,"processors": total_cores
            ,"binary": self.binary_name
            ,"name": ''.join(x for x in self.binary_name if x.isalnum())
            ,"memory": memory
            ,"num_jobs": self.get_instance_count()
            ,"sbatch_args": sbatch_arguments
            })

    # Send job_string to sbatch
    stdout, stderr = proc.communicate(input=job_string)
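
A quick way to check what a job actually sees is to submit a tiny probe script and dump its environment. Below is a diagnostic sketch (assuming sbatch is callable directly, without the ssh wrapper, and Python 3 text-mode pipes); SLURM_ARRAY_JOB_ID is only defined inside jobs submitted with --array:

    from subprocess import Popen, PIPE

    # Diagnostic sketch: submit a two-task array job that prints every SLURM_* variable.
    # Note: sbatch only honors #SBATCH lines that begin at the start of a line, so the
    # directives below sit flush against the left edge of the string.
    probe = """#!/bin/bash
    #SBATCH --job-name=slurmprobe
    #SBATCH --time=00:02:00
    #SBATCH --array=1-2
    env | grep '^SLURM_' | sort
    """

    proc = Popen('sbatch', shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                 universal_newlines=True)  # Python 3: send/receive str, not bytes
    stdout, stderr = proc.communicate(input=probe)
    print(stdout.strip())  # e.g. "Submitted batch job 12345"

Each task's output file should list SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID; if they are missing, the --array directive never reached sbatch.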
Asked By: Chase Schuette


Answers:

Following up on this: I solved it by passing the SBATCH directives as arguments to the sbatch command.

    sbatch_args = """--job-name=%(name)s --time=%(walltime)s --partition=defq --cpus-per-task=%(processors)s --mem=%(memory)s""" % (
                    {"walltime": walltime
                    ,"processors": cores
                    ,"name": ''.join(x for x in self.binary_name if x.isalnum())
                    ,"memory": memory
                    })

    # Open a pipe to the sbatch command.
    # The SLURM variables SLURM_ARRAY_* do not exist until after sbatch is called.
    # Popen.communicate has bash interpret all variables at the moment the script is sent.
    # Because of that, the job array needs to be declared before the rest of the bash script.

    # It further seems that SBATCH directives are not being evaluated when passed via a string with .communicate.
    # Due to this, all SBATCH directives are passed as arguments to slurm_wrapper.sh in the first command of the Popen pipe.

    proc = Popen('ssh ${USER}@hostname /apps/workflows/slurm_wrapper.sh sbatch %s' % sbatch_args,
                 shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                 close_fds=True,
                 executable='/bin/bash')
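
Putting the pieces together, here is a condensed sketch of the working approach (the hostname, wrapper path, and resource values are placeholders; it assumes slurm_wrapper.sh simply execs sbatch and forwards stdin, and Python 3 text-mode pipes). Instead of comparing an instance count, this sketch just tests whether SLURM_ARRAY_JOB_ID is set, since that variable is only defined for array jobs:

    from subprocess import Popen, PIPE

    # All SBATCH directives travel as command-line arguments, so sbatch is guaranteed
    # to see them regardless of how the script body is formatted.
    sbatch_args = "--job-name=demo --time=00:10:00 --cpus-per-task=4 --mem=4G --array=1-8"

    # The ${SLURM_*} references below are plain text while the script is in transit;
    # bash on the compute node expands them only when the job actually runs.
    job_string = """#!/bin/bash
    if [ -n "${SLURM_ARRAY_JOB_ID}" ]; then
        JOBID=${SLURM_ARRAY_JOB_ID}   # parent job id shared by every array task
    else
        JOBID=${SLURM_JOB_ID}         # plain (non-array) job
    fi
    echo "JOBID=${JOBID} task=${SLURM_ARRAY_TASK_ID:-n/a}"
    """

    proc = Popen('ssh ${USER}@hostname /apps/workflows/slurm_wrapper.sh sbatch %s' % sbatch_args,
                 shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                 close_fds=True, executable='/bin/bash',
                 universal_newlines=True)  # assumption: Python 3, str-based pipes
    stdout, stderr = proc.communicate(input=job_string)
    print(stdout.strip())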
Answered By: Chase Schuette