Job management

CS 300 (PDC)

Cluster job management

Example: some SLURM commands

sbatch -n count -o outfile script
Submit a script for batch computation. For example, if myscript.sh contains
#!/bin/sh
mpirun ./trap
then the command
  % sbatch -n 4 -o myscript.out myscript.sh
submits a job that runs the MPI program trap (the executable compiled from trap.c) as four tasks, typically one per processor core, and places the output in the file myscript.out. By default, error output is written to the same file as standard output; the -e flag can be used to place error output into a separate file.

Another command-line option:

You can also specify sbatch command-line options inside the batch script itself, in shell-script comment lines that begin with #SBATCH. For example, if myscript2.sh contains
#!/bin/sh
#SBATCH -n 4 -o myscript.out

mpirun ./trap
then the command
  % sbatch myscript2.sh
performs a computation equivalent to the sbatch command above. More than one #SBATCH line may be used, e.g.,
#!/bin/sh
#SBATCH -n 4 -o myscript.out
#SBATCH -e myscript.err

mpirun ./trap
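
The pieces above can be combined into one small workflow. The following is a minimal sketch, not a required procedure: the file name myscript3.sh is hypothetical, and it assumes trap.c has already been compiled into an executable named trap (for example with mpicc -o trap trap.c). It writes the job script, then submits it only if sbatch is installed. The --parsable option makes sbatch print just the job id (optionally followed by ";" and a cluster name), which is convenient for later squeue or scancel calls.

```shell
#!/bin/sh
# Write a job script like the one above (the file name myscript3.sh is hypothetical).
cat > myscript3.sh <<'EOF'
#!/bin/sh
#SBATCH -n 4
#SBATCH -o myscript.out
#SBATCH -e myscript.err

mpirun ./trap
EOF

# Submit only where SLURM is installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job id.
    jobid=$(sbatch --parsable myscript3.sh)
    jobid=${jobid%%;*}
    echo "submitted job $jobid"
else
    echo "sbatch not found; wrote myscript3.sh without submitting"
fi
```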
squeue
View the jobs in the SLURM queue, both running and pending
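
On a busy cluster the full listing can be long, so it often helps to restrict it to one user's jobs with squeue -u. A minimal sketch, assuming a hypothetical helper named my_jobs that defaults to the current login name:

```shell
#!/bin/sh
# Hypothetical helper: list only the jobs owned by one user
# (default: the user running this script).
my_jobs() {
    user=${1:-$(id -un)}
    squeue -u "$user"
}

# Call the helper only where squeue is available.
if command -v squeue >/dev/null 2>&1; then
    my_jobs
fi
```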

A command-line option:

sinfo
Display which nodes are idle, down, or allocated
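
To see individual nodes rather than a per-partition summary, sinfo accepts -N (node-oriented output) and -t (filter by state). A minimal sketch, assuming a hypothetical helper named nodes_in_state:

```shell
#!/bin/sh
# Hypothetical helper: list nodes in a given state (default: idle),
# one line per node.
nodes_in_state() {
    state=${1:-idle}
    sinfo -N -t "$state"
}

# Call the helper only where sinfo is available.
if command -v sinfo >/dev/null 2>&1; then
    nodes_in_state idle
fi
```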
scancel job-id
Cancel a job. Use squeue to find job-id values.
scancel -u username cancels all jobs belonging to username; give your own user name to cancel all of your jobs
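
Cancelling everything is sometimes more than intended. scancel also accepts -t to select jobs by state, so jobs still waiting in the queue can be removed while running ones continue. A minimal sketch, assuming a hypothetical helper named cancel_pending; it is only defined here, not called, since calling it really does cancel jobs:

```shell
#!/bin/sh
# Hypothetical helper: cancel only the PENDING jobs of one user
# (default: the user running this script). Running jobs are left alone.
cancel_pending() {
    user=${1:-$(id -un)}
    scancel -u "$user" -t PENDING
}

# Example call (uncomment to use):
# cancel_pending
```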