Snakemake with Slurm¶
This page describes how to use Snakemake with Slurm.
Prerequisites¶
- This assumes that you have Miniconda properly setup with Bioconda.
- Also it assumes that you have already activated the Miniconda base environment with
source miniconda/bin/activate
.
Environment Setup¶
We first create a new environment snakemake-slurm
and activate it.
We need the snakemake
package for this.
host:~$ conda create -y -n snakemake-slurm snakemake
[...]
#
# To activate this environment, use
#
# $ conda activate snakemake-slurm
#
# To deactivate an active environment, use
#
# $ conda deactivate
host:~$ conda activate snakemake-slurm
(snakemake-slurm) host:~$
Snakemake Workflow Setup¶
We create a workflow and ensure that it works properly with multi-threaded Snakemake (no cluster submission here!)
host:~$ mkdir -p snake-slurm
host:~$ cd snake-slurm
host:snake-slurm$ cat >Snakefile <<"EOF"
rule default:
input: "the-result.txt"
rule mkresult:
output: "the-result.txt"
shell: r"sleep 1m; touch the-result.txt"
EOF
host:snake-slurm$ snakemake --cores=1
[...]
host:snake-slurm$ ls
Snakefile the-result.txt
host:snake-slurm$ rm the-result.txt
Snakemake and Slurm¶
You have two options:
- Simply use
snakemake --profile=cubi-v1
and the Snakemake resource configuration as shown below. STRONGLY PREFERRED - Use the
snakemake --cluster='sbatch ...'
command.
Note that we sneaked in a sleep 1m
? In a second terminal session, we can see that the job has been submitted to SLURM indeed.
host:~$ squeue -u holtgrem_c
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
325 debug snakejob holtgrem R 0:47 1 med0127
Threads & Resources¶
The cubi-v1
profile (stored in /etc/xdg/snakemake/cubi-v1
on all cluster nodes) supports the following specification in your Snakemake rule:
threads
: the number of threads to execute the job on- memory in a syntax understood by Slurm, EITHER
resources.mem
/resources.mem_mb
: the memory to allocate for the whole job, ORresources.mem_per_thread
: the memory to allocate for each thread.
resources.time
: the running time of the rule, in a syntax supported by Slurm, e.g.HH:MM:SS
orD-HH:MM:SS
resources.partition
: the partition to submit your job into (Slurm will pick a fitting partition for you by default)resources.nodes
: the number of nodes to schedule your job on (defaults to1
and you will want to keep that value unless you want to use MPI)
You will need Snakemake >=7.0.2 for this.
Here is how to call Snakemake:
# snakemake --profile=cubi-v1 -j1
To set rule-specific resources:
rule myrule:
threads: 1
resources:
mem='8G',
time='04:00:00',
input: # ...
output: # ...
shell: # ...
You can combine this with Snakemake resource callables, of course:
def myrule_mem(wildcards, attempt):
mem = 2 * attempt
return '%dG' % mem
rule snps:
threads: 1
resources:
mem=myrule_mem,
time='04:00:00',
input: # ...
output: # ...
shell: # ...
Custom logging directory¶
By default, slurm will write log files into the working directory of snakemake, which will look like slurm-$jobid.out
.
To change this behaviour, the environment variable SBATCH_DEFAULTS
can be set to re-route the --output
parameter. If you want to write your files into slurm_logs
with a filename pattern of $name-$jobid
for instance, consider the following snippet for your submission script:
#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=48:10:00
#SBATCH --mem-per-cpu=300M
#SBATCH --output=slurm_logs/%x-%j.log
mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"
date
srun snakemake --use-conda -j1 --profile=cubi-v1
date
The name of the snakemake slurm job will be snakemake_main_job
, the name of the jobs spawned from it will be called after the rule name in the Snakefile.