Slurm and Temporary Files¶
This section describes how Slurm handles temporary files on the local disk.
Temporary Files Best Practices
See Best Practices: Temporary Files for information on how to use temporary files effectively.
Slurm Behaviour¶
Our Slurm configuration has the following behaviour.
Environment Variable TMPDIR¶
By default, Slurm itself does not change the TMPDIR environment variable but retains the variable's value from the srun or sbatch call.
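In practice this means that a TMPDIR set on the login node (e.g., pointing at a network volume) is carried into your job. If you want temporary files to end up in the job-private local /tmp described below, a minimal batch script sketch (the job name and scratch directory layout are illustrative only) could look like this:

```bash
#!/bin/bash
#SBATCH --job-name=tmpdir-example   # hypothetical job name

# Slurm keeps the TMPDIR value from the sbatch call, so point it explicitly
# at the job-private /tmp on the compute node.
export TMPDIR=/tmp

# Create a per-job scratch directory (mktemp honours TMPDIR) and remove it
# when the job script exits.
SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT

# ... run your program here, writing temporary data below "$SCRATCH" ...
```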
Private Local /tmp Directories¶
The only place where users can write data on the local storage of the compute nodes is /tmp.
Storage is a consumable, shared resource: the storage used by one job cannot be used by another job. It is therefore critical that Slurm cleans up after each job so that the node's local storage is fully available to the next job. This is done using the job_container/tmpfs Slurm plugin.
This plugin creates a so-called Linux namespace for each job and bind-mounts /tmp to a location on the local storage. This mount is only visible to the currently running job, and each job, even of the same user, gets its own /tmp. After a job terminates, Slurm removes the directory and all of its content.
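As an illustration (the file name is made up, node names match the examples further down), two consecutive interactive jobs do not see each other's files in /tmp:

```console
hpc-login-1 # srun --pty bash -i
hpc-cpu-1 # touch /tmp/file-from-first-job
hpc-cpu-1 # ls /tmp
file-from-first-job
hpc-cpu-1 # exit
hpc-login-1 # srun --pty bash -i
hpc-cpu-1 # ls /tmp   # empty -- the first job's file is gone
hpc-cpu-1 # exit
```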
There is a notable exception. If you use ssh to connect to a node rather than using srun or sbatch, you will see the system /tmp directory and can also write to it. This usage of storage is not tracked, and consequently you can circumvent the Slurm quota management.
Using /tmp in this fashion (i.e., outside of Slurm-controlled jobs) is prohibited. If it cannot be helped (e.g., if you need to run a debugging application that must create FIFO or socket files), then keep your usage of /tmp outside of Slurm jobs below 100MB.
Tracking Local Storage localtmp¶
Enforcing localtmp Gres
From January 31, we will enforce the storage allocated in /tmp on the local disk with quotas. Jobs writing to /tmp beyond the quota in their job allocation will not function properly and will probably crash with "out of disk quota" messages.
Slurm tracks the available local storage above 100MB on nodes in the localtmp generic resource (aka Gres). The resource is counted in steps of 1MB, such that a node with 350GB of local storage (350 × 1024 = 358400MB) would look as follows in scontrol show node:
hpc-login-1 # scontrol show node hpc-cpu-1
NodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24
[...]
Gres=localtmp:350K
[...]
CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400
[...]
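To get an overview of the configured localtmp resource on all nodes, you can also query sinfo; the following is only a sketch, and the exact node list and output layout depend on the Slurm version and configuration:

```console
hpc-login-1 # sinfo -N -o "%N %G"
NODELIST GRES
hpc-cpu-1 localtmp:350K
[...]
```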
Each job is automatically granted 100MB of storage on the local disk, which is sufficient for most standard programs. If your job needs more temporary storage, then you should either
- use the $HOME/scratch volume (see Best Practices: Temporary Files), or
- specify a localtmp generic resource (described here).
You can allocate the resource with --gres=localtmp:SIZE, where SIZE is given in MB.
hpc-login-1 # srun --gres=localtmp:100k --pty bash -i
hpc-cpu-1 # scontrol show node hpc-cpu-1
NodeName=hpc-cpu-1 Arch=x86_64 CoresPerSocket=24
[...]
Gres=localtmp:250K
[...]
CfgTRES=cpu=96,mem=360000M,billing=96,gres/localtmp=358400
[...]
AllocTRES=cpu=92,mem=351G,gres/localtmp=102400
[...]
The first output (Gres) tells us about the resource configured to be available to user jobs, and the last line (AllocTRES) shows us that 100k = 102400MB of local storage are allocated.
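In batch jobs, the same request goes into an #SBATCH line. A minimal sketch (the job name and requested size are examples only):

```bash
#!/bin/bash
#SBATCH --job-name=localtmp-example   # hypothetical job name
#SBATCH --gres=localtmp:100k          # request 100k = 102400MB of local storage

# Everything written below /tmp now counts against the job's localtmp allocation.
SCRATCH=$(mktemp -d -p /tmp)
trap 'rm -rf "$SCRATCH"' EXIT

# ... run your program here, writing temporary data below "$SCRATCH" ...
```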
You can also see the used resources in the details of your job:
scontrol show job 14848
JobId=14848 JobName=example.sh
[...]
TresPerNode=gres:localtmp:100k