Temporary Files and Slurm
See Slurm: Temporary Files for information on how Slurm controls access to local temporary storage.
Often, it is necessary to use temporary files, i.e., write something out in the middle of your program, read it in again later, and then discard these files.
For example, samtools sort has to write out chunks of sorted read alignments to be able to sort files larger than main memory.
Traditionally, in Unix, the environment variable TMPDIR is used for storing the location of the temporary directory. When it is undefined, /tmp is usually used.
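To make this convention concrete, here is a minimal bash sketch. The helper name resolve_tmpdir is hypothetical and simply applies the usual "TMPDIR if set and non-empty, else /tmp" rule. The sketch also assumes GNU coreutils mktemp, which creates its file in $TMPDIR (or /tmp) when no template directory is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a standard command): the directory a
# TMPDIR-aware program would pick, i.e. $TMPDIR if set and non-empty, else /tmp.
resolve_tmpdir() {
  printf '%s\n' "${TMPDIR:-/tmp}"
}

unset TMPDIR
resolve_tmpdir                  # prints /tmp (the fallback)

# Create a directory to act as TMPDIR; TMPDIR is unset here, so it lands in /tmp.
demo_dir=$(mktemp -d)
export TMPDIR=$demo_dir
resolve_tmpdir                  # prints the value of $demo_dir

# GNU mktemp follows the same rule, so this file is created inside $demo_dir.
demo_file=$(mktemp)
echo "$demo_file"

# Tidy up the demo directory and restore the unset state.
rm -rf "$demo_dir"
unset TMPDIR
```

Any program that follows this convention, such as mktemp above, can therefore be redirected to another temporary location just by exporting TMPDIR.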
Temporary Directories on the BIH Cluster
Generally, there are two locations where you could put temporary files:
/fast/users/$USER/scratch/tmp -- inside your scratch folder on the fast GPFS file system; this location is available from all cluster nodes.
/tmp -- the local node's temporary folder; this location is only available on the node itself. The Slurm scheduler uses Linux namespaces such that every job gets its own private /tmp, even when several jobs run on the same node.
Best Practice: Use GPFS-based TMPDIR
Generally, set up your environment to use /fast/users/$USER/scratch/tmp, as filling the local disk of a node with forgotten files can cause problems for all jobs running on that node.
Ideally, append the following to your ~/.bashrc to use /fast/users/$USER/scratch/tmp as the temporary directory. This also creates the directory if it does not exist. Further, it creates one directory per host name, which prevents too many entries from accumulating in a single temporary directory.
export TMPDIR=$HOME/scratch/tmp/$(hostname)
mkdir -p "$TMPDIR"
Prepending these lines to your job scripts is also recommended, as this ensures that the temporary directory exists.
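A minimal job-script sketch, assuming bash and GNU coreutils, that starts with these lines. The #SBATCH values are placeholders (bash treats them as comments, so the script also runs outside Slurm). GNU sort stands in for a tool like samtools sort: it also writes temporary chunks for inputs that do not fit in memory, reads TMPDIR by default, and accepts -T to name the directory explicitly.

```shell
#!/bin/bash
#SBATCH --job-name=tmpdir-demo
#SBATCH --time=00:10:00
#SBATCH --mem=1G
# The #SBATCH values above are placeholders; adjust them to your job.

# Same lines as in ~/.bashrc: one temporary directory per host name.
export TMPDIR=$HOME/scratch/tmp/$(hostname)
mkdir -p "$TMPDIR"

# Stand-in workload: GNU sort spills chunks to its temporary directory for
# large inputs; -T names that directory explicitly (it defaults to $TMPDIR).
seq 1 200000 | sort -rn -T "$TMPDIR" > sorted.txt
head -n 1 sorted.txt
```

For samtools sort itself, the -T option plays the same role and takes a prefix for its temporary files.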
TMPDIR and the scheduler
On the older nodes, the local disk is a relatively slow spinning disk; on the newer nodes, it is a relatively fast SSD.
Further, the local disk is independent of the GPFS file system, so I/O to it does not affect the network or any jobs on other nodes.
Please note that by default, Slurm will not change your environment variables.
This includes the environment variable TMPDIR.
Slurm will automatically clean up temporary files in a job's /tmp on the local file system when the job terminates.
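Because Slurm leaves TMPDIR as it is (for example, pointing to your scratch folder via ~/.bashrc), a job step that should use the fast node-local disk has to point TMPDIR there explicitly. The following is a sketch under assumptions: bash with GNU coreutils, and results/ as a hypothetical output directory that in a real job would live on shared storage. Outside of a Slurm job, nothing removes the files under /tmp automatically.

```shell
#!/bin/bash
#SBATCH --time=00:10:00
# Sketch: deliberately use the job's private node-local /tmp for an I/O-heavy step.
export TMPDIR=/tmp
work=$(mktemp -d)               # a unique directory such as /tmp/tmp.XXXXXXXXXX

# GNU sort picks up $TMPDIR for its temporary chunks.
seq 1 200000 | sort -rn > "$work/sorted.txt"

# Copy what you want to keep before the job ends, since Slurm removes the
# job's /tmp when the job terminates. results/ is a hypothetical output dir.
mkdir -p results
cp "$work/sorted.txt" results/
```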
To automatically clean up temporary directories on the shared file system, use the following tip.
Use Bash Traps
You can use the following code at the top of your job script to set TMPDIR to a location in your home directory and have the directory cleaned up automatically when the job is done, regardless of whether it completes successfully or with an error:
# First, point TMPDIR to the scratch folder in your home, as mktemp will use it as the base directory
export TMPDIR=$HOME/scratch/tmp
mkdir -p "$TMPDIR"
# Second, create another unique temporary directory within this directory
export TMPDIR=$(mktemp -d)
# Finally, set up the trap that removes this directory when the script exits
trap "rm -rf $TMPDIR" EXIT
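The following sketch, assuming bash and GNU mktemp, checks the trap pattern in a child shell. The child creates its unique directory under a throwaway base directory (standing in for $HOME/scratch/tmp), writes a file, and then fails on purpose. Afterwards, the parent verifies that the EXIT trap removed the directory.

```shell
#!/usr/bin/env bash
# Throwaway base directory standing in for $HOME/scratch/tmp (demo assumption).
base=$(mktemp -d)

# Child script following the same pattern as the job-script code above.
job_tmp=$(TMPDIR="$base" bash -c '
  set -e                        # abort on the first failing command
  export TMPDIR=$(mktemp -d)    # unique directory inside $base
  trap "rm -rf $TMPDIR" EXIT    # runs on normal exit and on failure
  echo "$TMPDIR"                # report the path to the parent
  touch "$TMPDIR/chunk.0"       # hypothetical temporary file
  false                         # simulate a failing job step
')

if [ -e "$job_tmp" ]; then
  echo "still present: $job_tmp"
else
  echo "cleaned up: $job_tmp"
fi

# The base directory is empty again, so rmdir succeeds.
rmdir "$base"
```

Note that the double quotes in the trap command expand $TMPDIR when the trap is set, so the trap keeps removing the right directory even if TMPDIR is changed later in the script.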