Data Migration Tips and Tricks
Please use hpc-transfer-1 and hpc-transfer-2 for moving large numbers of files.
This not only leaves the compute nodes available for actual computation, but also avoids the risk of your jobs being killed by Slurm.
You should also use tmux so that a lost connection does not interrupt long-running transfers.
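As a minimal sketch of the tmux advice, the helper below opens a named session, or reattaches to it if it already exists. The function name transfer_session and the default session name migration are made up for illustration; only tmux new-session with its -A and -s options comes from tmux itself.

```shell
# Hypothetical helper, not part of the cluster setup.
transfer_session() {
    session=${1:-migration}   # assumed default session name
    # -s names the session; -A attaches instead of failing if it exists.
    tmux new-session -A -s "$session"
}
```

Call transfer_session on a transfer node, start the transfer inside it, and detach with Ctrl-b d. After a dropped connection, log in to the same transfer node and call transfer_session again to get back to the running transfer.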
Moving a project folder
- Define source and target location and copy contents. Please replace the parts in curly brackets with your actual folder names. It is important to end paths with a trailing slash (/) as this is interpreted by rsync as “all files in this folder”.

    $ SOURCE=/data/gpfs-1/work/projects/{my_project}/
    $ TARGET=/data/cephfs-2/unmirrored/projects/{my-project}/
    $ rsync -ahP --stats --dry-run $SOURCE $TARGET

- Remove the --dry-run flag to start the actual copying process.

  Important
  File ownership information will be lost during this process. This is due to non-root users not being allowed to change ownership of arbitrary files. If this is a problem for you, please contact our admins again after completing this step.

- Perform a second rsync to check if all files were successfully transferred. Paranoid users might want to add the --checksum flag to rsync or use hashdeep. Please note the flag --remove-source-files, which does exactly what its name suggests but leaves empty directories behind.

    $ rsync -ahX --stats --remove-source-files --dry-run $SOURCE $TARGET

- Again, remove the --dry-run flag to start the actual deletion.

- Check if all files are gone from the SOURCE folder and remove the empty directories:

    $ find $SOURCE -type f | wc -l
    0
    $ rm -r $SOURCE
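The dry-run-first pattern used in these steps can be wrapped in a small function so that copying only happens when asked for explicitly. This is a sketch under assumptions: the name sync_project and its --run switch are hypothetical, and the rsync options are simply the ones from the first step.

```shell
# Hypothetical wrapper around the rsync call from the first step.
# Usage: sync_project [--run] SOURCE TARGET
sync_project() {
    dry_run="--dry-run"           # default: only report what would happen
    if [ "$1" = "--run" ]; then
        dry_run=""
        shift
    fi
    # $dry_run is left unquoted on purpose so an empty value adds no argument.
    rsync -ahP --stats $dry_run "$1" "$2"
}
```

With this in place, sync_project $SOURCE $TARGET previews the transfer and sync_project --run $SOURCE $TARGET performs it.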
Warning
When defining your SOURCE location, do not use the * wildcard character.
It does not match hidden (dot) files, so they would be left behind.
It's better to use a trailing slash, which matches “all files in this folder”.
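The effect described in this warning can be reproduced safely in a throwaway directory. The sketch below uses only made-up file names and a temporary folder; it does not call rsync, because the wildcard is expanded by the shell before rsync ever sees it.

```shell
# Throwaway demo directory; nothing here touches real project data.
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden.txt"

# The shell expands the wildcard itself and, by default,
# skips names that start with a dot.
set -- "$demo"/*
glob_count=$#

# find sees every regular file, hidden or not.
all_count=$(find "$demo" -type f | wc -l | tr -d ' ')

echo "matched by wildcard: $glob_count"
echo "actually present:    $all_count"

rm -r "$demo"
```

The wildcard matches one file although two are present, which is why a SOURCE ending in * would silently skip dot files.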
Moving user work folders
Work data
- All files within your own work directory can be transferred as follows. Please replace parts in curly braces with your cluster user name.

    $ SOURCE=/data/gpfs-1/work/users/{username}/
    $ TARGET=/data/cephfs-1/home/users/{username}/work/
    $ rsync -ahP --stats --dry-run $SOURCE $TARGET

  Note
  The --dry-run flag lets you check that rsync is working as expected without copying any files. Remove it to start the actual transfer.

- Perform a second rsync to check if all files were successfully transferred. Paranoid users might want to add the --checksum flag or use hashdeep. Please note the flag --remove-source-files, which does exactly what its name suggests but leaves empty directories behind. As before, remove the --dry-run flag to start the actual deletion.

    $ rsync -ahP --stats --remove-source-files --dry-run $SOURCE $TARGET

- Check if all files are gone from the SOURCE folder:

    $ find $SOURCE -type f | wc -l
    0
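For readers who want to see what a content comparison amounts to, here is a small sketch that builds a checksum manifest for two folders and compares them. It is not a replacement for rsync --checksum or hashdeep: the manifest function and the demo folders are hypothetical, and cksum (a simple CRC, not a cryptographic hash) is used only because it is available on most systems.

```shell
# Demo directories standing in for SOURCE and TARGET (hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'alpha\n' > "$src/a.txt"
printf 'beta\n'  > "$src/.b.txt"
cp -R "$src/." "$dst/"   # copy contents, including dot files

# Print "checksum size relative-path" for every regular file, sorted by path.
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | sort -k 3)
}

if [ "$(manifest "$src")" = "$(manifest "$dst")" ]; then
    result="identical"
else
    result="different"
fi
echo "source and target are $result"

rm -r "$src" "$dst"
```

A comparison like this only makes sense before the source files are removed, that is, before running rsync with --remove-source-files for real.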
Conda environments
Conda installations tend not to react well to moving their main folder from its original location. There are numerous ways around this problem which are described here.
A simple solution we can recommend is this:
- Install a fresh version of conda or mamba in your new work folder. Don't forget to first remove the conda init block in ~/.bashrc.

    $ nano ~/.bashrc
    $ conda init
    $ conda config --set auto_activate_base false

- You can then use your new conda to export your old environments by specifying a full path like so:

    $ conda env export -p /fast/work/user/$USER/miniconda/envs/<env_name> -f <env_name>.yaml

  If you run into errors it might be better to also use the --no-builds flag.

- Finally, re-create your old environments from the yaml files:

    $ conda env create -f {environment.yml}
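If there are many old environments, the export step can be repeated in a loop. The following is a sketch, not an official procedure: the function name export_old_envs and its two arguments are hypothetical, and it always passes --no-builds, which is an assumption you may want to drop if the plain export works for you.

```shell
# Hypothetical helper: write one YAML file per environment found in the
# envs/ folder of an old conda installation.
# Usage: export_old_envs OLD_ENVS_DIR [OUTPUT_DIR]
export_old_envs() {
    old_envs=${1%/}    # folder that contains the old environments
    out_dir=${2:-.}    # assumed default: current directory
    for env_path in "$old_envs"/*/; do
        [ -d "$env_path" ] || continue   # glob matched nothing
        env_path=${env_path%/}
        name=$(basename "$env_path")
        conda env export -p "$env_path" -f "$out_dir/$name.yaml" --no-builds
    done
}
```

Each resulting file can then be passed to conda env create -f as in the last step.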