Sometimes it can be a daunting task to get all the Kerberos and SSH configurations right on your first attempt at using the PDC systems. Nearly every day PDC Support receives a number of help requests and questions from researchers who have run into configuration problems. So PDC has introduced an alternative way of logging in to our clusters by using Docker containers with pre-configured Kerberos and SSH files.
What is Docker?
Docker is a tool used to deploy and run one or more applications within what are known as containers. A container is packed with everything that is needed to run an application: the application itself plus any relevant tools, libraries and other necessary files. Each Docker container is delivered as a single package with all the necessary material included. This means that the person using the Docker container does not have to install or configure any new tools or libraries before using them. Since everything is preloaded, the applications inside the containers are ready to use and can be executed regardless of any customized settings on the host machine.
Later in this blog article, we will talk about a Docker container that we have developed for the purpose of logging in to PDC clusters.
Why not use a Virtual Machine (VM)?
In a way, a Docker container is like a Virtual Machine (VM). The difference is that, instead of installing a whole new operating system, each Docker container only contains the parts of the operating system that are required by the applications in that particular container. Unlike VMs, which each have an entire guest operating system, each Docker container runs as an isolated process in user space on the host operating system. This saves a lot of memory and makes a container easier to use than a new operating system. Docker containers therefore enjoy the resource isolation and allocation benefits of VMs but are much more portable and efficient.
Here we are focusing on using Docker to log in to PDC supercomputers, which is a relatively small task. Hence, we use the lighter Docker containers instead of actual VMs.
How to get started with Docker?
For logging in to our clusters here at PDC, we have created our own Docker container. To use it, you need to have Docker Toolbox installed on your system. You can find the instructions for installing Docker Toolbox on the operating system that you are currently using via the following links.
When you have installed Docker Toolbox on your system, if you are using Windows or a Mac, you will find that you have a new terminal application, called Docker Terminal, installed on your system. You can then use Docker Terminal for loading any Docker containers that you want to use. If you have an Ubuntu system, you can simply use Docker commands in your regular terminal. The Docker installation links shown above for Windows and Mac are for the Docker Toolbox versions only; these are lighter legacy versions of Docker CE. If you do not mind the huge size of the actual Docker CE software (~500MB), or if you will be performing more advanced tasks, you can find the links for installing the full Docker CE software in the ‘Get Docker’ section of https://docs.docker.com.
A hub for the containers
Docker also provides a personal storage place for containers known as Docker Hub. It is a cloud-based registry service which allows you to link to code repositories, build your own images, test them and store manually pushed images. In fact, we have created one container in our PDC repository on Docker Hub that we will be using in the upcoming steps.
Alright, can we log in now?
Once you have installed Docker on your system, you can simply open the Docker Terminal on Windows or Mac, or your regular terminal on Ubuntu, and enter the command below to launch the container that we have created for you. Note: Make sure your system is connected to the Internet before you do this.
docker run -it pdcsupport/pdclogin:latest
The parts of this command are as follows.
- The -it flags mean that you want to run the Docker application interactively: -i keeps standard input open and -t allocates a terminal. Without these flags you do not get an interactive shell inside the container. (Running a container in the background is done with the separate -d flag.)
- pdcsupport is our Docker username.
- pdclogin is the name of a repository.
- latest is the tag name given to our latest container in that repository.
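To make the naming scheme concrete, here is a minimal sketch that splits the image name from the command above into its three parts with plain shell parameter expansion. It assumes the simple `<user>/<repository>:<tag>` form used in this article; the variable names are made up for the illustration and are not part of Docker.

```shell
#!/bin/sh
# Split a Docker Hub image reference of the form <user>/<repository>:<tag>.
# Assumption: the reference has exactly this simple form (no registry host,
# no digest, and the tag is written out explicitly).
image="pdcsupport/pdclogin:latest"

user="${image%%/*}"   # everything before the first "/"
rest="${image#*/}"    # everything after the first "/"
repo="${rest%%:*}"    # everything before the first ":"
tag="${rest#*:}"      # everything after the first ":"

echo "user=$user repository=$repo tag=$tag"
# prints: user=pdcsupport repository=pdclogin tag=latest
```

If you leave out the tag when running a container, Docker uses latest by default, so the tag only needs to be spelled out when you want a different version.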
Once you have successfully loaded the PDC container, you can log in to any of the PDC host machines using two simple commands, as shown in the following example. You will need to give your username and the cluster you want to log in to.
kinit -f <username>@NADA.KTH.SE
ssh <username>@<hostname>.pdc.kth.se
Your screen output should look similar to the following image.
Now you should be logged in and able to use our machines. Isn’t that easy!
kinit is a Kerberos command used to create a ticket, which is stored locally on your system. The -f option requests a forwardable ticket, which is then forwarded to the host machine when you log in using the ssh command. If you want to learn more about how ssh works at PDC, you can find some information here.
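As a small illustration of how the two placeholders are filled in, the sketch below prints the two login commands for a given username and cluster host. The function name pdc_login_commands and the username alice are hypothetical. The helper only prints the commands and does not run kinit or ssh itself, so you would still type or paste them inside the container.

```shell
#!/bin/sh
# Hypothetical helper (not provided by PDC): print the two login commands.
pdc_login_commands() {
    # $1 = PDC username, $2 = short name of the host to log in to
    if [ "$#" -ne 2 ]; then
        echo "usage: pdc_login_commands <username> <hostname>" >&2
        return 1
    fi
    # Step 1: request a forwardable (-f) Kerberos ticket in the NADA.KTH.SE realm.
    printf 'kinit -f %s@NADA.KTH.SE\n' "$1"
    # Step 2: open an SSH session to the chosen host in the pdc.kth.se domain.
    printf 'ssh %s@%s.pdc.kth.se\n' "$1" "$2"
}

# Example call with a made-up username.
pdc_login_commands alice t04n28
```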
File sharing between Docker and host machines
Sometimes you will need to transfer files from your local system to the PDC host machine. To do this with Docker, you need to map a folder in your local system to a folder inside the PDC container by using the -v flag when launching the PDC container.
docker run -it \
  -v <path-to-the-local-folder>:/root/<new-folder> \
  pdcsupport/pdclogin:latest
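Note: Use the full path to your local folder rather than the relative path. Docker interprets a host path that does not start with / as the name of a Docker volume, not as a folder.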
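If you are unsure what the full path of a folder is, a sketch like the following can work it out for you. The function name pdc_run_command is hypothetical, and the helper only prints the resulting docker run command rather than running it, so you can check the command before using it.

```shell
#!/bin/sh
# Hypothetical helper: build the docker run command for a shared folder.
pdc_run_command() {
    # $1 = local folder (relative or full path), $2 = folder name inside /root
    # Resolve the local folder to a full path; fail if it does not exist.
    abs_dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || {
        echo "no such directory: $1" >&2
        return 1
    }
    # Quote the mapping so that paths containing spaces stay in one argument.
    printf 'docker run -it -v "%s:/root/%s" pdcsupport/pdclogin:latest\n' \
        "$abs_dir" "$2"
}

# Example: share the current directory as /root/data inside the container.
pdc_run_command . data
```

The cd and pwd pair runs in a subshell, so your own working directory is not changed by calling the helper.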
You can now see your files inside the PDC Docker container in the path /root/<new-folder>. Any changes that are made to the folder in the local system will be reflected inside the <new-folder> in the Docker container and vice versa.
Now that your container has access to your data, meaning the files inside the specified folder, you can transfer data between your local system and the PDC host machines. To copy any file from the Docker container to the PDC host machine, you can use the scp command, which uses the ssh protocol for secure transfers.
# Copy a single file from local system to PDC host machine
scp /root/<fileToCopy> \
  <username>@t04n28.pdc.kth.se:/afs/pdc.kth.se/home/u/user/
Please note the destination path in the command above. The path to your home folder in PDC’s AFS file system is actually /afs/pdc.kth.se/home/<first-letter-of-your-username>/<your-username>. In this article, however, we will just write /u/user/ for the last two parts of the path to make things easier to read. You can find out more about file systems and data management at PDC here.
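The path pattern above can be sketched as a small shell function. The name pdc_home is made up for this example; it simply fills in the first letter and the username according to the pattern described in the text.

```shell
#!/bin/sh
# Hypothetical helper: print the AFS home folder for a PDC username, using
# /afs/pdc.kth.se/home/<first-letter-of-your-username>/<your-username>.
pdc_home() {
    # $1 = PDC username
    if [ -z "$1" ]; then
        echo "usage: pdc_home <username>" >&2
        return 1
    fi
    first_letter=$(printf '%s' "$1" | cut -c1)   # first character of the name
    printf '/afs/pdc.kth.se/home/%s/%s\n' "$first_letter" "$1"
}

pdc_home user    # prints /afs/pdc.kth.se/home/u/user
```

Inside the container, the output could be used as the destination of a copy, for example by writing "$(pdc_home <username>)" after the colon in an scp command.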
To copy entire folders, you can use the -r flag as shown below.
# Copy a directory from local system to PDC host machine
scp -r /root/<dirToCopy> \
  <username>@t04n28.pdc.kth.se:/afs/pdc.kth.se/home/u/user/<dir>
Bear in mind that you need to be in your Docker container, not on the PDC host machines, when using these file transfer commands. The general command usage is scp -r <source> <destination>. If you want to copy the files the other way around, that is, from a PDC cluster to your system, you still need to be in the Docker container and use the command in the following way.
# Copy a directory from PDC host machine to local system
scp -r <username>@t04n28.pdc.kth.se:/afs/pdc.kth.se/home/u/user/<dir> \
  /root/<dir>
Some things to remember
- These procedures are intended as an intermediate step that makes it easier for you to log in, and transfer files, from your local computer to a PDC system.
- The Docker container used at PDC is based on a lightweight version of Ubuntu 16.04, which only has basic built-in commands. You can, however, download our Docker image and install any other tools that you need in it.
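One possible way to add tools to the image is sketched below. The script writes a small Dockerfile that starts from the PDC image and installs one extra package. Everything beyond the image name is an assumption for the sake of illustration: that apt-get is available (the text says the image is Ubuntu-based), that rsync is the tool you want, and the tag my-pdclogin. The script only creates the file and prints the build command. It does not run Docker itself.

```shell
#!/bin/sh
# Write a Dockerfile that extends the PDC login image with an extra tool.
# Assumptions: the image provides apt-get, and "rsync" is just an example
# package. The tag "my-pdclogin" is made up and does not come from PDC.
build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
FROM pdcsupport/pdclogin:latest
RUN apt-get update \
    && apt-get install -y --no-install-recommends rsync \
    && rm -rf /var/lib/apt/lists/*
EOF

echo "Dockerfile written to $build_dir"
echo "Build it with: docker build -t my-pdclogin $build_dir"
```

After building, you would launch your own image in the same way as before, replacing pdcsupport/pdclogin:latest with the tag you chose.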
If you would like to know more about Docker and how to use it, follow this link.
If you encounter any difficulties when logging in during any of the above steps, please check our troubleshooting page here.