Using VSCode in Slurm
Although I don’t use Visual Studio Code[1] as a code editor (I use Neovim[2] 🦸), many colleagues use it as their primary editor.
(The solution presented here also works for PyCharm[4]!)
The problem
To develop on our Slurm[3] cluster, users connect VSCode to the “Login node”, which starts the vscode-server process there.
This is not a problem by itself, but when many users do it simultaneously, it consumes a lot of resources on a machine that should only be used to manage Slurm jobs.
Additionally, this setup only allows for editing the code, not for executing (or at least I hope they are not running code in the “Login node” 🤞) or even debugging.
Advantages
In the following sections, I will explain how to run VSCode inside Slurm[3].
This solves the problem of adding load to the login node. Additionally, since the code editor process will live inside a Slurm job, we will be able to debug and run code directly in the job. That’s very convenient!
🧑‍💻 The easy solution: VSCode on the web!
This solution is super simple to set up! It consists of running code-server[5] as a Slurm job and accessing it through the web browser.
First, we need to install the binary on our system. There are many options for that, so just pick the most convenient one for you. See the code-server documentation for the list of options. In my case, I chose the Standalone release and put the binary in my path.
Second, we will create a job file, for example code-server.job, with the following content:
#!/bin/bash
#SBATCH --job-name=code-server
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j_%N.log
#SBATCH --mem=8gb
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH -p submit
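# Password for the web UI. TODO: Change to secure password
PASSWORD=1234
# Bind to port 0 so the OS picks a free port, then print the port it assigned
PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')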
echo "********************************************************************"
echo "Starting code-server in Slurm"
echo "Environment information:"
echo "Date:" $(date)
echo "Allocated node:" $(hostname)
# 131.159 is the address prefix used in this example; adjust it to match your network
echo "Node IP:" $(ip a | grep 131.159)
echo "Path:" $(pwd)
echo "Password to access VSCode:" $PASSWORD
echo "Listening on:" $PORT
echo "********************************************************************"
PASSWORD=$PASSWORD code-server --bind-addr 0.0.0.0:$PORT --auth password --disable-telemetry
❗ Remember to change the password! ❗
Finally, submit the job with sbatch code-server.job. Once it has started, we can open the editor in any web browser by navigating to the IP of the node and the PORT printed in the job log.
That’s it!
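The node and port end up in the job log, so a small helper can pull them out. This is a minimal sketch, assuming the log lines look exactly like the echo statements in the job file above; the function name log_field is my own, and the job ID and node name in the usage comment are made up.

```shell
#!/bin/bash
# Hypothetical helper: print the value that follows a label such as
# "Listening on:" or "Allocated node:" in the job log.
log_field() {
    local label="$1" logfile="$2"
    # Keep only what comes after "<label> " on the first matching line
    sed -n "s/^${label} //p" "$logfile" | head -n 1
}

# Usage (the log name follows --output=%x_%j_%N.log; 12345 and node01 are placeholders):
# log_field "Listening on:" code-server_12345_node01.log
# log_field "Allocated node:" code-server_12345_node01.log
```

With both values at hand, the address to open in the browser is http://<node>:<port>.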
If the node is not accessible from the Internet, use SSH port forwarding: https://coder.com/docs/code-server/latest/guide#port-forwarding-via-ssh.
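As a rough illustration of the port forwarding, the sketch below builds an ssh command that forwards a local port through the login node to the compute node. The function name tunnel_cmd, the user, and the host names are placeholders of mine, not values from our cluster; it only prints the command so you can check it before running it.

```shell
#!/bin/bash
# Hypothetical helper: print an ssh command that forwards localhost:<port>
# to <node>:<port> through the login node.
tunnel_cmd() {
    local user="$1" login_host="$2" node="$3" port="$4"
    # -N: do not run a remote command, -L: local port forward
    printf 'ssh -N -L %s:%s:%s %s@%s\n' "$port" "$node" "$port" "$user" "$login_host"
}

# Usage (all values are placeholders):
# tunnel_cmd alice login.example.org node01 40123
```

While the printed command is running on your machine, the editor should be reachable at http://localhost:<port>.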
👷 The complex solution: Start your own sshd process
If you are up for a more complex solution or use other IDEs like PyCharm, you can use the next configuration. It involves starting sshd in a Slurm job and then connecting our IDEs to the new process (instead of the global sshd process).
Step 1: Create the SSH keys
ssh-keygen -t rsa -f ~/.ssh/vcg_cluster_user_sshd
Then, copy the generated keys to the login node. (We assume here that the user’s home directories are accessible on all nodes).
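One detail that is easy to miss: for key-based login to work, the public key most likely has to be listed in ~/.ssh/authorized_keys on the cluster. This is an assumption on my side, based on the fact that an sshd started with -f /dev/null falls back to the OpenSSH defaults. A minimal sketch, with authorize_key being a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: append a public key to an authorized_keys file,
# unless the exact same line is already there.
authorize_key() {
    local pubkey_file="$1" auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF "$(cat "$pubkey_file")" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
}

# Usage on the login node:
# authorize_key ~/.ssh/vcg_cluster_user_sshd.pub ~/.ssh/authorized_keys
```

Running it twice is harmless, since the key is only appended when it is missing.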
Step 2: sshd Slurm job
Copy the following content to a new file such as sshd.job.
#!/bin/bash
#SBATCH --job-name=sshd
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j_%N.log
#SBATCH --mem=8gb
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH -p submit
# Bind to port 0 so the OS picks a free port, then print the port it assigned
PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "********************************************************************"
echo "Starting sshd in Slurm as user"
echo "Environment information:"
echo "Date:" $(date)
echo "Allocated node:" $(hostname)
# 131.159 is the address prefix used in this example; adjust it to match your network
echo "Node IP:" $(ip a | grep 131.159)
echo "Path:" $(pwd)
echo "Listening on:" $PORT
echo "********************************************************************"
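# -D: stay in the foreground, -p: listen on $PORT, -f /dev/null: do not read a config file,
# -h: use our own key as the host key (a regular user cannot read the system host keys)
/usr/sbin/sshd -D -p ${PORT} -f /dev/null -h ${HOME}/.ssh/vcg_cluster_user_sshd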
Step 3: Test the connection
At this point, you should be able to connect using ssh to the Slurm job.
ssh user@node -p <PORT where the server started> -i ~/.ssh/vcg_cluster_user_sshd
Notice that the ssh session can only see the resources allocated to the job (for example, the GPUs).
Step 4: Connect your IDE
Finally, use your IDE’s “Remote Connection” feature to connect to the job.
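Most IDEs read ~/.ssh/config, so it can help to give the job a host alias there. The sketch below prints such an entry; the helper name ssh_config_entry, the alias, and the host names are hypothetical, and the ProxyJump line is only needed if the compute node cannot be reached directly from your machine.

```shell
#!/bin/bash
# Hypothetical helper: print an ssh config entry for the sshd running in the job.
ssh_config_entry() {
    local host_alias="$1" node="$2" port="$3" user="$4" login_host="$5"
    cat <<EOF
Host $host_alias
    HostName $node
    Port $port
    User $user
    IdentityFile ~/.ssh/vcg_cluster_user_sshd
    ProxyJump $user@$login_host
EOF
}

# Usage (all values are placeholders; the port changes with every job):
# ssh_config_entry slurm-job node01 40123 alice login.example.org >> ~/.ssh/config
```

After that, the alias (slurm-job in the usage line) can be selected as the remote host in the IDE. Since the port is chosen at job start, the entry needs to be updated for each new job.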
Step 5: Remember to end the sshd process
It is important to cancel the Slurm job (for example, with scancel <job id>) when we no longer need sshd listening.
📑 References
[1]: https://code.visualstudio.com/
[2]: https://www.lunarvim.org/
[3]: https://slurm.schedmd.com/
[4]: https://www.jetbrains.com/help/pycharm/getting-started.html