Motivation
When our company grew past 50–60 employees, I left leading the science team to join development operations. I was a junior devops learning from deizel, and we needed cloud-based infrastructure to scale the company. As we built the pipelines for deploying our software, we noticed we needed to standardize code development for engineers and scientists.
Every time a new engineer, scientist, or really any employee joined, they had to install the software and configure their machine, and making edits was a nightmare for anyone not familiar with tech. Our codebase was growing, so it was just too much for someone who wasn't full time to understand. This pushed us to move to cloud-based development with VS Code.
After I left that job and joined academia, I realized that with more ongoing collaborators on my projects from around the world, I need a cloud-based infrastructure so we can do science faster, develop large-scale code, and execute test builds at minimal cost. I don't want them to go through the same learning curve I did, because it takes too much time.
Authentication
So here we go. The first step is to establish a GitHub Organization for your lab; from there we can manage the members of the group. Here is ours. If you follow their documentation it should be pretty intuitive. I've done this on both Azure and AWS, and I found GitHub to actually be the most friendly in terms of UI design.
From here we can control who is and isn't allowed into the group. Under Codespaces in the organization settings there is a permissions option for selected users. I preferred this because it gives me finer control, and we only have 20 people. At a larger scale I would configure access through teams in the GitHub Organization settings.
Launch a Codespace
In our organization we have 42 repositories and growing as more members contribute. Each codespace can be launched straight from a repository with the code already deployed. As long as I told folks where to click, it was a pretty seamless process, even when teaching 3–4 co-workers at once.
When the codespace launches you should see a container start on your server (a starter machine is usually 2 cores and 4–8 GB of RAM).
This launches the server, and bam, a familiar view for folks who use VS Code. My code is all there, along with the last things I ran.
So now we have the VS Code editor with my code deployed. The server is running Ubuntu 20.04, so I had to configure it a little to get things working (it was missing a libffi requirement). When I have a new server I usually run this as my default installer.
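If you want to confirm which machine type you actually landed on, a quick check from the codespace terminal can help. This is a minimal sketch, assuming a Linux machine that exposes /proc/meminfo; the function names `codespace_cores` and `codespace_mem_gib` are made up for this example.

```shell
# Print the number of CPU cores currently online.
codespace_cores() {
    getconf _NPROCESSORS_ONLN
}

# Print total memory in GiB with one decimal place.
# Reads /proc/meminfo by default (Linux only); a different file can be
# passed as the first argument, which is handy for testing.
codespace_mem_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

echo "cores: $(codespace_cores), memory: $(codespace_mem_gib) GiB"
```

The reported memory is usually a little under the advertised size, since part of it is reserved by the system.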
#!/bin/bash

# Install Ubuntu 20.04 libffi requirements
# ----------------------------------------
curl -LO http://archive.ubuntu.com/ubuntu/pool/main/libf/libffi/libffi6_3.2.1-8_amd64.deb
sudo dpkg -i libffi6_3.2.1-8_amd64.deb

# Install Psi4
# ------------
conda create -n psi4_env python=3.9
conda install -n psi4_env -c psi4 psi4=1.5
And this is usually my favourite part. Watching stuff install…
After the installation, how you activate the environment depends on how the conda path was set up. This can be tricky on these machines and deserves another blog post.
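Before kicking off the installer on a fresh machine, it can save some head-scratching to confirm the tools it calls are actually there. Here is a small sketch of that idea; `require_commands` is a hypothetical helper, not part of any tool, and the list of commands is simply what the installer script uses.

```shell
# Hypothetical helper: report which of the given commands are missing.
# Returns 0 when every command is available, 1 otherwise.
require_commands() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The installer relies on these three tools.
if require_commands curl dpkg conda; then
    echo "ready to run the installer"
fi
```

If anything is reported as missing, install it first and rerun the check.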
/bin/bash
source activate psi4_env
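If the activation fails because the shell has not loaded conda's hook (a common cause of the path trouble mentioned above), one workaround is to source conda's own setup script first. This is a sketch under assumptions: conda is on the PATH, the install ships the usual etc/profile.d/conda.sh file, and `activate_conda_env` is a name I made up for the example.

```shell
# Hypothetical helper: activate a conda environment from a shell where
# `conda activate` is not yet available.
# Usage: activate_conda_env <env_name>
activate_conda_env() {
    local env_name="${1:-}"
    if [ -z "$env_name" ]; then
        echo "usage: activate_conda_env <env_name>" >&2
        return 2
    fi
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Ask conda where it is installed, then load the script that defines
    # the `conda` shell function needed by `conda activate`.
    local base
    base="$(conda info --base)" || return 1
    . "${base}/etc/profile.d/conda.sh"
    conda activate "$env_name"
}
```

With that defined, `activate_conda_env psi4_env` should behave the same as the two lines above on a machine where the plain `source activate` does not work.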
And there we go! Since our group does a lot of quantum mechanics, we often need psi4 installed, as it's become our default. Now each new academic member who joins the project doesn't have to worry about the installation and can move on to the science faster.
Determine the Machine Type & Financial Costs
With AWS, I blew through their free credit of around $800–1000 rapidly because of the amount of standup I had to configure before I could actually submit a job for my machine learning pipelines. Neptune and the GraphDB were expensive. So I wondered about GitHub Codespaces.
The 16-core machine with 32GB of RAM blew through 10 bucks in a day when it was constantly running. For most software development a 16-core machine is not needed until we start to scale to larger data. I have found that a 4- or 8-core machine works best, and I set my budget to $30.00.
This is one of the most important things to set before you start playing with anything. Know how much you will be spending and protect your credit card. The limit also works as a signal: if I exceed it, either I did something wrong or I really do need to scale.
I find this cheaper and easier to manage, and I think I can bring it to scale for my own teams or the people I work with. Okay, more on this later. Good luck!
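To get a feel for how long a spending limit lasts, the arithmetic is just budget divided by daily spend. Here is a small sketch using the figures from this post; `budget_days` is a hypothetical helper, and your own daily spend will depend on the machine type and how many hours it actually runs.

```shell
# Hypothetical helper: number of days a budget lasts at a given daily spend.
# Usage: budget_days <budget_usd> <daily_spend_usd>
budget_days() {
    awk -v budget="$1" -v daily="$2" 'BEGIN {
        if (daily <= 0) {
            print "daily spend must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", budget / daily
    }'
}

# Figures from this post: a $30 budget and about $10 a day on the 16-core machine.
budget_days 30 10
```

At that rate the budget is gone in three days, which is a good argument for the smaller machines and for stopping a codespace when you are not using it.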
Happy Science!