Welcome back to another graduate-level lecture, in which we start coding some of our first cheminformatics bots to run different data engineering and machine learning pipelines.
Our goal is to get a Discord bot to take a list of SMILES from the user and perform Principal Component Analysis (PCA) on it, which can be found in this previous blog.
So we need an idea of what the schematic is going to look like tech-wise, and we need to lay out a few design requirements that will govern how we build the bot.
- We need a place where the user can interact with the bot.
- We need a free place to run the code and produce the output file, which is an HTML file.
- We need to send the output back to the user.
Our implementation consisted of creating a repository called workers.
This is where we are going to run any bot services. The idea is to use GitHub Actions to run Python code. The hosted runners are free for public repositories, and for the purposes of our demo we don’t need to worry about high throughput. One further benefit is that there are options to scale when moving to big data.
For the user interface, we are going to use Discord to talk to the bot, which can easily be tied to the workflow with a webhook.
This makes it easy to create cheminformatics pipelines that a wide set of people can try out for free, without us having to worry too much about building a UI.
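To make the webhook idea concrete, here is a minimal sketch of posting a plain text message to a Discord webhook from Python using only the standard library. The function name `build_webhook_request`, the example message, and the User-Agent string are my own placeholders and not part of the workers repository; the explicit User-Agent is there on the assumption that some services reject urllib's default one.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that posts `message` to a webhook."""
    # Discord webhooks accept a JSON body whose "content" field is the message text.
    payload = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Hypothetical identifier for this sketch.
            "User-Agent": "workers-bot-sketch",
        },
        method="POST",
    )


# Sending is a single extra call once a real webhook URL is available:
# urllib.request.urlopen(build_webhook_request(url, "PCA analysis finished"))
```

Building the request separately from sending it keeps the function easy to test without touching the network.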
For each of these pipelines I turn the Python class into a command-line tool using click. The full code of the script can be found here.
# Click Commands
# --------------
@click.command()
@click.option('--smiles_list', default=[], help='Molecules to Analyze')
@click.option('--morgan_radius', default=1, help='Morgan Radius')
@click.option('--bit_representation', default=512, help='Bit Representation for the Fingerprint')
@click.option('--number_of_clusters', default=5, help='Number of Clusters to Categorize Data')
@click.option('--number_of_components', default=0.95, help='Number of Components')
@click.option('--random_state', default=0, help='')
@click.option('--file_name', default='pca_analysis.html', help='name of the html file')
@click.option('--principal_component_x', default=0, help='Principal Component of X-Axis')
@click.option('--principal_component_y', default=1, help='Principal Component of Y-Axis')
@click.option('--x_axis_label', default='PC1', help='X-Axis Title')
@click.option('--y_axis_label', default='PC2', help='Y-Axis Title')
@click.option('--plot_width', default=1000, help='Plot Width')
@click.option('--plot_height', default=1000, help='Plot Height')
@click.option('--title', default='Principal Component Analysis on SMILES', help='Title of Graph')
# Pipeline
# --------
def controller(
    smiles_list,
    morgan_radius,
    bit_representation,
    number_of_clusters,
    number_of_components,
    random_state,
    file_name,
    principal_component_x,
    principal_component_y,
    x_axis_label,
    y_axis_label,
    plot_width,
    plot_height,
    title,
):
    pca_analysis = PCAAnalysis(
        smiles_list,
        morgan_radius,
        bit_representation,
        number_of_clusters,
        number_of_components,
        random_state,
        file_name,
        principal_component_x,
        principal_component_y,
        x_axis_label,
        y_axis_label,
        plot_width,
        plot_height,
        title,
    )
    pca_analysis.conduct_analysis()

if __name__ == '__main__':
    controller()
This will make it easier to pass variables in at the Action level. However, we need to think about the maximum size of the input, since it will be a list of SMILES. That’s something to address in the future.
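One possible way to handle both points, sketched under my own assumptions: accept the SMILES as a single comma-separated string and reject inputs above a fixed count. The helper `parse_smiles_list` and the limit `MAX_SMILES` are hypothetical and do not appear in the original script; the comma separator is a choice for this sketch (plain SMILES strings do not contain commas).

```python
MAX_SMILES = 500  # hypothetical limit chosen for this sketch, not from the original script


def parse_smiles_list(raw: str, max_items: int = MAX_SMILES) -> list[str]:
    """Split a comma-separated string into a list of SMILES, enforcing a size cap."""
    # Drop surrounding whitespace and skip empty entries such as a trailing comma.
    smiles = [token.strip() for token in raw.split(",") if token.strip()]
    if len(smiles) > max_items:
        raise ValueError(f"received {len(smiles)} SMILES, limit is {max_items}")
    return smiles


print(parse_smiles_list("CCO, c1ccccc1, CC(=O)O"))
# prints ['CCO', 'c1ccccc1', 'CC(=O)O']
```

A helper like this could be attached to the `--smiles_list` option through click's `callback` parameter, so the controller receives a real Python list rather than a raw string.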
Next, let’s create the GitHub Actions workflow YAML file; the full code can be found here.
We want a way to trigger the workflow for testing, and I’m thinking a GitHub issue would be a good first pass.
name: PRINCIPAL COMPONENT ANALYSIS BOT
on:
  issues:
    types:
      - labeled
Next, we want it to run only for a specific label, which would be pca_analysis.
jobs:
  pcabotrun:
    if: github.event.label.name == 'pca_analysis'
    runs-on: "ubuntu-latest"
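The same label check can be mirrored inside a Python script, which also suggests one way the issue text could become the SMILES input. GitHub Actions writes the triggering event as JSON to the file named by the `GITHUB_EVENT_PATH` environment variable. The sketch below is not what the workers repository does; the function names and the one-SMILES-per-line convention are assumptions for illustration.

```python
import json
import os


def smiles_from_issue_event(event: dict, expected_label: str = "pca_analysis") -> list[str]:
    """Return SMILES from the issue body if the event carries the expected label."""
    label = (event.get("label") or {}).get("name")
    if label != expected_label:
        return []
    body = (event.get("issue") or {}).get("body") or ""
    # Assumption for this sketch: the issue body holds one SMILES per line.
    return [line.strip() for line in body.splitlines() if line.strip()]


def load_event(path=None) -> dict:
    """Load the event payload that GitHub Actions stores on the runner."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Inside a workflow run, `smiles_from_issue_event(load_event())` would give the list to hand to the analysis.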
Here is an example of what continuous testing would look like. Next, we want a way of installing Python and our dependencies and caching the environment on the GitHub repository, so that we don’t have to install the software every time we call the server.
    steps:
      - name: Checkout source
        uses: actions/checkout@v2
      - name: Setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9
          architecture: x64
      - uses: actions/cache@v2
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('dev-requirements.txt') }}
      - name: Install
        run: |
          pip install pandas
          pip install click
          pip install scikit-learn
          pip install numpy
          pip install rdkit-pypi
          pip install bokeh
          pip install selenium
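The cache key in the workflow above is built from the Python location plus hashes of setup.py and dev-requirements.txt, so the cached environment is reused until one of those files changes. The sketch below imitates that idea in Python with SHA-256; it is only an illustration of content-derived keys, not the exact algorithm behind `hashFiles`, and `cache_key` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, *files: str) -> str:
    """Join a prefix with one SHA-256 digest per file to form a cache key."""
    parts = [prefix]
    for name in files:
        path = Path(name)
        # A missing file contributes an empty segment, similar to how hashFiles
        # returns an empty string when no file matches.
        digest = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else ""
        parts.append(digest)
    return "-".join(parts)
```

One consequence worth noticing: if neither file exists in the repository, both hash segments are empty and the key never changes, so newly added `pip install` lines would not invalidate the cache.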
The final piece is to run the workflow and send the output file of the script back to Discord. In the GitHub repository Settings, add your webhook URL as a secret.
Then reference it in your workflow; the webhook determines the channel where your output or message will go. We need a GitHub Action that performs this service for us, and I landed on sinshutu’s code.
      - name: Run Analysis
        run: |
          cd discord_services
          python principal_component_analysis.py
      - name: Send plot to discord channel
        uses: sinshutu/upload-to-discord@master
        env:
          DISCORD_WEBHOOK: ${{ secrets.MACHINE_LEARNING_WEBHOOK_TOKEN }}
        with:
          args: discord_services/plot.png
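For a sense of what a file upload to a webhook involves, here is a rough standard-library sketch that builds a multipart/form-data request carrying one file. I have not checked how sinshutu's action is implemented, so this is not a description of it; `build_upload_request` is a hypothetical name, and the `files[0]` field name follows Discord's upload convention as I understand it.

```python
import urllib.request
import uuid


def build_upload_request(
    webhook_url: str,
    file_name: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST that uploads one file to a webhook."""
    # A random boundary separates the parts of the multipart body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files[0]"; filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=head + file_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

In a workflow step, the webhook URL would come from the `DISCORD_WEBHOOK` environment variable and the bytes from reading `discord_services/plot.png`, followed by `urllib.request.urlopen(request)` to send it.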
Our final result is then this:
Part 2 will then be coding the user interface with the bot.