Lecture 004: Building Your First Cheminformatic Bot with Discord and Github Actions.

5 min read · Apr 1, 2023

Welcome back to another graduate-level lecture, in which we start coding our first cheminformatic bots to run data engineering and machine learning pipelines.

Our goal is a Discord bot that takes a list of SMILES from the user and runs Principal Component Analysis (PCA) on them. The analysis itself is covered in this previous blog:

Using Principal Component Analysis to distinct Aromatic and Non-Aromatic Compounds and Identify…

Now that I have a wide enough data set to look at all common groups relevant to a subsection of a community. We can…

sharifsuliman.medium.com

First we need an idea of what the setup is going to look like tech-wise, and we need to lay out a couple of design questions that will govern how we build the bot.

We need a place where the user can interact with the bot.
We need a free place to run the code and produce the output, which is an HTML file.
We need to send the output back to the user.

Our implementation consisted of creating a repository called workers.

This is where we are going to run any bot services. The idea is to use GitHub Actions to run the Python code. The hosted runners are free for public repositories, and for the purposes of our demo we don't need to worry about high throughput. One benefit, however, is that there are options to scale up when moving to big data.

The other component is the user interface: we are going to use Discord to talk to the bot, which can easily be tied in with a webhook.

This makes it easy for us to create cheminformatic pipelines that a wide set of people can try out for free, without us having to worry too much about building a UI.
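As a rough sketch of the webhook half of this design, the snippet below builds the HTTP request that posts a text message to a Discord channel. It relies on Discord webhooks accepting a JSON body with a `content` field. The function names, the `User-Agent` string, and the placeholder URL are assumptions for illustration and are not taken from the workers repository.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    payload = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Hypothetical client name; any descriptive string will do.
            "User-Agent": "pca-bot-sketch",
        },
        method="POST",
    )


def send_message(webhook_url: str, message: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_webhook_request(webhook_url, message)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send_message(url, "PCA run finished")` with the URL Discord generates for your channel would post that text. Keeping request building separate from sending makes the payload easy to inspect without touching the network.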

For each of these pipelines I turn the Python class into a command-line tool using click. The full code of the script can be found here


import click

# PCAAnalysis is the analysis class from the full script (not shown here).

# Click Commands
# --------------

@click.command()
@click.option('--smiles_list', default=[], help='Molecules to Analyze')
@click.option('--morgan_radius', default=1, help='Morgan Radius')
@click.option('--bit_representation', default=512, help='Bit Representation for the Fingerprint')
@click.option('--number_of_clusters', default=5, help='Number of Clusters to Categorize Data')
@click.option('--number_of_components', default=0.95, help='Number of Components')
@click.option('--random_state', default=0, help='Random State Seed')
@click.option('--file_name', default='pca_analysis.html', help='name of the html file')
@click.option('--principal_component_x', default=0, help='Principal Component of X-Axis')
@click.option('--principal_component_y', default=1, help='Principal Component of Y-Axis')
@click.option('--x_axis_label', default='PC1', help='X-Axis Title')
@click.option('--y_axis_label', default='PC2', help='Y-Axis Title')
@click.option('--plot_width', default=1000, help='Plot Width')
@click.option('--plot_height', default=1000, help='Plot Height')
@click.option('--title', default='Principal Component Analysis on SMILES', help='Title of Graph')

# Pipeline
# --------

def controller(
  smiles_list,
  morgan_radius,
  bit_representation,
  number_of_clusters,
  number_of_components,
  random_state,
  file_name,
  principal_component_x,
  principal_component_y,
  x_axis_label,
  y_axis_label,
  plot_width,
  plot_height,
  title,
):

  pca_analysis = PCAAnalysis(
    smiles_list,
    morgan_radius,
    bit_representation,
    number_of_clusters,
    number_of_components,
    random_state,
    file_name,
    principal_component_x,
    principal_component_y,
    x_axis_label,
    y_axis_label,
    plot_width,
    plot_height,
    title,
  )

  pca_analysis.conduct_analysis()

if __name__ == '__main__':

  controller()

This will make it easier to pass variables in at the Action level. However, we need to think about the max size of the input since it will be a list of SMILES. That’s something to think about in the future.
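One way to think about the input-size question is to normalize and cap the SMILES list before it ever reaches the pipeline. The helper below is a minimal sketch under that assumption: `parse_smiles_input` and the `max_molecules` default of 500 are hypothetical, not part of the original script. It relies on the fact that valid SMILES strings contain neither commas nor whitespace, so both can serve as separators.

```python
from typing import List


def parse_smiles_input(raw: str, max_molecules: int = 500) -> List[str]:
    """Split a comma- or whitespace-separated string into a list of SMILES."""
    # Treat commas as whitespace; neither appears inside a valid SMILES string.
    tokens = raw.replace(",", " ").split()
    # Drop duplicates while keeping the order in which molecules were given.
    unique = list(dict.fromkeys(tokens))
    if len(unique) > max_molecules:
        raise ValueError(
            f"Got {len(unique)} molecules, but the limit is {max_molecules}"
        )
    return unique
```

For example, `parse_smiles_input("CCO, c1ccccc1 CCO")` returns `["CCO", "c1ccccc1"]`. This only checks the shape of the input; validating that each token is a real molecule would still be up to RDKit inside the pipeline.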

Next, let’s create the GitHub Actions workflow YAML file. The full code can be found here

We want a way to trigger the workflow for testing, and I think a GitHub issue is a good first pass.

name: PRINCIPAL COMPONENT ANALYSIS BOT

on:
  issues:
    types:
      - labeled

Next, we want it to run only for a specific label, which will be pca_analysis.

    
jobs:
  pcabotrun:
    if: github.event.label.name == 'pca_analysis'
    runs-on: "ubuntu-latest"
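Since the trigger is an issue, the issue itself is a natural place to carry the user's input. The sketch below mirrors the label check in Python and pulls out the issue body. It relies on GitHub Actions writing the event payload to the file named by the `GITHUB_EVENT_PATH` environment variable, and on the `issues` payload carrying `action`, `label.name`, and `issue.body`. The function names are hypothetical, and the original workflow does not necessarily read the issue this way.

```python
import json
import os
from typing import Optional


def extract_request(event: dict, expected_label: str = "pca_analysis") -> Optional[str]:
    """Return the issue body if the event is the expected label being applied."""
    label = (event.get("label") or {}).get("name")
    if event.get("action") != "labeled" or label != expected_label:
        return None
    # An issue opened without a description has a null body.
    return (event.get("issue") or {}).get("body") or ""


def load_event() -> dict:
    """Read the event payload that GitHub Actions writes to disk for each run."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as handle:
        return json.load(handle)
```

Inside a workflow step, `extract_request(load_event())` would give the text of the labeled issue, which could then be handed to the script as the SMILES input.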

Here is an example of what continuous testing would look like. Next, we want to install Python and our dependencies, and cache the environment on GitHub so we don’t have to worry about installing the software every time the workflow runs.

    steps:
      - name: Checkout source
        uses: actions/checkout@v2

      - name: Setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9
          architecture: x64
      - uses: actions/cache@v2
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('dev-requirements.txt') }}
      - name: Install
        run: |
          pip install pandas
          pip install click
          pip install scikit-learn
          pip install numpy
          pip install rdkit-pypi
          pip install bokeh
          pip install selenium
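The cache step works because its key changes whenever the hashed files change. The snippet below is only an illustration of that idea in Python, not GitHub's actual `hashFiles` implementation; `cache_key` and its arguments are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Combine a prefix with a SHA-256 digest of the given files' contents."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(files):
        # Missing files are skipped, so they contribute nothing to the key.
        if path.exists():
            digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()}"
```

One thing this makes visible: the key only tracks the files that are hashed. In the workflow above the packages are listed directly in the Install step, so if setup.py and dev-requirements.txt are absent or never change, adding a package to that list would presumably not produce a new key on its own.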

The final piece is to run the analysis and send the script’s output file back to Discord. In the GitHub repository Settings, add your webhook URL as a secret.

Then reference it in your workflow; the webhook determines the channel where your output or message will go. We need a GitHub Action that performs the upload for us, and I landed on sinshutu’s upload-to-discord:

      - name: Run Analysis
        run: |
          cd discord_services
          python principal_component_analysis.py 
      - name: Send plot.png to discord channel
        uses: sinshutu/upload-to-discord@master
        env:
          DISCORD_WEBHOOK: ${{ secrets.MACHINE_LEARNING_WEBHOOK_TOKEN }}
        with:
          args: discord_services/plot.png
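To see roughly what an upload step like this involves, here is a minimal sketch that attaches a file to a webhook message using only the standard library. It is not the code of the action used above. It assumes Discord's documented multipart format, with the message in a `payload_json` part and the file in a `files[0]` part; `build_upload_request`, the default message, and the `User-Agent` string are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(
    webhook_url: str, file_path: Path, message: str = "PCA analysis output"
) -> urllib.request.Request:
    """Build a multipart/form-data POST that attaches one file to a message."""
    boundary = uuid.uuid4().hex
    # Part 1 carries the JSON message, part 2 opens the file attachment.
    head = "\r\n".join([
        f"--{boundary}",
        'Content-Disposition: form-data; name="payload_json"',
        "Content-Type: application/json",
        "",
        json.dumps({"content": message}),
        f"--{boundary}",
        f'Content-Disposition: form-data; name="files[0]"; filename="{file_path.name}"',
        "Content-Type: application/octet-stream",
        "",
        "",
    ]).encode("utf-8")
    # The closing boundary marks the end of the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=head + file_path.read_bytes() + tail,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "User-Agent": "pca-bot-sketch",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the upload. In the workflow itself the webhook URL would come from the repository secret, never from a value committed to the repository.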

Our final result is then this:

Part 2 will cover coding the user interface for the bot.

Written by Sulstice
