Designing a Convolutional Neural Network For Cannabis Machine Learning Part 1

5 min readApr 30, 2023

A nice and bright Sunday morning to design Convolutional Neural Networks. A CNN is a neural network trained on using images and passed through a series of filters to determine features. As an organic chemist, we use our eyes in designing structures so perhaps this an intuitive machine learning approach and fragment a molecule using our own internal filters. It seems natural to try it out.

Our goal for this network is we want to know if a molecule will behave like cannabis based on it’s binding constant (how strong a ligand binds) into a common endocannabinoid receptor, CB1, a principle target for THC. The crystal structure was derived by Hua et al.

Crystal Structure of the Human Cannabinoid Receptor CB1

Cell. Author manuscript; available in PMC 2017 Oct 20. Published in final edited form as: Tian Hua, 1, 2 Kiran Vemuri…

www.ncbi.nlm.nih.gov

The structure is a series of alpha helices in a shoe type shape with the binding pocket being very deep within the receptor.

The of THC to a CB1 receptor can be quantified with an inhibition constant, Ki, which is an indication into how potent an inhibitor is. Small Ki values of 100 micro molar are considered to be potent.

If we take a list of inhibitors for CB1 and their inhibition constants as recorded in nM from this paper:

Targeting Cannabinoid Receptors: Current Status and Prospects of Natural Products

Cannabinoid receptors (CB1 and CB2), as part of the endocannabinoid system, play a critical role in numerous human…

www.ncbi.nlm.nih.gov

Can we predict whether a new cannabinoid compound would behave similar to a terpene like THC based on it’s inhibition constant value. Here’s some of the data abstracted from the paper. Feel free to grab more.

compounds = {
    'CCCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O': 36,
    'CCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O': 22,
    'CCCCCC1=CC2=C(C3C=C(CCC3C(O2)(C)C)C)C(=C1C(=O)O)O': 620,
    'CCCCCC1=CC(=C(C(=C1)O)C/C=C(\C)/CCC=C(C)C)O': 1045,
    'CCCC1=CC(=C(C(=C1)O)C/C=C(\C)/CCC=C(C)C)O': 2865,
    'CCCCCC1=CC(=C(C(=C1C(=O)O)O)C/C=C(\C)/CCC=C(C)C)O': 13116,
    'COC1=C(C/C=C(CC/C=C(C)\C)\C)C(O)=CC(CCCCC)=C1': 10000,
    'CCCCCC1=CC(=C(C(=C1)O)C2C=C(CCC2C(=C)C)C)O': 1690,
    'CCCC1=CC(=C(C(=C1)O)C2C=C(CCC2C(=C)C)C)O': 14445,
    'CCCCCC1=CC(=C(C(=C1C(=O)O)O)C2C=C(CCC2C(=C)C)C)O': 626,
}

To design our CNN, we will first need to prepare the data into an image based format. To do so, we will use the theory of “Chemception” where a SMILES string is converted into a 2D molecular coordinates and mapped onto a 80 by 80 grid. Where the atoms and bonds are color coded and everything else is black.

Let’s get our imports going:


import rdkit
import numpy as np
import pandas as pd

from rdkit import Chem
from rdkit.Chem import AllChem

from matplotlib import pyplot as plt

And convert our data into a Dataframe:

df = pd.DataFrame()
df['smiles'] = list(compounds.keys())
df['ki'] = list(compounds.values())
df["mol"] = df["smiles"].apply(Chem.MolFromSmiles)

Next we want to create a chemcepterize function. This first is to create the image space of the molecule. You want a molecule that is centered and not too small and is readable to the network. We call these parameters embed and res . By changing the embedding you can alter the size of the image. Obviously, the embedding where it’s higher might have a poor time or a lot of dropout of nodes since a lot of space is empty.

The effect of a resolution on the image can be where the space occupied by different information can dominate.

The embedding and resolution of your image is very important because it allows the neural network to abstract features more efficiently. So we compute the dimensions and the 2D coordinates and set up 4 empty zeroes that will act as a meta store for information like atomic number.

def chemcepterize_mol(mol, embed=10.0, res=0.2):

    dimensions = int(embed*2/res)
    chempcepterized_molecule = Chem.Mol(mol.ToBinary())
    chempcepterized_molecule.ComputeGasteigerCharges()
    AllChem.Compute2DCoords(chempcepterized_molecule)
    coords = chempcepterized_molecule.GetConformer(0).GetPositions()
    vector = np.zeros((dims,dims,4))

The initial vector looks something like this:

[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  ...
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  ...
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  ...
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

Next, we want to iterate through the molecule’s bonds and embed the bond order information into each pixel at that particular coordinate. To do that, we iterate through each bond and grab the start, end atom index.

for i,bond in enumerate(mol.GetBonds()):

    bond_order = bond.GetBondTypeAsDouble()
    start_atom_index = bond.GetBeginAtomIdx()
    end_atom_index = bond.GetEndAtomIdx()
    start_coordinates = coords[start_atom_index]
    end_coordinates = coords[end_atom_index]

To put the bond order information into the image, we want to translate the numbers into a more manageable dimension that scales with our resolution and built a line between the two. We set up a fractional space between 0 and 1 with the resolution being how many pixels will take up that information. The line will be points a long the space and then supplied the bond order information.

fractional_space = np.linspace(0,1,int(1/res*2))

for fraction in fractional_space:
    coordinates = (fraction*start_coordinates + (1-fraction)*end_coordinates)
    x = int(round((coordinates[0] + embed)/res))
    y = int(round((coordinates[1]+ embed)/res))
    vector[ x, y, 0] = bond_order

The first channel is used to take those coordinates and correlate a bond order number. Let’s do the same thing with the atom and supply the other channels with the atomic_number , hybridization , and gasteiger charge . And finally return the vector.

for i,atom in enumerate(cmol.GetAtoms()):
   x = int(round((coords[i][0] + embed)/res))
   y = int(round((coords[i][1]+ embed)/res))
   
   vector[ x , y, 1] = atom.GetAtomicNum()
   
   hyptype = atom.GetHybridization().real
   vector[ x , y, 2] = hyptype

   charge = atom.GetProp("_GasteigerCharge")
   vector[ x , y, 3] = charge

return vect

Our full function then looks like this:

def chemcepterize_mol(mol, embed=10.0, res=0.2):

    dimensions = int(embed*2/res)
    chempcepterized_molecule = Chem.Mol(mol.ToBinary())
    chempcepterized_molecule.ComputeGasteigerCharges()
    AllChem.Compute2DCoords(chempcepterized_molecule)
    coords = chempcepterized_molecule.GetConformer(0).GetPositions()
    vector = np.zeros((dims,dims,4))
    
    # Bonds

    for i,bond in enumerate(mol.GetBonds()):
    
        bond_order = bond.GetBondTypeAsDouble()
        start_atom_index = bond.GetBeginAtomIdx()
        end_atom_index = bond.GetEndAtomIdx()
        start_coordinates = coords[start_atom_index]
        end_coordinates = coords[end_atom_index]
        
        # Draw Bond A long Vector 

        fractional_space = np.linspace(0,1,int(1/res*2))

        for fraction in fractional_space:
            coordinates = (fraction*start_coordinates + (1-fraction)*end_coordinates)
            x = int(round((coordinates[0] + embed)/res))
            y = int(round((coordinates[1]+ embed)/res))
            vector[ x, y, 0] = bond_order
    
    # Atoms

    for i,atom in enumerate(cmol.GetAtoms()):
       x = int(round((coords[i][0] + embed)/res))
       y = int(round((coords[i][1]+ embed)/res))
       
       vector[ x , y, 1] = atom.GetAtomicNum()
       
       hyptype = atom.GetHybridization().real
       vector[ x , y, 2] = hyptype
    
       charge = atom.GetProp("_GasteigerCharge")
       vector[ x , y, 3] = charge
    
    return vect

Our final function then could look like this for our DataFrame:

def vectorize(mol):
    return chemcepterize_mol(mol, embed=12)

df["mol_image"] = df["mol"].apply(vectorize)

Now that our data is prepared, we will build the CNN. Try preparing the data yourself and getting a feel for chemcepterizing a list of molecules.

Designing a Convolutional Neural Network For Cannabis Machine Learning Part 1

Crystal Structure of the Human Cannabinoid Receptor CB1

Cell. Author manuscript; available in PMC 2017 Oct 20. Published in final edited form as: Tian Hua, 1, 2 Kiran Vemuri…

Targeting Cannabinoid Receptors: Current Status and Prospects of Natural Products

Cannabinoid receptors (CB1 and CB2), as part of the endocannabinoid system, play a critical role in numerous human…

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Sulstice

No responses yet