Title pretty much says it, so let’s just dive in. I want to know what women are taking as oral contraceptives and lubricants, and how those molecules might relate to the popular recreational therapeutic, Cannabis.
Demo
https://colab.research.google.com/drive/1oA1WcJSvnLUkB-wafbgjbvcIDXvMCMSK?usp=sharing
First we need to collect all the molecules used in oral contraceptives and add them to our network in Global-Chem. This list stems from the paper “Molecules of Oral Contraceptives”.
From a high-level look at the functional groups in this set, the most striking thing to me is the terminal alkyne, especially when it sits next to the epoxide. Alkynes are usually very reactive species and can be detrimental to the liver (if I remember correctly), whereas acetonitriles are better tolerated by the body. The double bicyclopropane is an interesting structure in general: it looks like a hydrocarbon warhead that probably sits really deep in the pocket with a lot of torsion. It would be an interesting functional group to use for classifying future, bigger ring systems.
So let’s have a deeper look into how we could teach a machine to classify these molecules or make cluster differentiations. Here are the parameters:
mol_ids = cheminformatics.node_pca_analysis(
    sucesses,
    morgan_radius = 2,
    bit_representation = 512,
    number_of_clusters = 2,
    number_of_components = 0.95,
    random_state = 0,
    principal_component_x = 0,
    principal_component_y = 1,
    x_axis_label = 'PC1',
    y_axis_label = 'PC2',
    plot_width = 1000,
    plot_height = 1000,
    title = 'Cannabis and Sex',
    save_file = False,
    return_mol_ids = True,
    save_principal_components = True,
)
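A quick aside on `number_of_components = 0.95`. I am assuming here that a fractional value is treated the way PCA implementations such as scikit-learn's treat it: keep as many leading principal components as it takes to explain more than that fraction of the variance. I have not checked that Global-Chem does exactly this internally, so take the sketch below as an illustration of the idea only. The function name and the variance ratios are made up for the example.

```python
def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Count the leading components needed to explain more than `threshold`.

    `explained_variance_ratio` is assumed to be sorted from largest to
    smallest, one entry per principal component.
    """
    running_total = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        running_total += ratio
        if running_total > threshold:
            return count
    # The threshold was never exceeded, so every component is kept.
    return len(explained_variance_ratio)


# Hypothetical ratios, invented for illustration (they sum to 1.0).
example_ratios = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]

# Cumulative sums: 0.5, 0.75, 0.875, 0.9375, 0.96875, 1.0
n_kept = components_for_variance(example_ratios, threshold=0.95)
print(n_kept)
```

With these made-up numbers, five of the six components are kept. The plot itself only ever shows two of them, chosen by `principal_component_x` and `principal_component_y`.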
And here is the plot that you can find in the demo:
It looks like the machine has mixed the terpenes with the oral contraceptives in the center of the graph:
To differentiate between the two nodes we would need to alter the parameters until something makes sense to us, perhaps by telling the computer there are more differentiating clusters and by expanding the fingerprint.
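Why would expanding the fingerprint help? Hashed fingerprints such as Morgan fingerprints are typically folded into a fixed number of bits, so two different substructures can land on the same bit. Here is a toy sketch of that folding effect. It is not the Morgan algorithm and not what Global-Chem runs: `toy_fingerprint` is a hypothetical helper that simply hashes short substrings of a SMILES string, and the limonene SMILES is only there as an example input.

```python
import hashlib


def toy_fingerprint(smiles, n_bits, max_len=3):
    """Hash every substring of length 1..max_len into one of n_bits positions.

    A stand-in for a real circular fingerprint: substrings play the role of
    atom environments, and the modulo step plays the role of folding.
    """
    on_bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            on_bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return on_bits


limonene = "CC1=CCC(CC1)C(=C)C"  # a common terpene, used as example input

bits_512 = toy_fingerprint(limonene, 512)
bits_1024 = toy_fingerprint(limonene, 1024)

# Folding the 1024-bit version down to 512 bits reproduces the 512-bit one,
# so the wider fingerprint can only keep features apart, never merge more.
print(len(bits_512), len(bits_1024))
```

The wider vector never sets fewer bits than the narrower one, which is the intuition behind moving `bit_representation` from 512 to 1024 for the larger hormone molecules.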
Expand to 5 clusters:
I still don’t get the separation I would like between the terpenes and the hormones. Not all terpenes are aromatic, but they usually have the structure that gives them their smell:
The oral contraceptive molecules are quite big, so I would expand the fingerprint to capture more distinguishing features, hopefully around the alkyne group, which might be a significant factor in the separation. We also need more clusters, because cannabis alone has a lot of its own internal classes. I expand the cluster count to 10.
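To make the cluster-count argument concrete, here is a minimal sketch with made-up 2D points standing in for PCA coordinates. I am assuming a k-means-style clustering, which I have not confirmed is what the library uses, and the `kmeans` helper and its farthest-point initialisation are written just for this example. With too few clusters, two chemically different groups get the same label; add one more cluster and they split.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Invented coordinates: three tight groups of three points each.
group_a = [(0, 0), (0, 1), (1, 0)]
group_b = [(10, 10), (10, 11), (11, 10)]
group_c = [(20, 0), (20, 1), (21, 0)]
points = group_a + group_b + group_c

labels_2 = kmeans(points, k=2)
labels_3 = kmeans(points, k=3)
print(labels_2)
print(labels_3)
```

With two clusters, `group_a` and `group_b` end up sharing a label, much like the terpenes and hormones sitting together in the middle of the plot. With three, each group gets its own label.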
bit_representation = 1024,
number_of_clusters = 10,
Produces this:
The pink highlights the differentiating group of the hormones:
And now with the orange we can see the separation:
If you were creating a machine learning model centered around cannabis or sex molecules, these hyperparameters would be a good place to start:
mol_ids = cheminformatics.node_pca_analysis(
    sucesses,
    morgan_radius = 2,
    bit_representation = 1024,
    number_of_clusters = 10,
    number_of_components = 0.95,
)
Happy Cheminformatics! Hopefully this sheds more light on how functional groups are used in different ways.