Mixing CGenFF Atom Types to RDKit Mol2SVG Function and Visualizing

Sulstice
4 min readMar 2, 2022

--

Most often I work in Charmm General ForceField and playing around in it. Each forcefield usually has an accodomating atom-type engine associated with it to log the atoms with respect to their chemical environment.

Decision Tree from Part 1 CGenFF (free to read)

And learning the language of the atom types has taken be about 1–2 years to really solidify some understanding of how to use it.

Recently, I needed to plot a lot of functional groups ~2500 and I can’t add it manually to a powerpoint. It will take too much time.

If you search the RDKit mol2svg function you’ll find a lot of scattered documentation on how to kind of properly form your labels and it took me a couple of hours to kind of piece together what I want to do.

Let’s say I have a compound like anisole:

Well, I would like to label with the cgenff atom types so when showing off the structure I can show to people how the atom types are catalogued. First take a mol2 of anisole which I used open babel to convert my SMILES:

COC1=CC=CC=C1

Mol2 Output:

@<TRIPOS>MOLECULE
*****
16 16 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C 2.7887 -1.1631 0.6477 C.3 1 LIG1 0.0788
2 O 2.7185 0.1587 0.1287 O.3 1 LIG1 -0.4951
3 C 3.1766 0.3465 -1.1464 C.ar 1 LIG1 0.1201
4 C 3.0954 1.6631 -1.6081 C.ar 1 LIG1 -0.0200
5 C 3.5290 1.9955 -2.8927 C.ar 1 LIG1 -0.0583
6 C 4.0483 1.0082 -3.7263 C.ar 1 LIG1 -0.0615
7 C 4.1322 -0.3074 -3.2735 C.ar 1 LIG1 -0.0583
8 C 3.6975 -0.6393 -1.9861 C.ar 1 LIG1 -0.0200
9 H 2.3921 -1.1434 1.6673 H 1 LIG1 0.0660
10 H 3.8253 -1.5122 0.7007 H 1 LIG1 0.0660
11 H 2.1687 -1.8505 0.0628 H 1 LIG1 0.0660
12 H 2.6905 2.4361 -0.9597 H 1 LIG1 0.0654
13 H 3.4606 3.0230 -3.2393 H 1 LIG1 0.0619
14 H 4.3870 1.2631 -4.7270 H 1 LIG1 0.0618
15 H 4.5372 -1.0788 -3.9236 H 1 LIG1 0.0619
16 H 3.7794 -1.6741 -1.6719 H 1 LIG1 0.0654
@<TRIPOS>BOND
1 1 2 1
2 2 3 1
3 3 4 ar
4 4 5 ar
5 5 6 ar
6 6 7 ar
7 7 8 ar
8 3 8 ar
9 1 9 1
10 1 10 1
11 1 11 1
12 4 12 1
13 5 13 1
14 6 14 1
15 7 15 1
16 8 16 1

Pass it through the CGenFF webserver. And you should have your stream file. Which looks like this:

Pay attention to all the outputs of these file formats. They each maintain the rank ordering of the atoms, meaning which atom comes first and then how do you put it into a list. When the SMILES is read into RDKit, it too also maintains the rank ordering when passing in.

Chem.MolFromSmiles('COC1=CC=CC=C1')

So using this rank ordering preserve we can have a little fun with drawing the molecule in RDKit so we can visually see the atom types.

Imports:

from rdkit import Chem
from rdkit.Chem import AllChem,Draw
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
from rdkit.Chem.PropertyMol import PropertyMol

Read the lines and prepare all out inputs:

anisole_cgenff = open('/home/sulstice/chapters/anisole.str', 'r').read()
lines = anisole_cgenff.split('\n')
atom_rows = list(filter(lambda x: 'ATOM' in x, lines))
hetereo_atom_types = [i for i in atom_rows if i.split()[2][0] != 'H']
name = 'anisole'
smiles = 'COC1=CC=CC=C1'
m = Chem.MolFromSmiles(m)

Now we are going to borrow the mol2svg function that floats around:

def moltosvg(mol, atom_types, name, molSize = (400,400), kekulize = True):

mc = Chem.Mol(mol.ToBinary())

if kekulize:
try:
Chem.Kekulize(mc)
except:
mc = Chem.Mol(mol.ToBinary())
if not mc.GetNumConformers():
rdDepictor.Compute2DCoords(mc)

for i in range(0, len(atom_types)):

atom_type = atom_types[i]
mc.GetAtomWithIdx(i).SetProp("atomNote", atom_type)

drawer = rdMolDraw2D.MolDraw2DSVG( molSize[0], molSize[1] )
drawer.drawOptions().annotationFontScale = 0.5
drawer.drawOptions().centreMoleculesBeforeDrawing = False
drawer.drawOptions().useBWAtomPalette()

drawer.DrawMoleculeWithHighlights(mc, name , {}, {}, {}, {})
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
file = open('%s.svg' % name, 'w')
file.write(svg)
file.close()
return svg.replace('svg:','')

The function loops through the atom types and since rdkit is preserving the rank ordering. Each iteration through the atom type list is effectively a loop through the atom indexes. If we set each index as an atom note we can get the label of the atom type with the atom it’s associated with.

I use the BWAtomPallete for black and white, annotationFontScale to set the size of the atom type, draw molecule with highlights to get the name at the bottom and I write the svg out.

Run it.

SVG(moltosvg(m, hetereo_atom_types, name))

And we should get the output as:

Convert it to png:

import cairosvg
cairosvg.svg2png(url= 'anisole.svg', write_to='anisole.png')

I did this for a poster presentation:

Have fun.

--

--

Sulstice
Sulstice

No responses yet