Overriding Gasteiger Partial Charges in RDKit

Sulstice
3 min readJan 17, 2023

--

Let’s take a deeper look into partial charge values which are an important molecular descriptor of any molecule and how it could possibly affect things. Install the pip distribution of rdkit-pypi

!pip install rdkit-pypi

Then let’s get our imports in order:

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from rdkit.Chem.AtomPairs import Pairs

We are going to be investigating Perfluorobutanoic acid which is a chemical group that has these long alkyl fluoryl chains.

smiles = 'FC(F)(C(F)(C(O)=O)F)C(F)(F)F'

So let’s have a look at whether the charge of a molecule can affect a fingerprint? Here we will be using the default similarity technique called Tanimoto Similarity which is used to give a score on how one set of atoms and bonds is similar to another set. We use it to detect differences and find patterns. There’s lots of different methods but let’s stick to common for now. We compute the Gasteiger Chargers with the AllChem function.


molecule = Chem.MolFromSmiles(smiles)
AllChem.ComputeGasteigerCharges(molecule)

for atom in mol.GetAtoms():
prop = atom.GetProp('_GasteigerCharge')
print ( '%s | %s' % (atom.GetSymbol(), prop))

If we have a look at those values with the rank ordering of the atoms:

F | -0.18847445561314791
C | 0.41381329177535614
F | -0.18847445561314791
C | 0.41320669293153911
F | -0.18611676986581191
C | 0.38122831825651354
O | -0.47662566423596492
O | -0.24490878935347329
F | -0.18611676986581191
C | 0.46009468700965073
F | -0.16471144158182188
F | -0.16471144158182188
F | -0.16471144158182188

So let’s convert this into a fingerprint with our morgan radius set to iteration of 1 which you can find a good explanation of in my previous blog. The useFeatures flag is for when you want to include SMARTS matching descriptors for donor, acceptor, aromatic, etc which can be found more in their documentation here. We use a Morgan Radius of 1.

fingerprint_one = AllChem.GetMorganFingerprintAsBitVect(
molecule,
1,
nBits=512,
useFeatures=True,
)

Now let’s set all the charges to 0 and regenerate a fingerprint.


for atom in molecule.GetAtoms():
atom.SetProp('_GasteigerCharge', str(0))

for atom in molecule.GetAtoms():
prop = atom.GetProp('_GasteigerCharge')
print ( '%s | %s' % (atom.GetSymbol(), prop))
break

Which then gives us:

F | 0

So let’s take a deeper look then into how that affects the similarity between the two:

fingerprint_two = AllChem.GetMorganFingerprintAsBitVect(
molecule,
1,
nBits=512,
useFeatures=True,
)

print(DataStructs.DiceSimilarity(fingerprint_one, fingerprint_two))

A score of 1.0 means it is exact and a score of 0 means completely different.

1.0

So the charge doesn’t affect the fingerprint. That one I knew. However, recent cases of a charged fingerprint has been coming out. That’s what I wanted to achieve as well. However, generating fingerprinting algorithms is a little tricky so I went back to the drawing board and revisited Chemception. Using the a Convolutional Neural Network (CNN) with molecular descriptors as a layer in a neural network. This was the key. I could override the Gasteiger Charges with Charmm General Force Field (CGenFF) partial charges to generate the descriptors I need. To get a good look at the partial charge influence we can look into RDKit Similarity Maps and then let’s pause for there. Let this digest.

Gasteiger


smiles = 'FC(F)(C(F)(C(O)=O)F)C(F)(F)F'

molecule = Chem.MolFromSmiles(smiles)
AllChem.ComputeGasteigerCharges(molecule)

for atom in molecule.GetAtoms():
prop = atom.GetProp('_GasteigerCharge')
print ( '%s | %s' % (atom.GetSymbol(), prop))

from rdkit.Chem.Draw import SimilarityMaps
contribs = [molecule.GetAtomWithIdx(i).GetDoubleProp('_GasteigerCharge') for i in range(molecule.GetNumAtoms())]
fig = SimilarityMaps.GetSimilarityMapFromWeights(molecule, contribs, colorMap='jet', contourLines=100)

CGenFF Override

smiles = 'FC(F)(C(F)(C(O)=O)F)C(F)(F)F'

charges =[
-0.190 ,0.4390 ,-0.190 ,0.424 , -0.216 ,0.758 ,-0.597 ,-0.557 ,-0.216 ,0.335 ,-0.140 ,-0.140 ,-0.140 ,0.430
]

molecule = Chem.MolFromSmiles(smiles)
i = 0

for atom in molecule.GetAtoms():
prop = atom.SetProp('_GasteigerCharge', str(charges[i]))
i += 1

for atom in molecule.GetAtoms():
prop = atom.GetProp('_GasteigerCharge')
print ( '%s | %s' % (atom.GetSymbol(), prop))

from rdkit.Chem.Draw import SimilarityMaps
contribs = [molecule.GetAtomWithIdx(i).GetDoubleProp('_GasteigerCharge') for i in range(molecule.GetNumAtoms())]
fig = SimilarityMaps.GetSimilarityMapFromWeights(molecule, contribs, colorMap='jet', contourLines=100)

With the output being, see the differences?

--

--

Sulstice
Sulstice

No responses yet