Partial Charge Fingerprints

Sulstice
7 min read · Jul 26, 2023


Hello, sorry I’ve been gone so long. Lots going on in life, just trying to make sense of it. For this post, make sure you have RDKit installed:

python -m pip install rdkit-pypi

I have always been interested in partial charges. When you look at a carbonyl compound as an organic chemist, the electron density shifts toward the more electronegative atom, the oxygen, which likes electrons and is denoted with a delta “-”. Conversely, the oxygen is connected to a carbon that has its electrons pulled away, so the carbon is denoted with a delta “+”.

How far the electron density shifts is of interest because it can predict reactivity. Can this value be quantified? In this paper from Vanommeslaeghe et al., the authors describe how the partial charge can be obtained from a bond-charge increment scheme: a value is determined from the atom’s type and chemical connectivity and then subtracted from the atom’s formal charge. That turned me on to the idea of charged fingerprints, a way to include information predicted from force fields, and something that others have thought about before.

So let’s have a look at benzene and the Gasteiger Charges, an algorithm implemented in RDKit:

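from rdkit import Chem
from rdkit.Chem import AllChem

smiles = 'C1=CC=CC=C1'

molecule = Chem.MolFromSmiles(smiles)
AllChem.ComputeGasteigerCharges(molecule)

# Each atom now carries its charge in the '_GasteigerCharge' property
for atom in molecule.GetAtoms():
    prop = atom.GetProp('_GasteigerCharge')
    print('%s | %s' % (atom.GetSymbol(), prop))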

Which outputs:

C | -0.062268570782092456
C | -0.062268570782092456
C | -0.062268570782092456
C | -0.062268570782092456
C | -0.062268570782092456
C | -0.062268570782092456
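
A possible variation on the snippet above, as a sketch: `GetProp` returns the charge as a string, and the hydrogens are implicit, so they never show up in the loop. Assuming you want numeric values for every atom, `Chem.AddHs` and `GetDoubleProp` (both standard RDKit calls) can be used. The helper name `gasteiger_charges` is my own and not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def gasteiger_charges(smiles):
    """Return (symbol, charge) pairs for every atom, hydrogens included.

    Hypothetical helper for illustration, not part of RDKit.
    """
    # Make the hydrogens explicit so they appear as atoms in the loop
    molecule = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.ComputeGasteigerCharges(molecule)
    return [
        (atom.GetSymbol(), atom.GetDoubleProp('_GasteigerCharge'))
        for atom in molecule.GetAtoms()
    ]


charges = gasteiger_charges('C1=CC=CC=C1')
for symbol, charge in charges:
    print('%s | %.4f' % (symbol, charge))
```

Having the hydrogens in the list makes it easier to line the values up against a force-field assignment, which usually lists every atom.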

Alright, so how does this compare to the values predicted by a force field? You can submit a molecule here or use LEaP from Amber:

RESI benzene        0.000 ! param penalty=   0.000 ; charge penalty=   0.000
GROUP ! CHARGE CH_PENALTY
ATOM H1 HGR61 0.115 ! 0.000
ATOM C1 CG2R61 -0.115 ! 0.000
ATOM C2 CG2R61 -0.115 ! 0.000
ATOM H2 HGR61 0.115 ! 0.000
ATOM C3 CG2R61 -0.115 ! 0.000
ATOM H3 HGR61 0.115 ! 0.000
ATOM C4 CG2R61 -0.115 ! 0.000
ATOM H4 HGR61 0.115 ! 0.000
ATOM C5 CG2R61 -0.115 ! 0.000
ATOM H5 HGR61 0.115 ! 0.000
ATOM C6 CG2R61 -0.115 ! 0.000
ATOM H6 HGR61 0.115 ! 0.000

So the partial charge values are pretty different (about -0.062 per carbon from Gasteiger versus -0.115 from the force field), which could have some odd effects if they were used as molecular descriptors in machine learning workflows.
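
To compare the two charge sets programmatically, the `ATOM` lines in the listing above can be parsed with plain Python. This is a minimal sketch that assumes every line follows the `ATOM <name> <type> <charge> ! <penalty>` layout shown above. The function name `parse_atom_charges` is my own, and the Gasteiger value is copied from the RDKit output earlier.

```python
# ATOM lines copied from the force-field listing above
STREAM = """\
ATOM H1 HGR61 0.115 ! 0.000
ATOM C1 CG2R61 -0.115 ! 0.000
ATOM C2 CG2R61 -0.115 ! 0.000
ATOM H2 HGR61 0.115 ! 0.000
ATOM C3 CG2R61 -0.115 ! 0.000
ATOM H3 HGR61 0.115 ! 0.000
ATOM C4 CG2R61 -0.115 ! 0.000
ATOM H4 HGR61 0.115 ! 0.000
ATOM C5 CG2R61 -0.115 ! 0.000
ATOM H5 HGR61 0.115 ! 0.000
ATOM C6 CG2R61 -0.115 ! 0.000
ATOM H6 HGR61 0.115 ! 0.000
"""


def parse_atom_charges(text):
    """Map atom name -> partial charge for every ATOM line (hypothetical helper)."""
    charges = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: ATOM <name> <type> <charge> ! <penalty>
        if len(fields) >= 4 and fields[0] == 'ATOM':
            charges[fields[1]] = float(fields[3])
    return charges


force_field = parse_atom_charges(STREAM)
gasteiger_carbon = -0.062268570782092456  # from the RDKit output above

for name, charge in force_field.items():
    if name.startswith('C'):
        print('%s | force field %.3f | Gasteiger %.3f | difference %.3f'
              % (name, charge, gasteiger_carbon, charge - gasteiger_carbon))
```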

Recently, Seal released this blog post on creating fingerprints from partial charge values only:

So let’s give his code a shot and see what happens. We define a function that takes in the SMILES and the charges.

import numpy as np

def generate_charge_fingerprint(smiles, charges, n_bits=512, bin_min=-1.0, bin_max=1.0, nbins=32):

    # Initialize the fingerprint vector
    fingerprint = np.zeros(n_bits, dtype=np.uint8)

    # Create bins for the partial charges
    bins = np.linspace(bin_min, bin_max, nbins + 1)

    # Set the corresponding bits for each atom's partial charge
    # (each atom gets its own block of nbins bits, indexed by atom position)
    for idx, charge in enumerate(charges):
        bin_index = np.digitize(charge, bins) - 1
        bit_index = idx * nbins + bin_index
        if bit_index < n_bits:
            fingerprint[bit_index] = 1

    return fingerprint
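
As a usage sketch, here is the function applied to the six benzene carbon charges from each method. The function body is repeated so the snippet runs on its own, and the charge values are copied from the RDKit output and the force-field listing above. Feeding in only the carbons is my assumption, based on the number of bits set in the strings that follow.

```python
import numpy as np


def generate_charge_fingerprint(smiles, charges, n_bits=512, bin_min=-1.0, bin_max=1.0, nbins=32):
    fingerprint = np.zeros(n_bits, dtype=np.uint8)
    bins = np.linspace(bin_min, bin_max, nbins + 1)
    for idx, charge in enumerate(charges):
        bin_index = np.digitize(charge, bins) - 1
        bit_index = idx * nbins + bin_index
        if bit_index < n_bits:
            fingerprint[bit_index] = 1
    return fingerprint


smiles = 'C1=CC=CC=C1'
gasteiger = [-0.062268570782092456] * 6  # carbon charges from RDKit above
force_field = [-0.115] * 6               # carbon charges from the listing above

gasteiger_fp = generate_charge_fingerprint(smiles, gasteiger)
force_field_fp = generate_charge_fingerprint(smiles, force_field)

# Print the indices of the bits that are set in each fingerprint
print(np.flatnonzero(gasteiger_fp))
print(np.flatnonzero(force_field_fp))
```

With the defaults, each atom owns a block of 32 bits, so only the first 512 / 32 = 16 atoms fit, and the bit positions depend on the atom ordering. The bin width is 2 / 32 = 0.0625, so the Gasteiger value of about -0.0623 sits just above the -0.0625 bin edge.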

So let’s have a look at the two binary strings produced from the force field and the Gasteiger charges:

Gasteiger:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Force Field:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Comparing the two strings, each 1 has only shifted over by one position, into the neighboring bin. Nothing too crazy. However, it shows that we can create distinct fingerprints from partial charges.
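
One consequence worth checking, sketched below in plain Python: a one-bin shift looks small by eye, but a bitwise similarity measure sees no overlap at all. The on-bit positions are read off the two strings above (32 bits per atom, six carbons). The `tanimoto` helper is my own and uses the usual intersection-over-union definition.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of on-bit positions."""
    shared = len(bits_a & bits_b)
    total = len(bits_a | bits_b)
    return shared / total if total else 1.0


# On-bit positions read from the strings above: atom index * 32 + bin index
gasteiger_bits = {atom * 32 + 15 for atom in range(6)}
force_field_bits = {atom * 32 + 14 for atom in range(6)}

# The two fingerprints share no bits, so the similarity is zero
print(tanimoto(gasteiger_bits, force_field_bits))
```

So a charge difference of about 0.05 per carbon is enough to make the two fingerprints look completely dissimilar, which is something to keep in mind when choosing the bin width.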

What if we apply principal component analysis to the charge fingerprints of a list of molecules? Do we achieve better cluster differentiation?
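
A rough sketch of how such an experiment could be set up, under several assumptions of mine: the molecule list is arbitrary, the charges are Gasteiger charges on heavy atoms only, and the PCA is done with a plain NumPy SVD rather than any particular library. The helper `charge_fingerprint` is hypothetical and simply combines the charge calculation with the binning scheme used earlier.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def charge_fingerprint(smiles, n_bits=512, bin_min=-1.0, bin_max=1.0, nbins=32):
    """Gasteiger-charge fingerprint with the same binning scheme as above.

    Hypothetical helper for illustration only.
    """
    molecule = Chem.MolFromSmiles(smiles)
    AllChem.ComputeGasteigerCharges(molecule)
    bins = np.linspace(bin_min, bin_max, nbins + 1)
    fingerprint = np.zeros(n_bits, dtype=np.uint8)
    for idx, atom in enumerate(molecule.GetAtoms()):
        charge = atom.GetDoubleProp('_GasteigerCharge')
        bit_index = idx * nbins + int(np.digitize(charge, bins)) - 1
        if bit_index < n_bits:
            fingerprint[bit_index] = 1
    return fingerprint


# Arbitrary example molecules, chosen only for illustration
smiles_list = ['C1=CC=CC=C1', 'CC1=CC=CC=C1', 'OC1=CC=CC=C1',
               'CCO', 'CC(=O)C', 'CC(=O)O']
X = np.array([charge_fingerprint(s) for s in smiles_list], dtype=float)

# PCA via SVD: center the columns, then project onto the top two components
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
coords = U[:, :2] * S[:2]

for s, (pc1, pc2) in zip(smiles_list, coords):
    print('%-14s PC1 = %6.2f  PC2 = %6.2f' % (s, pc1, pc2))
```

The resulting coordinates can be scattered on a 2D plot to inspect any grouping. Whether the clusters are meaningful will depend heavily on atom ordering and bin width, since each bit is tied to an atom index.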
