Lecture 002 — Exploring The Chemical Universe with Potential Energy Functions.

Sulstice
Jan 28, 2023


Motivation

This is more personal for me. I was sad going through a breakup and recovering from surgery, so I retreated mentally. I didn’t know how to explain my emotions to anyone, so I spent a lot of time looking at molecules. What made me happy and gave me energy was that knowing the functional groups of organic chemistry let me connect with people from many different fields. As an extrovert, I don’t like working alone and I enjoy people. I asked everyone I knew which functional groups they used for their own purposes, and we would discuss them. I probably looked a little crazy, but it helped me reconnect and rediscover my own personality.

A lot of that fueled my motivation for exploring chemical space, because it was therapeutic and I could heal myself there. Much of the code in this series carries some of my inner thoughts, and maybe those thoughts will help someone else who is going through the same thing. I am forever grateful to the people in my life who kept me on track, supported my thoughts, and helped me heal emotionally.

A Reference Score

As I began my journey, I needed a reference index, some sort of base metric that would tell me what I had seen before. I also wanted it to make sense. I needed a beacon to guide me through the molecular descriptors I had already encountered. The problem with SMILES shows up when you compare something like benzene and cyclohexane:

benzene = 'C1=CC=CC=C1'
cyclohexane = 'C1CCCCC1'

Every carbon in both strings is written with the same atom symbol, C, so the symbol alone does not distinguish them. As any chemist knows, however, these carbons are different. It turns out there are languages that describe the nature of these carbons and how they differ with respect to their chemical environment. In the example in Figure 1, the benzene carbon C translates to CG2R61 and the cyclohexane carbon C translates to CG3C62.

Figure 1: Chemical Language Transformation
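To make the idea concrete, here is a toy sketch in Python. It is not how CGenFF actually assigns atom types (its rules are far richer); the `CarbonEnvironment` class, the `RULES` table, and `toy_atom_type` are hypothetical names, and the two labels are simply the ones quoted above. The point is that once a carbon's environment (aromaticity, hybridization, ring size) is written down, the same C symbol maps to different atom types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CarbonEnvironment:
    aromatic: bool
    hybridization: str         # "sp2" or "sp3"
    ring_size: Optional[int]   # None if the carbon is not in a ring


# Hypothetical rule table holding only the two labels quoted in the text.
RULES = {
    (True, "sp2", 6): "CG2R61",   # benzene-like aromatic carbon
    (False, "sp3", 6): "CG3C62",  # cyclohexane-like ring carbon
}


def toy_atom_type(env: CarbonEnvironment) -> str:
    """Map a carbon's environment to an atom type label, or UNKNOWN if no rule matches."""
    key = (env.aromatic, env.hybridization, env.ring_size)
    return RULES.get(key, "UNKNOWN")


benzene_c = CarbonEnvironment(aromatic=True, hybridization="sp2", ring_size=6)
cyclohexane_c = CarbonEnvironment(aromatic=False, hybridization="sp3", ring_size=6)
print(toy_atom_type(benzene_c))      # CG2R61
print(toy_atom_type(cyclohexane_c))  # CG3C62
```

A carbon with no matching rule (for example an acyclic sp3 carbon) falls through to `UNKNOWN`, which is exactly the situation where a real typer would need a new rule or a new type.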

In this language of atom types, different kinds of carbon are broken down into a formal vocabulary. The language, and the way atom types are declared, comes from its base molecular descriptor: a potential energy function. A potential energy function is a base descriptor that describes the energy a molecule has as a function of the positions of its atoms. Recall from simple physics that a spring is modeled by the energy (1/2)kx², where k is the force (spring) constant and x is the displacement from the spring's rest length; in chemistry we use the same model to describe a bond between two atoms:

Figure 2: Physics Model describing two atoms.
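A minimal sketch of the spring model as a bond energy, assuming r is the current bond length and r0 the rest (equilibrium) length. The function names and the numbers in the usage lines are hypothetical and illustrative, not real force field parameters. CHARMM writes its bond term as K_b(b - b0)² with the 1/2 absorbed into K_b, so keep track of which convention a parameter set uses.

```python
def harmonic_bond_energy(r: float, r0: float, k: float) -> float:
    """Spring energy E = (1/2) * k * (r - r0)**2 for a bond stretched or compressed from r0."""
    x = r - r0  # displacement from the rest length
    return 0.5 * k * x * x


def harmonic_bond_force(r: float, r0: float, k: float) -> float:
    """Restoring force F = -dE/dr = -k * (r - r0); it always points back toward r0."""
    return -k * (r - r0)


# Illustrative numbers only: the same 0.1 stretch costs more energy for a stiffer spring.
print(harmonic_bond_energy(1.5, 1.4, 2.0))
print(harmonic_bond_energy(1.5, 1.4, 4.0))
```

Two carbons with different atom types simply get different (k, r0) pairs plugged into the same function.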

Well, it turns out not all atoms are the same. If we revisit benzene and cyclohexane, we have two different carbons, CG2R61 and CG3C62, and the springs between them differ: each bond type has its own force constant and rest length. Benzene's ring bonds have double-bond (alkene-like) character, so the distance between its carbons is shorter. If we extend the physics model to three atoms, as in Figure 3, a force is also associated with the angle between the three atoms.

Figure 3: Model of a 3 atom system with an angle and force constant.
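The three-atom case can be sketched the same way: compute the angle at the central atom from Cartesian coordinates, then apply the same harmonic form to the deviation from a rest angle θ0. The function names, coordinates, and force constant below are hypothetical; as with bonds, CHARMM's angle term K_θ(θ - θ0)² folds the 1/2 into the constant.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, where b is the central atom and each point is (x, y, z)."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norms = math.sqrt(sum(u * u for u in ba)) * math.sqrt(sum(v * v for v in bc))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def harmonic_angle_energy(theta_deg, theta0_deg, k_theta):
    """E = (1/2) * k_theta * (theta - theta0)**2, with the deviation converted to radians."""
    d = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_theta * d * d


# Three hypothetical atom positions forming a right angle at the origin.
theta = angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(theta, harmonic_angle_energy(theta, 109.5, 50.0))
```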

When computational chemists model atoms and their movements, they use force fields, which describe the potential energy of a molecule and thereby determine how it is going to move. You might hear about GROMOS, CHARMM, and AMBER, names that refer not only to the software but also to the energy functions that govern the atoms. The CHARMM potential energy function breaks a molecule into 5 bonded descriptors (bonds, angles, dihedrals, improper dihedrals, Urey-Bradley) and 2 non-bonded descriptors (Coulombic, Lennard-Jones). For my thesis, I primarily focus on the Lennard-Jones term, shown in Figure 4; each component is explained in more detail here.

Figure 4: CHARMM Potential Energy Function
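Since the Lennard-Jones term is the focus here, a minimal Python sketch follows, assuming the 12-6 form CHARMM uses, E(r) = ε[(Rmin/r)¹² - 2(Rmin/r)⁶], where ε is the well depth and Rmin is the distance at the energy minimum. Pair parameters are built from per-atom values with a geometric-mean ε and summed Rmin/2 values (Lorentz-Berthelot style). The function names and numbers are hypothetical; note that CHARMM parameter files list ε as a negative number and store Rmin/2, whereas this sketch takes the positive well depth.

```python
import math


def lj_charmm(r: float, epsilon: float, r_min: float) -> float:
    """12-6 Lennard-Jones in Rmin form: epsilon * ((r_min/r)**12 - 2*(r_min/r)**6).

    epsilon is the positive well depth; the energy equals -epsilon at r == r_min.
    """
    s6 = (r_min / r) ** 6
    return epsilon * (s6 * s6 - 2.0 * s6)


def combine(eps_i: float, half_rmin_i: float, eps_j: float, half_rmin_j: float):
    """Pair parameters from per-atom values: geometric-mean epsilon, summed Rmin/2 values."""
    return math.sqrt(eps_i * eps_j), half_rmin_i + half_rmin_j


# Hypothetical per-atom parameters (not taken from any real parameter file).
eps_ij, rmin_ij = combine(0.1, 2.0, 0.4, 1.0)
for r in (2.5, rmin_ij, 4.0, 6.0):
    print(f"r = {r:.2f}  E = {lj_charmm(r, eps_ij, rmin_ij):.4f}")
```

Inside Rmin the r¹² term dominates and the energy is repulsive; beyond it the r⁶ term gives a shallow attraction that fades toward zero.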

Now let us say you wanted to record the molecular geometry of every atom along with all of its potential energy descriptors. You would need a language that describes each atom in relation to the others in an organized fashion, hence atom types. The trick is that some atoms are more closely related than others. Suppose we have recorded parameters for 3 different carbons (benzene, propane, cyclohexane) and we want to figure out which of them a pyridine carbon is closest to, or whether it should get its own entry. Then what if we extend it to an acetone carbon?

C-Benzene
C-Propane
C-Cyclohexane

Insert 'C-Pyridine'

C-Benzene
|__C-Pyridine
C-Propane
C-Cyclohexane

Insert 'C-Acetone'

C-Benzene
|__C-Pyridine
C-Propane
C-Cyclohexane
C-Acetone

If we do this for enough functional groups and atoms, we end up with a good reference index of useful compounds and their physical and geometric values.
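The insertion logic in the trees above can be sketched as a nearest-neighbor lookup with a cutoff: compare a new carbon's environment to every recorded entry, attach it as a child of the closest one if it is similar enough, and otherwise start a new top-level entry. This is a hypothetical illustration, not the method behind any real force field. The four binary features (aromatic, in a ring, sp2, bonded to a heteroatom), the Hamming distance, the cutoff of 1, and treating the acetone carbon as its carbonyl carbon are all assumptions chosen so the output reproduces the trees above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical feature vector: (aromatic, in_ring, sp2, bonded_to_heteroatom)
Features = tuple[int, int, int, int]


@dataclass
class Entry:
    name: str
    features: Features
    children: list[Entry] = field(default_factory=list)


def distance(a: Features, b: Features) -> int:
    """Hamming distance: the number of features that differ."""
    return sum(x != y for x, y in zip(a, b))


def walk(entries: list[Entry]):
    """Yield every entry in the index, parents before their children."""
    for entry in entries:
        yield entry
        yield from walk(entry.children)


def insert(index: list[Entry], name: str, features: Features, cutoff: int = 1) -> str | None:
    """Attach under the closest entry if within cutoff; otherwise add a new top-level entry."""
    new = Entry(name, features)
    closest = min(walk(index), key=lambda e: distance(e.features, features), default=None)
    if closest is not None and distance(closest.features, features) <= cutoff:
        closest.children.append(new)
        return closest.name
    index.append(new)
    return None


def render(entries: list[Entry], depth: int = 0) -> list[str]:
    """Draw the index in the same |__ style as the trees above."""
    lines = []
    for entry in entries:
        lines.append(entry.name if depth == 0 else "   " * (depth - 1) + "|__" + entry.name)
        lines.extend(render(entry.children, depth + 1))
    return lines


index = [
    Entry("C-Benzene", (1, 1, 1, 0)),
    Entry("C-Propane", (0, 0, 0, 0)),
    Entry("C-Cyclohexane", (0, 1, 0, 0)),
]
insert(index, "C-Pyridine", (1, 1, 1, 1))  # one feature away from benzene: becomes a child
insert(index, "C-Acetone", (0, 0, 1, 1))   # at least two features from everything: new entry
print("\n".join(render(index)))
```

The cutoff is the design knob: loosen it and acetone would be filed under an existing carbon, tighten it and even pyridine would become its own entry.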

One descriptor that can be predicted quantitatively from this record, and that is valuable to both chemists and physicists, is the partial charge of an atom. The algorithm, which can be found here, takes in the connectivity of the molecule along with charge increment values and subtracts those increments from the formal charge. Partial charges can tell you the relative strength of electrostatic interactions across different systems. For new molecules we have not seen before, we can estimate parameters, but we need to know whether those estimated parameters are good.
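The charge-increment idea can be sketched in a few lines. This is a hypothetical illustration of the general bond-charge-increment approach, not CGenFF's actual program or parameters: the atom types and increment values are made up, and the sign convention (for a bond a-b with increment δ, atom a gains δ and δ is subtracted from atom b's formal charge) is an assumption. Because every increment is added on one side and subtracted on the other, the partial charges always sum to the molecule's total formal charge.

```python
from __future__ import annotations

# Hypothetical bond charge increments keyed by (type_a, type_b): a gains delta, b loses it.
INCREMENTS = {
    ("C", "O"): 0.40,
    ("C", "H"): -0.10,
}


def increment(type_a: str, type_b: str) -> float:
    """Look up the increment for an ordered pair; a reversed pair flips the sign."""
    if (type_a, type_b) in INCREMENTS:
        return INCREMENTS[(type_a, type_b)]
    if (type_b, type_a) in INCREMENTS:
        return -INCREMENTS[(type_b, type_a)]
    raise KeyError(f"no increment for bond {type_a}-{type_b}")


def partial_charges(atoms: dict[str, tuple[str, float]], bonds: list[tuple[str, str]]) -> dict[str, float]:
    """Start from each atom's formal charge and shift charge along every bond."""
    charges = {name: formal for name, (_, formal) in atoms.items()}
    for a, b in bonds:
        delta = increment(atoms[a][0], atoms[b][0])
        charges[a] += delta
        charges[b] -= delta
    return charges


# A formaldehyde-like toy: atom name -> (atom type, formal charge).
atoms = {"C1": ("C", 0.0), "O1": ("O", 0.0), "H1": ("H", 0.0), "H2": ("H", 0.0)}
bonds = [("C1", "O1"), ("C1", "H1"), ("H2", "C1")]
print(partial_charges(atoms, bonds))
```

The last bond is deliberately listed in reverse order to show that the sign flip in `increment` gives the same result either way.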

The CHARMM General Force Field (CGenFF) and the General Amber Force Field (GAFF) both use the idea of a penalty score, which tells us how reliable assigned parameters are likely to be based on their analogy to known parameters. When processing large datasets for force field development, we want to look for new systems that we haven’t seen before, and the charge penalty score served as a good beacon for finding systems of interest based on the trend in its values. A penalty score of 0 means we already have the parameters, whereas a score above 100 means the parameters are only a rough idea. It turns out I could use this charge penalty score to find systems whose molecular patterns were simple in design but rare in terms of usage. I treated a penalty score below 100 as something I had seen before and not of particular interest, and a score above 100 as something rare, as seen in Figure 5.

Figure 5: A screening of chemicals and their charge penalty score.

I searched through systems, combining the penalty score with molecular weight and an element of interest, sulfur, and found a dithiolane fragment that hasn’t surfaced among the latest emerging ring systems even though it is used a lot in organic synthesis. The dithiolane structure is simple in design, with a carbon joined between two sulfur atoms, and could be a worthwhile fragment to explore for force field development. This is the same one tattooed on my arm.
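The screen described above, a penalty cutoff of 100 combined with a molecular-weight limit and sulfur as an element of interest, can be sketched as a plain filter. The `Candidate` records are hypothetical placeholders, not real screening results, and the molecular-weight cutoff is an arbitrary illustrative value. The text does not say how a score of exactly 100 is handled, so this sketch counts only scores strictly above 100 as rare.

```python
from __future__ import annotations

from dataclasses import dataclass

RARE_CUTOFF = 100.0  # from the text: below 100 "seen before", above 100 "rare"


@dataclass(frozen=True)
class Candidate:
    name: str
    charge_penalty: float
    mol_weight: float
    elements: frozenset[str]


def classify(penalty: float, cutoff: float = RARE_CUTOFF) -> str:
    """Label a penalty score; exactly 100 is treated as 'seen before' (an assumption)."""
    return "rare" if penalty > cutoff else "seen before"


def screen(candidates, element="S", max_mol_weight=250.0, cutoff=RARE_CUTOFF):
    """Keep rare candidates that contain the element and are light enough, highest penalty first."""
    hits = [
        c for c in candidates
        if classify(c.charge_penalty, cutoff) == "rare"
        and element in c.elements
        and c.mol_weight <= max_mol_weight
    ]
    return sorted(hits, key=lambda c: c.charge_penalty, reverse=True)


# Placeholder records for illustration only.
pool = [
    Candidate("mol_a", 12.0, 150.0, frozenset({"C", "H", "S"})),
    Candidate("mol_b", 140.0, 120.0, frozenset({"C", "H", "S"})),
    Candidate("mol_c", 180.0, 300.0, frozenset({"C", "H", "S"})),
    Candidate("mol_d", 220.0, 110.0, frozenset({"C", "H", "N"})),
]
for c in screen(pool):
    print(c.name, classify(c.charge_penalty))
```

Sorting by penalty puts the least familiar chemistry at the top of the list, which is where a search for rare but simple fragments would start.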

In the next lecture, I will teach how to screen large datasets and navigate through chemical space using different molecular descriptors to fulfill a purpose.
