Using Python to check molecular dynamics simulation log files

Sulstice
4 min read · Nov 28, 2020


Ah, log files: gotta love them and hate them. During my rotation project for the PhD I often have to check these log files for the progress of molecular dynamics simulations (I'm using GROMACS, by the way). These simulations can take days to run, which means I do a daily check, usually three times a day.

Now this task is very cumbersome, because if I have 12 simulations running, that means 12 log files being produced (md_3.log, a standard log file name for GROMACS), and this is just a rotation, so imagine the load after that. To do my daily check, here's the time breakdown:

VPN connection with 2FA - 2 min (plus however long it takes me to find my phone)
SSH connection - 10 seconds
Navigate to the directory - 10 seconds
Tail the log - 3 seconds
Punch the number into Google and do some math - 2 min

So I spend roughly 4 minutes and 23 seconds checking log files, and even then that's for just one. This is inefficient, and I feel like I'm going to go barking mad.

If you are in my situation and in dire need of something cool in Python to check how things are progressing, check out what I set up for myself :).

First, I initialized an object called LogReader with two parameters: target_directory, denoting which directory we want to check for the logs, and target_log, the target log file we are interested in tailing.

class LogReader(object):

    __version__ = '0.0.1'

    def __init__(self, target_directory, target_log):

Next, the steps I had in mind:

  1. Handle authentication to the machine
  2. Find all the log files of interest
  3. Read each log file and its current progress
  4. Create a table that presents the data in a way that's easy for the user to understand.

Let’s get started then:

class LogReader(object):

    __version__ = '0.0.1'

    def __init__(self, target_directory, target_log):

        self.user, self.hostname = self._read_authentication()

We want to store our configuration in a YAML file, because if I spread this through my lab mates, they each have their own auth/config they would need to set up.

I created a file called auth_apollo.yml that looks a little something like this:

username: XXXXXXX
host: XXX.XXX.XX.XXX

Let’s create that function to read in the file:

def _read_authentication(self):

    from ruamel.yaml import YAML
    yaml = YAML()

    with open('./auth_apollo.yml') as auth_file:

        try:
            auth_info = yaml.load(auth_file)
        except Exception as exc:
            print('Problem reading in the authentication file')
            raise

    user = auth_info['username']
    hostname = auth_info['host']

    return user, hostname

Simple :). That's the authentication configuration all set, but now we need to connect and find those files. We need to know the target_directory and the target_log. We also need a dictionary, timings, to store the values from the different simulations, and a function to read the loggers.

class LogReader(object):

    __version__ = '0.0.1'

    def __init__(self, target_directory, target_log):

        self.user, self.hostname = self._read_authentication()
        self.target_directory = target_directory
        self.target_log = target_log
        self.timings = {}

        self._read_loggers()

We'll be using a Python package called fabric as a way to issue commands over SSH from Python. With fabric it's pretty simple to set up:

def _read_loggers(self):

    from fabric.connection import Connection

    with Connection(self.hostname, self.user) as portal:

First we set up the Connection. The with statement keeps the portal into the remote server open for the duration of the block and closes it when we're done.

The first bash command I want to execute is a find on the logs so I can get all the necessary paths:

file_paths_stream = portal.run('find ' + self.target_directory + ' -type f -name "' + self.target_log + '"', hide=True)

I don't care much for printing the stdout, since it will clutter the screen (hence hide=True), but I do want to process it.

file_paths = file_paths_stream.stdout.strip().split('\n')

This now gives me all of my file paths.
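For instance, file_paths ends up being a plain Python list of remote paths, one per simulation. The paths below are hypothetical, just to illustrate the shape of the data:

file_paths = [
    '/scratch/sim_01/md_3.log',
    '/scratch/sim_02/md_3.log',
    '/scratch/sim_03/md_3.log',
]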

Now I just want to make a for loop and tail each log so I can retrieve a value called Step, which tells me what step the MD simulation is on.

for file_path in file_paths:

    stdout_stream = portal.run('tail -n 13 ' + file_path, hide=True)

And to process the output, retrieve the value, and store it in our class attribute self.timings:

    stdout = stdout_stream.stdout.strip().split('\n')[0].split('vol')[0].strip().split(' ')[-1]
    self.timings[file_path] = stdout
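To unpack that one-liner: on my runs the first line of the tail is a GROMACS domain decomposition report, and the step count is the last token before the word "vol". Here is a minimal sketch of the same parsing against a made-up line (the exact line and numbers are hypothetical, and the layout of your log may differ, e.g. on runs without domain decomposition):

# Hypothetical first line of the 13-line tail of a GROMACS md.log
line = 'DD  step 2499999  vol min/aver 0.74  load imb.: force  1.2%'

# Everything before 'vol', stripped, then the last whitespace-separated token
step = line.split('vol')[0].strip().split(' ')[-1]
print(step)  # prints: 2499999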

Obviously, that took some work, but this will process the log files for GROMACS, so if you use that software then just copy/paste!

Next, we want to present this in a beautiful table:

def generate_table(self):

    from beautifultable import BeautifulTable

    table = BeautifulTable()
    table.column_headers = ["--- File Path ---", "Step", "Time (ns)"]

    for key, value in self.timings.items():

        row = []
        row.append(key.replace(self.target_log, ''))
        row.append(value)
        row.append(float(value) / 500000)

        table.append_row(row)

    print(table)

This one is pretty self-explanatory, except for the Time (ns) column: my simulations use a 2 fs timestep, so 500,000 steps correspond to 1 ns, and dividing the step number by 500,000 gives the elapsed time in nanoseconds.
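As a quick sanity check on that conversion (assuming the 2 fs timestep; adjust the divisor if your .mdp uses a different dt):

dt_fs = 2                          # timestep in femtoseconds
steps_per_ns = 1_000_000 / dt_fs   # 1 ns = 1,000,000 fs -> 500,000 steps
print(2_500_000 / steps_per_ns)    # 2.5 million steps -> 5.0 ns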

Putting this all together:
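Here is the whole LogReader stitched together from the snippets above (imports moved to the top for readability; the behavior is unchanged):

from ruamel.yaml import YAML
from fabric.connection import Connection
from beautifultable import BeautifulTable


class LogReader(object):

    __version__ = '0.0.1'

    def __init__(self, target_directory, target_log):

        self.user, self.hostname = self._read_authentication()
        self.target_directory = target_directory
        self.target_log = target_log
        self.timings = {}

        self._read_loggers()

    def _read_authentication(self):

        yaml = YAML()

        # Read the username/host out of the YAML config
        with open('./auth_apollo.yml') as auth_file:
            try:
                auth_info = yaml.load(auth_file)
            except Exception:
                print('Problem reading in the authentication file')
                raise

        return auth_info['username'], auth_info['host']

    def _read_loggers(self):

        with Connection(self.hostname, self.user) as portal:

            # Find every target log under the target directory
            file_paths_stream = portal.run(
                'find ' + self.target_directory + ' -type f -name "' + self.target_log + '"',
                hide=True,
            )
            file_paths = file_paths_stream.stdout.strip().split('\n')

            # Tail each log and pull out the current step count
            for file_path in file_paths:
                stdout_stream = portal.run('tail -n 13 ' + file_path, hide=True)
                stdout = stdout_stream.stdout.strip().split('\n')[0].split('vol')[0].strip().split(' ')[-1]
                self.timings[file_path] = stdout

    def generate_table(self):

        table = BeautifulTable()
        table.column_headers = ["--- File Path ---", "Step", "Time (ns)"]

        for key, value in self.timings.items():
            table.append_row([
                key.replace(self.target_log, ''),
                value,
                float(value) / 500000,
            ])

        print(table)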

And if I run this like so:

path = '/remote/path/to/my/simulations'  # wherever your runs live

log_reader = LogReader(path, 'md_3.log')
log_reader.generate_table()

I get the output:
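The table looks along these lines (the paths and numbers here are made up, purely for illustration):

+------------------------+---------+-----------+
|   --- File Path ---    |  Step   | Time (ns) |
+------------------------+---------+-----------+
| /scratch/sim_01/       | 2500000 |    5.0    |
+------------------------+---------+-----------+
| /scratch/sim_02/       | 1250000 |    2.5    |
+------------------------+---------+-----------+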

Voila~ I hope this helps another student, and I'll be expanding this to OpenMM real soon.
