A Comprehensive Overview of Releasing a Python Package for Scientific Publication
After 6 months of writing an API for cheminformatics scientists, I was ready to release my package into the world for people to use. What I failed to realize is that distributing my package wasn’t as easy as I thought it would be.
After a slew of Stack Overflow questions and blog posts, I finally homed in on what a package should look like. This article covers how I released “Cocktail Shaker” to the world, ready for publication and users.
Phase 1: GitHub Repository Setup
A well-structured repository is key to maintaining the code within it. If you dig through the GitHub repos I built during my undergraduate education, it’s pure chaos. Back then my end goal was to get the publication, not so much software sustainability and reusability.
After several iterations of moving things around, I landed on this overall structure for my repository.
Breakdown:
- cocktail_shaker: Major source code for the package (make sure this directory is what you want to name the overarching package as well)
- docs: The documentation that will be used as support for the package hosted on readthedocs
- images: Any images that will be used for the documentation or repository in general. If your package uses images for testing then I would highly recommend making a subdirectory in your tests directory.
- tests: This directory I reserve for strictly validation testing of my package (and for this tutorial I use pytest).
- .coverage: A data file produced by coverage.py (which I run through nose) that records the test coverage of the package.
- .coveralls.yml: Configuration for Coveralls, a service I use to report statistics on my nosetests coverage. The results are published online at Coveralls.
- .gitignore: Surprisingly, not many scientists include this one in their repository. GitHub has a great dropdown menu for selecting your repository’s language and creating an appropriate .gitignore file (in our case Python).
- .travis.yml: My python package has an automated build system associated with it that is hosted on Travis (for free!). The .travis file holds the instructions used to set up the container and run the associated tests on each pull request into the master branch (I can cover this in a later post).
- LICENSE: The license to be deployed with the package. Most likely we will be distributing on PyPI, so you might want to go with an open source license. I highly suggest looking over the Open Source Initiative’s guide to licenses.
- MANIFEST.in: This file is always controversial. When distributing your package online, it lists what else should be shipped alongside your source code. This can also be taken care of in setup.py, where we specify what we need anyway. To be on the safe side, I included it in mine as an assurance. I have also seen a push to use the package setuptools_scm (but have never tried it myself). More info on the alternative can be found here in this article.
- Makefile: This isn’t at all necessary, but it makes life easier for a developer who wants to clone my repository instead of pip installing. The Makefile has a list of targets the user can run to install the requirements of Cocktail Shaker, run the tests, etc.
- README.md: I’m sure we are all familiar with a readme file, but it’s a markdown file where you can write instructions, information, funny pictures, memes anything that you would like your user to know! Later in this article we can delve a little deeper into how to structure the README.
- mkdocs.yml: Configuration for MkDocs, which builds a project’s documentation into a static HTML site that can be hosted anywhere. I use it in conjunction with readthedocs.io, where you can view the official documentation of Cocktail Shaker.
- paper.bib: The BibTeX file holding the citations for your publication. In my case, I am aiming to publish in the Journal of Open Source Software.
- paper.md: A markdown file of your official publication. Formatting is actually based on the Journal of Open Source Software requirements for a publication.
- pytest.ini: A configuration file where pytest stores settings for the test run. I use it to set the style guideline for my Python code (85 characters wide), the number of attempts to retry a request (due to the nature of my API connector to an external database), and to tell pytest that any script whose name starts with test_ is a test to execute.
- setup.cfg: Used for settings for any plug-ins or the type of distribution. I use it mostly for metadata for version releases of my package.
- setup.py: Every package has its own flavour of this file. setup.py tells the distribution which files should be in the package, along with the dependencies, entry points, and license. If you take a look at this file in my repository, please go ahead and copy it! It’s a template I have used continuously and would love to share with the world.
Whew! That was a lot to cover but all necessary for different types of users. I not only want scientists to use my code for their own research but also allow users to develop as well (it is open source!).
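To make the breakdown above a little more concrete, here is a minimal sketch of a script that checks whether a repository contains the core entries from the list. The function name missing_entries and the choice of which entries count as required are my own assumptions for illustration; trim or extend the list to match your project.

```python
from pathlib import Path

# Entries taken from the breakdown above; which ones are "required" is an assumption.
EXPECTED_ENTRIES = [
    "cocktail_shaker",
    "docs",
    "tests",
    ".gitignore",
    "LICENSE",
    "MANIFEST.in",
    "README.md",
    "setup.cfg",
    "setup.py",
]


def missing_entries(repo_root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the top of the repository; prints nothing if everything is present.
    for name in missing_entries("."):
        print(f"missing: {name}")
```

Something like this could be run by hand before a release, or wired into a Makefile target.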
Phase 2: The README File
The README file is imperative for your repository. Think of it as a marketing tool to attract users that also serves as a general intro to your package.
Badges (Makes your repo super shiny):
- Builds: Since I use Travis to automate my builds, I have a badge to show anyone using the package that it is stable in its current state. (It helps validate to the user that it is good to go.)
- License: Lets the user know what license the repository is filed under and whether they can freely use it.
- Coverage: I use Coveralls to gather statistics about my tests. It gives me the percentage of my code that the tests cover. This isn’t the most useful metric, but it helps validate to the user that the developer has written something substantial and valid.
- Python Version: Note that I have not included anything Python2. This is all Python3. Right off the bat the user knows that I do not support Python2 and it is important to make that distinction since there are a lot of differences.
- Gitter: Gitter is a chat service for repositories so you can viably chat about your package and allow users to ask questions.
- Zenodo: A DOI (Digital Object Identifier) is important when doing a release and a publication. Thankfully, Zenodo lets you tag a repository with a DOI to ensure its uniqueness. You can use this DOI freely in your publication (it is also free to publish).
- Docs: A link to the documentation that I have hosted on readthedocs. It allows the user quick access to the documentation as well as letting the user know that I have stable documentation. Documentation is also key as a guide into your package (more on that in another article).
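Badges for builds, coverage, and docs are generated by the services themselves, but static ones such as the license can be written by hand. As a rough sketch, the helper below builds the markdown for a static badge using the Shields.io URL pattern. The function badge_markdown is hypothetical and not part of Cocktail Shaker, and the label, message, and colour are only example values.

```python
def badge_markdown(label, message, color, link):
    """Build README markdown for a static Shields.io badge that links to `link`."""

    def escape(text):
        # In Shields.io static badge URLs, '-' and '_' are doubled and spaces become '_'.
        return text.replace("-", "--").replace("_", "__").replace(" ", "_")

    image_url = (
        f"https://img.shields.io/badge/{escape(label)}-{escape(message)}-{color}.svg"
    )
    return f"[![{label}]({image_url})]({link})"


# Example: a license badge that links to the license text.
print(badge_markdown("license", "MIT", "blue", "https://opensource.org/licenses/MIT"))
```

The printed line can be pasted at the top of README.md next to the badges the other services provide.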
After reading countless READMEs, I have consolidated and arrived at what I believe is most essential to have on your front page.
README File Structure
- Section 1 Overview: High level overview explaining the motivation and features of the project. It can be relatively short: just a few quick bullet points of explanation.
- Section 2 Announcements: I particularly enjoy this one. I want to show the user that the package is being used and where I am showcasing the project. It helps give validation.
- Section 3 Users: Who do you intend to use this package?
- Section 4 Installation: How do you install the package? Try to be as diverse as possible, i.e. Windows, MacOS, Linux. Each installation can vary and be tricky, but if you want people to use your package it will be really helpful to provide that information. (Don’t be afraid!)
- Section 5 Development Installation: A little different from your regular installation, since we want to give the developer freedom to access and control.
- Section 6 Structure of the Repository: Similar to what I wrote before in a rough overview of the github repository. Allows transparency into your package so the user knows exactly what they are installing.
- Section 7 Genesis: Who worked on the package and what were their roles?
- Section 8 External Links: Generally, I just add the documentation link and any other relevant information (perhaps the publication as well).
Feel free to copy and paste the README.md file in Cocktail Shaker to use for your own package!
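If you would rather start from a blank outline, the eight sections above can be turned into a README skeleton with a few lines of Python. This is only a sketch: the function readme_skeleton and the TODO placeholders are my own invention, and the heading levels are an assumption about how you want the markdown laid out.

```python
# Section titles taken from the README structure described above.
SECTIONS = [
    "Overview",
    "Announcements",
    "Users",
    "Installation",
    "Development Installation",
    "Structure of the Repository",
    "Genesis",
    "External Links",
]


def readme_skeleton(project_name, sections=SECTIONS):
    """Return a markdown README outline with one placeholder heading per section."""
    lines = [f"# {project_name}", ""]
    for title in sections:
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)


print(readme_skeleton("Cocktail Shaker"))
```

Redirect the output into README.md and fill in each TODO as the project grows.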
Phase 3: Distribution
This took me a long time to figure out, but I finally reached a reasonable way of distributing my package. The first step was deciding which functions I would like my users to use (private vs public). I do know you can mark functionality as private using the double underscore prefix, as in __my_function, but that seems unnecessary.
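For anyone unfamiliar with the underscore conventions, here is a small sketch of the difference between the two prefixes. The names _normalize, Shaker, and __validate are made up for this example and do not exist in Cocktail Shaker.

```python
def _normalize(smiles):
    """Single underscore: private by convention only, nothing is enforced."""
    return smiles.strip()


class Shaker:
    def shake(self, smiles):
        """Public method: part of the API users are meant to call."""
        return self.__validate(_normalize(smiles))

    def __validate(self, smiles):
        # Double underscore inside a class: Python renames this to _Shaker__validate.
        if not smiles:
            raise ValueError("empty input")
        return smiles


shaker = Shaker()
print(shaker.shake("  c1ccccc1 "))           # the public entry point works
print(hasattr(shaker, "__validate"))         # False: the plain name is hidden
print(hasattr(shaker, "_Shaker__validate"))  # True: the mangled name still exists
```

Neither prefix truly locks anything away, which is part of why limiting what the package exposes at import time felt like the simpler route to me.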
Within the cocktail_shaker directory there is an __init__.py file. If you are unfamiliar with this file, I highly suggest reading this article. In the __init__.py file we can specify exactly what classes the user will be able to use.
from cocktail_shaker.file_handler import FileWriter, FileParser
from cocktail_shaker.functional_group_enumerator import Cocktail
name = 'CocktailShaker'
I’ve only allowed the user to access three objects within my package while the rest remains private. Why is this important? A lot of my classes depend on some functionality run in the ones accessible. Independently, they are not too useful (they also don’t have well crafted documentation). This makes it easier for the user if you limit the options available. We want our package to be easy-to-use (always remember that).
Next we need to actually distribute our Python package. PyPI is a great service that allows your package to be distributed via a package manager called pip. First you need to create an account on their website. That was easy! Now we need to upload our package to PyPI.
Next let us install setuptools, a facilitator package that aids in distributing your package.
python -m pip install -U setuptools
Next comes setup.py, which must be configured correctly for this to work. A bad setup.py means your package will be rejected when uploading. Previously I mentioned a template for setup.py; here’s what cocktail_shaker’s looks like:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# package setup
#
# ------------------------------------------------

# imports
# -------
import os

# config
# ------
try:
    from setuptools import setup
except ImportError:
    # distutils has no find_packages, so only setup is imported in the fallback
    from distutils.core import setup

# requirements
# ------------
with open('requirements.txt') as f:
    REQUIREMENTS = f.read().strip().split('\n')

TEST_REQUIREMENTS = [
    'pytest',
    'pytest-runner'
]

# use the README as the long description when it is available
if os.path.exists('README.md'):
    with open('README.md') as f:
        long_description = f.read()
else:
    long_description = 'Cocktail Shaker is drug enumeration and expansion library'

# exec
# ----
setup(
    name="cocktail_shaker",
    version="1.0.1",
    packages=['cocktail_shaker'],
    license='MIT',
    author="Suliman Sharif",
    author_email="sharifsuliman1@gmail.com",
    url="https://www.github.com/Sulstice/Cocktail-Shaker",
    install_requires=REQUIREMENTS,
    long_description=long_description,
    long_description_content_type='text/markdown',
    zip_safe=False,
    keywords='cocktail chemistry ligand-design shaker',
    classifiers=[
        'Development Status :: 4 - Beta',
        'Natural Language :: English',
        'License :: OSI Approved :: MIT License',
        'Intended Audience :: Developers',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
    ],
    test_suite='tests',
    tests_require=TEST_REQUIREMENTS,
)
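One thing to watch in a setup.py like this: splitting requirements.txt on newlines passes every line straight to install_requires, including any comments or blank lines the file might contain. A slightly more defensive parser could look like the sketch below. The function parse_requirements is a hypothetical helper, and the example entries are placeholders rather than Cocktail Shaker’s real dependencies.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines, comments, and pip options."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop trailing inline comments, then surrounding whitespace.
        line = raw_line.split(" #")[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


example = """
# core dependencies
requests>=2.0

-r other-requirements.txt
pytest  # test runner
"""
print(parse_requirements(example))  # ['requests>=2.0', 'pytest']
```

In setup.py this would replace the split call, for example REQUIREMENTS = parse_requirements(f.read()).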
Some Common Problems: What killed me for a long time was the long_description, which is required by PyPI. What you need to do is first specify the type of content (markdown) and then pass the text in. If you are running through this tutorial again and have already uploaded a package with a given version number, you must change that version number, otherwise PyPI will reject it. That took me a little while to figure out.
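Since every upload needs a fresh version number, it can help to bump it deliberately before each release. Here is a minimal sketch that assumes a plain three-part version string like the 1.0.1 used in the setup.py above. The helper bump_patch is my own illustration, not something setuptools provides.

```python
def bump_patch(version):
    """Return `version` (MAJOR.MINOR.PATCH) with the patch number increased by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("1.0.1"))  # 1.0.2
```

Versions with suffixes such as rc1 or dev0 would need more careful handling than this.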
With our setup all done we can now build a source distribution:
python setup.py sdist
This command will build the “distribution” of your package, which is essentially a copy of all your code ready for deployment. It will create a directory called dist that holds a tar.gz file to be uploaded. Simple so far!
Next, to upload our distribution to PyPI we will use twine, a utility package for publishing packages on PyPI. You can install twine through pip.
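Before uploading anything, it is worth peeking inside that tar.gz to confirm the files you expect (README, license, source code) actually made it in. The sketch below uses only the standard library. The function name list_sdist_files is my own, and the dist/*.tar.gz pattern assumes the sdist command was run in the current directory.

```python
import tarfile


def list_sdist_files(archive_path):
    """Return the sorted names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name for member in archive.getmembers() if member.isfile()
        )


if __name__ == "__main__":
    import glob

    # Assumes `python setup.py sdist` has already been run in this directory.
    for path in glob.glob("dist/*.tar.gz"):
        print(path)
        for name in list_sdist_files(path):
            print("   ", name)
```

If something is missing from the listing, that is usually a sign that MANIFEST.in or setup.py needs another look.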
python -m pip install twine
Next we can upload the contents of the dist directory to PyPI using a simple command.
python -m twine upload dist/*
Depending on whether this is your first time, you might need to supply the user name and password of the account that you created on PyPI. Follow the prompts and...
Voila!
Cocktail Shaker is available online and ready for distribution and publication! I hope this helps the open science and open source community build sustainable software for the future!