This last weekend I attended the first PyData conference hosted in Austin, and I can summarize the experience in one word: wow. I’ve spent six years coding in Python, from my first introduction-to-programming class as an undergraduate to now developing Python packages for research. Every year, the language astounds me. So, in short, this is my experience at the PyData conference.
Note: This doesn’t capture all of PyData, just the workshops/talks I attended.
Extending Python Into the Future — Travis E. Oliphant
Travis is the author of NumPy and has established a new foundation called Quansight Initiate.
Its goal is to bring open source communities together and encourage more development on the existing stack (Jupyter Notebook, NumPy, SciPy, etc.). The talk focused heavily on one of their new platforms, “OpenTeams”: a platform for open source projects to receive recognition for their work and a chance at funding for development and maintenance.
Naturally, I signed up and placed my open source project on there as well. I expect that as the user base grows, the benefits will become much broader. For now, my project will be a little lonely…
What the? Data Science Questions Asked and Answered — Katrina Riehl
I really enjoyed this talk. Software engineers are often assumed to be data scientists as well; my statistics grades as an undergraduate were far less than satisfactory, but I believe my coding aptitude is reasonable. Katrina comes from Cloudflare, where she runs the data science team.
She opened up about the data science methodology and the questions we really ask in the process of using data to produce meaningful results.
One thing that stood out to me most was that over half of all data science projects fail. Although I knew the field was young, it drove home that a lot of us are still trying to figure out what to do with this much data and what we can really infer from it.
Objectionable Content — James Powell
This was a mini-workshop/insight into constructing a sophisticated object-oriented model for production. What I loved about this talk was that it got into the nitty-gritty of Python and highlighted some esoteric features and problems I never knew existed.
James emphasized that when building an object-oriented model in Python, it should be rigid yet flexible. I feel all developers inherently know this, but in practice we tend to forget. I recently built a class in JavaScript for work. Originally it was flexible, but as the scope and the customer’s specifics grew, it became rigid, serving only one purpose. Now, because of that, I have to spend my time abstracting that layer out and paying down my own technical debt. So it’s pretty easy to fall into that trap.
I also gained insight into __slots__ in Python: a way to declare a fixed set of instance attributes up front so Python stores them in dedicated slots instead of a per-instance __dict__. After an hour of reading blog posts and seeing how it works in my own code, I reached the conclusion… does it actually matter?
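For context, here is a minimal sketch of what __slots__ changes (the class names are my own illustration, not from the talk):

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x          # stored in a per-instance __dict__
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")  # attributes live in fixed slots instead
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = SlottedPoint(1, 2)
print(hasattr(p, "__dict__"))   # False: no per-instance __dict__
try:
    p.z = 3                     # slots also block undeclared attributes
except AttributeError:
    print("cannot add new attribute")
```

The practical win is memory (and a small attribute-lookup speedup) when you create millions of instances; for a handful of objects, it rarely matters.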
uarray: Separating interface from implementation — Travis E. Oliphant
I am definitely a beginner after this talk. Travis started off with his motivation for creating NumPy: he was originally tasked with helping a lot of scientific code standardize on a single data structure. NumPy was fast and helped bring the open source community together around one code base. Unfortunately, after the boom of development that followed NumPy, the community is divided again, with many projects all using variations of NumPy.
Travis introduced his concept of the “uarray”, a universal array interface that works with any variation of NumPy’s API, where end users can pick and choose the implementations they like.
I liked this idea: it gives users the freedom to create and manipulate their own API while uarray acts as the glue holding everything together. A lot of the implementation went a little over my head in terms of technical aptitude, so it’s definitely worth watching the video!
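To get the flavor of separating interface from implementation, here is a tiny plain-Python sketch of a fixed interface with swappable backends. This is my own illustration of the idea, not uarray’s actual API:

```python
import math

# Registry of interchangeable backends; all names here are illustrative.
_backends = {}
_active = "reference"

def register_backend(name, functions):
    _backends[name] = functions

def use_backend(name):
    global _active
    _active = name

def sin(x):
    # The interface stays fixed; the implementation is looked up
    # in whichever backend is currently active.
    return _backends[_active]["sin"](x)

register_backend("reference", {"sin": math.sin})
register_backend("doubled", {"sin": lambda x: 2 * math.sin(x)})

print(sin(math.pi / 2))   # reference backend
use_backend("doubled")
print(sin(math.pi / 2))   # same call, different implementation
```

The real uarray does this with multimethods and backend context managers, but the core appeal is the same: callers write against one API while the implementation underneath can change.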
An Introduction to Sentiment Analysis of Textual Data — Fatma Tarlaci, Dhavide Aruliah
A workshop hosted by Fatma and Dhavide on how to analyze reviews and classify them as positive, negative, or neutral. The workshop focused on analyzing text data with the nltk package and scikit-learn to detect sentiment.
Using restaurant reviews in TSV format, we “cleaned” the data, removed unnecessary words, and fed the result into scikit-learn’s random forest classifier. And, like any workshop, that is as far as we got in two hours. Fortunately, their exercise is available and free to download here, if you’d like to play around with constructing your own NLP processor.
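The “cleaning” step can be sketched with the standard library alone. The workshop used nltk’s full stopword list before handing tokens to scikit-learn; the tiny stopword set here is just my stand-in:

```python
import re
import string

# A tiny stand-in stopword list; the workshop used nltk's full list.
STOPWORDS = {"the", "a", "an", "was", "is", "and", "to", "of"}

def clean_review(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    return [word for word in text.split() if word not in STOPWORDS]

print(clean_review("The food was great, and the service was terrible!"))
# ['food', 'great', 'service', 'terrible']
```

From there, the cleaned tokens get vectorized (e.g. bag-of-words counts) and passed to a classifier such as scikit-learn’s RandomForestClassifier.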
Although I didn’t attend every talk, I spent a lot of time in discussion with other members of the open source community and learned so much more from them.
I hope to give my own talk next year at PyData to showcase cheminformatics and how we utilize Python to further our research. Until then, hang tight and see you next year!