The trees are growing bare
As Fall comes to remind us all
Of the transience of life.
(Synesius, Fall of 2016)
Haikus are a very singular poetry form, developed in Japan and used even today all over the world. They are in essence very short poems following the syllable structure 5-7-5 or 6-8-6 and aim to depict something transcendental usually through a reference to nature. “What does all this have to do with data science?”, you may wonder. Well, haikus are all about trying to condense a very important message in a very brief verse that complies to a very strict form, something that resembles in a way the communication of insights stemming from a data analytics project. Of course, as data scientists we don’t need to be fluent in the art of haiku, since the insights we deliver are often in non-verbal form (e.g. plots and dashboards). However, when writing data science scripts, be it in Python, Julia, or some other programming platform, we need to be concise and efficient.
Data science scripts differ from conventional scripts since the emphasis is more on getting something done, rather than optimizing for readability or some other objective that is often favored by coders. In data science we care about efficiency and try to make our scripts are fast as possible, but we don’t care so much about the form they would have. Unfortunately, we often go to the extreme and write code that is hard to maintain because it’s very dense. However, counter-intuitive as this may seem, this is a step in the right direction. A data science script has to be dense, packing a lot of computational power in a few lines of code, just like a haiku.
Yet, there is no law against enriching that code with comments, just like haikus often have commentaries that aim to explain the ideas the poet wants to convey. Comments are free since they don’t cost us any CPU cycles so there is no limitation as to how many characters we devote to them. However, this humble programming element can add a lot of value to a script since it informs its user (who oftentimes is someone other than the script’s creator) of the functionality of each code block. The funny thing is that even if you feel confident that you know the script inside-out because you created it yourself, you may not have the same clarity or comprehension in the future when you revisit the script for another project. Even if the script is perfect (something rare), you may still need to change it to adjust it to another data set, so you really need to know what’s going on in every part of it. Comments can help shed light in all this.
The Japanese managed to make a part of their culture (minimalistic efficiency) an integral part of world literature through the haiku poetry type. Perhaps we can do the same with our script, benefiting data analytics endeavors beyond our projects. The best part is that we don’t need to comply to a very strict syllable structure, plus we get to express our thinking not only with programming structures but also with comments, making everyone’s life easier.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.