As I hope has become abundantly clear throughout this blog, data science is more of a hands-on endeavor than a theoretical science. Even though it does have a solid theoretical foundation, data science was established as a practical discipline. As such, it relies heavily on programming and manipulating data structures, tasks that depend on the effective use of a programming language. Let me clarify here that by programming language I mean an application that allows its user to create executable scripts that can implement algorithms in an efficient way. Therefore, R is not a programming language in that sense, although it is a delightful data analytics platform (particularly for statistics applications). One programming language that has been gaining ground as of late is Julia, which, despite its quirky name, seems to be very robust.
The Julia language is very much like Python, so if you are a Pythonista like me, you will find it both familiar and intriguing. Basically, it is what Python would be if it were created in this decade (Julia was created around 2012, since that’s when the first references to it came about). Python, however, was created two and a half decades ago (in February 1991, to be precise), and as a result, it does not have the merits of a modern language. Specifically, Python is not as fast, nor as data-oriented (even though the pandas, scikit-learn, and NumPy packages really add a lot to it in that respect). These inherent limitations of Python and other high-level programming languages of the previous century are addressed in Julia. Its creators, Dr. J. Bezanson and his associates from MIT, put a lot of effort into making a language that is both easy to use and fast. Just like all the other programming languages I know of, Julia is open-source, and even its IDEs are free to use.
“So what?” you may ask. “We are not programmers, right?” Well, we are indeed not programmers, and the way things are going we may never be. However, we can’t do much in data science without programming, just like a physicist can’t do much without math! In fact, this is our niche, in comparison to conventional data analysts and BI professionals (although there are data analysts nowadays who know how to write and run scripts). The thing is that even though we use programming heavily, we don’t really care about the inner workings of a programming language or about the esoteric concepts that accompany it. If I can get my model to generalize well with the data I have, I don’t care whether the lines of code I wrote to accomplish this are optimal, or whether I could have shaved off a few seconds of run time by spending a few hours cleaning up the code. At the end of the day, as long as I can meet my deadlines and have a script that doesn’t spit out errors or exceptions left and right, I am a happy camper!
Enter Julia. This language is a godsend for data science: it is as cool as Python to code in, and it saves us a lot of time when running computationally intensive data-wrangling scripts. Sure, it does not have the breadth of packages that Python has, but hey, it manages to do what I want it to do, and for me that’s all that matters. Also, if I need to tweak something or write my own auxiliary functions, applying the fox-like attitude to the craft, that’s not only possible but fairly easy with a language like Julia. Do I need to say more?
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.