Open-source software is any piece of software that's open to review and edits/forks. In most cases, it's also free and under the GNU license or something equivalent, though when people refer to it as free, they often use the term as a proxy to freedom. As a result, most people refer to open-source software today as FOSS, which stands for Free and Open-Source Software. FOSS is also a movement of sorts that's taken hold since the earlier days of computing with people like Richard Stallman, who spearheaded the GNU initiative and has been very active in promoting FOSS throughout his life. With the advent of FOSS programming languages and FOSS operating systems (such as GNU/Linux and FreeBSD), this movement grew and is now quite established across various fields that involve programming.
As you can imagine, FOSS is also quite relevant in data science and A.I., at least lately. Most data scientists and A.I. professionals today tend to use an open-source language (many of them using Python, while the more adventurous dabble with Julia, Scala, and lately even Rust), handle open-source dataset (such as those made freely available at the UCI Machine Learning repository, among many other sites), and work with open-source frameworks (such as Scikit-learn, MXNet, and Flow). It's doubtful that many people get into data science with any monetary investment in the tools or the datasets they need since it's a far better investment to spend money on educational resources such as books and videos marketed by a technical publisher. Interestingly, these resources have more in common with FOSS than all that mediocre stuff you find on YouTube these days, labeled as educational for some reason.
FOSS in data science (and A.I. to a great extent) is largely responsible for the immense growth of this field. While back in the old days when I was doing my Ph.D. the best way to get into analytics, particularly machine learning, was through platforms like Matlab that come with a relatively high price tag, nowadays you can start your data science journey without spending any money on the software you use. This way, you can develop some skills and try out the field before deciding to stick with it. Since there are more reasons to commit to data science than not to, the easy point of entry made data science popular, while the trend is also bound to continue.
Nevertheless, it's important to note some exceptions to the FOSS paradigm, which are also relevant in data science. First of all, there is Mathematica, which is probably one of the best closed-source platforms out there, not just for data science but for any field that involves numeric data. Contrary to what its name suggests, Mathematica is a broad kind of platform having its own programming language built-in; it's not just about Math. Also, its latest version feature A.I. tools, while the person behind this piece of software is a genius scientist who also came up with a novel model for describing the universe. Apart from Mathematica, there is also Matlab, which is still used by made learners of the craft, particularly in academia. Lately, however, its popularity has started to decline, partly because of its open-source clone, Octave, and partly because it pales when compared with modern data science and A.I. platforms that feature better performance and larger communities of users.
All in all, FOSS is paramount in data science work, partly due to the relevance of programming in this field. While new FOSS players come to our field (the most notable of which is Rust, which I covered briefly in the previous article on this blog), chances are that some of them are bound to stay. Things like the Jupyter notebook, for example, aren't going to disappear, even if other code notebooks have entered the scene lately, especially when it comes to the Julia language. In any case, if you want to learn more about the various (mostly open-source) software that populates our fascinating field, you can check out my book Data Science Mindset, Methodologies, and Misconceptions. As a bonus, you can also learn about other aspects of the data science field, such as the marvelous methodologies it features, without getting all too mathy about it! It's been a few years since I authored it, but so far, it's aged quite well, just like most FOSS out there we use in data science and A.I. work. Cheers!
Your comment will be posted after it is approved.
Leave a Reply.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.