Since I'm in a quiz frame of mind these days, I've created yet another quiz video, which is now available on the O'Reilly platform. This Machine Learning quiz explores a few key aspects of the subject, such as supervised, unsupervised, and reinforcement learning, as well as the main model types and the hyperparameters involved. Designed to be as inclusive as possible, this video can benefit both newcomers to the topic and more seasoned machine learning professionals. Enjoy!
Note that O'Reilly is a subscription-based platform (formerly known as Safari), so in order to view this or any other video in its entirety, you'll need an account there. Definitely a worthwhile investment, if you ask me, particularly if you are a data science professional. I don't receive any benefits from saying this, btw, since I work with a different publisher (Technics Publications), which contributes these videos to the platform.
Beyond the play on words here, there is an important matter that needs to be addressed, since data science is becoming increasingly influential in various aspects of our lives. Gone are the days when it was limited to the data science departments of certain companies; these days, the impact of data science transcends the boundaries of the organizations it serves. Take, for example, the data scientists working for large companies like Facebook and Google. Their work influences a large number of people, even outside the companies themselves. Perhaps the extent of this influence is hard to fathom even for the managers of these data science teams, since its effects are often long-lasting and nearly impossible to gauge without sufficient data and enough time for them to fully manifest.
Ethics is a word that's used so much that it has lost its meaning, or maybe it was never properly defined in the first place. Also, with the impersonal aspects of ethics being formalized in particular codes of conduct, it has lost its essence, since it has been reduced to a number of do's and don'ts, a set of guidelines that can be followed unconsciously and mechanically. However, ethics is the formal aspect of morality, which is founded on the values we follow. Values are real and often comprehensible things that we express in our actions, often consciously. Values like honesty, diligence, and efficiency don't require a Master's in philosophy to comprehend, while the ethics of a modern information worker can be a bit more abstract and challenging to relate to. Values are something we have, whether we talk about them or not, and it's not too difficult to figure out what they are with a little introspection. However, even though values are a personal matter, they have a concrete effect on our work and on how we relate to the world. Good managers are aware of that and pay attention to the values of the candidates for the positions they wish to fill. The resume/CV is important, but it's not the only factor at play when hiring a professional.
Perhaps it's time to pay more attention to this aspect of the craft. Knowledge and know-how are becoming more easily accessible to everyone, particularly to those who are willing to pay for them, an investment that is bound to pay off. That's great, particularly for those who wish to enter this field even if their education is not aligned with the subject. Still, it's equally important to balance this aptitude with the moral strength that empowers us to deliver our data science work in a way that respects other people's privacy and doesn't abuse the information involved. At some point in our careers, we are likely to come to a crossroads where we need to either do what is expected or do what is ethically right. The former is bound to be the more tempting option, at least financially, while the latter may be devoid of any direct benefit. Having a solid set of positive values can help us make the right choice instead of trading the long-term benefit of the many for the short-term gain of the few.
Just last week, during a business trip to London, I started working on this video in my spare time, and now it's already online! In this 40-minute video, comprising 3 clips, I explore the topic of Optimization through a series of questions spanning 5 categories. Whether you are an aspiring A.I. expert or a data scientist, you can learn a lot of useful things from this test of sorts and, with the right mindset, even enjoy the whole process! You can find it on the O'Reilly platform, where you need to have an account (even a trial one will do) to watch it in its entirety. Cheers!
With everyone in A.I. feeling the need to have an opinion or even a stance on Artificial General Intelligence (AGI), we often neglect the source of this concept, namely the well-rounded intelligence that characterizes a human being who has all kinds of smarts. I refer to the latter as Natural General Intelligence (NGI), and one could argue that it's as important as AGI, if not more so, at least at this point in time, particularly for data science professionals.
But isn't this kind of intelligence just another name for genius? Not necessarily. NGI is modeled after the human being in general, even if its artificial counterpart (AGI) is often linked to super-intelligence, a kind of supergenius that may characterize an A.I. that has developed this level of intelligence. Still, it is possible to have NGI without being a modern Leonardo da Vinci or Benjamin Franklin.
Natural General Intelligence is all about enabling your mind to develop in different aspects, not merely the ones you need for your vocation or the ones that were essential for your survival so far. This idea is not new; it was popular during the Renaissance. Even today we use the term "Renaissance Man" to refer to an individual who is well-rounded in his or her life and good at different things. In this era of overspecialization, this may seem like a utopian endeavor, at least to some people. In reality, however, it isn't. If you want to learn a musical instrument, for example, there are plenty of courses and books you can leverage, and there are even music instructors who can teach you over the internet. As for the instruments themselves, they are far more affordable than they used to be, and for certain instruments, prices continue to drop due to high demand. However, more important than developing one's musical aptitude is the growth of one's emotional intelligence (EQ), particularly interpersonal skills.
What does all this have to do with data science? Well, in data science it's easy to overspecialize too (e.g. in Machine Learning, Data Engineering, NLP, etc.). However, this creates artificial barriers that may render communication with other data professionals more challenging. Of course, more often than not these issues are alleviated by a competent data science lead or a manager with sufficient data science understanding. Still, if you as a data science professional can reduce the need for external intervention when collaborating with others, that's definitely a plus, not just in terms of smoothing the professional relationships involved, but also in terms of business value. Stand-alone professionals are highly sought after, since such people tend to be (or quickly become) assets. In time, these professionals can grow into versatilists and/or assume leadership positions.
From all this, it is hopefully clear that Natural General Intelligence is more tangible and significantly more feasible than any other kind of advanced intelligence capable of yielding value in an organization. What's more, an individual with NGI is bound to be more relatable and accountable, making the whole team he or she belongs to a more functional unit. Perhaps such a goal is more beneficial than the blind pursuit of some exotic kind of A.I. that can solve all of our problems. The latter is intriguing and worth investigating, but I wouldn't bet on it benefiting the average Joe any time soon!
Having been an expert in this topic since my PhD, I decided to create a video about it. The topic is a bit niche, but it's very practical and useful in various data science tasks, particularly data engineering. Check out the video on O'Reilly and feel free to give me any feedback on it, especially regarding the I.D. metric once you look into it. Note that you will need an account on the O'Reilly platform in order to view the video (and any other material) in its entirety. However, considering the quality and diversity of the content there, it is a worthwhile investment. Also, you can get a free 10-day trial to check it out before you make a decision. Cheers!
In the most venerable of sciences, physics, there are two closely linked concepts: work and energy. Work is the result of a force applied over a given distance, and doing work on a system transfers energy to it, so energy can be seen as the result of work. Energy, in turn, takes a variety of forms, which enables us to produce work from it, be it through a preexisting form (e.g. uranium and thorium) or a man-made one (e.g. a battery). This fundamental relationship between work and energy, which we often take for granted, applies to data science as well, if we substitute value for energy.
Value is sometimes considered the 5th V of Big Data (the other four being Volume, Velocity, Variety, and Veracity), something that is quite inaccurate, since value is a fundamental characteristic of information, not of a particular kind of data. Information can be found even in relatively small datasets (which were once considered large, before the era of big data), so calling value a characteristic of big data can be misleading. This misconception doesn't take away any value from the idea of value, though, which is a core value for many data scientists, particularly those who go beyond the techniques and methods. These data scientists penetrate the essence of the craft through the development of the data science mindset, which is the most valuable aspect of the field.
Value concerns business people too, since it is one of the outcomes of a data science project, which ideally translates into increased revenue, be it via the development of a new product or by making a business process more efficient. Also, value can enable an organization to expand its scope, know its customers better (KYC), and liaise with other organizations more effectively. This value, which often takes the form of insights, is at the core, and oftentimes at the end, of the data science pipeline.
Value can also take the form of a product, such as an API that automates a particular evaluation process or a prediction. Although the technology behind such a product is nothing spectacular (APIs have existed for a while now, and they are fairly straightforward for a software engineer to develop), the data science part of that product is what brings the real value to such an API. Without a data science engine behind it, an API is more likely to be an ETL tool, which, although still valuable, is not of the same caliber as a data science-powered API.
Value in data science is often found in the information distilled from the data, particularly through a predictive analytics model. Elements of it, however, are already encountered in the data discovery stage of the pipeline, where the data scientist evaluates the features at hand and the available metadata. This is often done through the creation of data models, which is why it is part of the data modeling stage of the pipeline. I talk about all this in detail in the Data Science Modeling Tutorial, available on the O'Reilly (formerly known as Safari) platform.
Value in data science is a big topic, and if I were to continue, this article would become irksomely long. It would be best to continue this in another article, or even a series of articles, in the weeks to come. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.