Recently I had a couple of very insightful conversations with some people, over drinks or coffee. We talked about A.I. systems and how they can pose a threat to society. The funny thing is that none of these people were A.I. experts, yet they had a very mature perspective on the topic. This led me to believe that if non-experts have such concerns about A.I., then perhaps it’s not as niche a topic as it seemed. BTW, the dangers they pinpointed had nothing to do with robots taking over the world through some Hollywood-like scenario, but were far more subtle, just like A.I. itself. Also, they are not about how A.I. can hurt us sometime in the future but about how its dangers have already started to manifest. So, I thought about this topic some more, going beyond the generic and quite vague warnings that some individuals have shared with the world in interviews. The main dangers I’ve identified through this quest are the following: over-reliance on A.I., degradation of soft skills, bugs in automated processes, lack of direct experience of the world, and blind faith in A.I. tech.
Interestingly, all of these have more to do with us, as people, rather than the adaptive code that powers these artificial mental processes we call A.I.
Over-reliance on A.I.
Let’s start with the most obvious pitfall, over-reliance on this new tech. To some extent this is already happening, since many of us use A.I. without even realizing it and have come to depend on it. Pretty much every system on a smartphone that makes the device “smart” is something to watch out for. From virtual assistants to adaptive home screens to social chatbots, these are A.I. systems that we may get so used to that we won’t be able to do without them. Personally I don’t use any of these, but as the various operating systems evolve, they may not leave users a choice when it comes to the use of A.I. in them.
Degradation of Soft Skills
Soft skills are something many people talk about and even more have come to value, especially in the workplace. However, with A.I. becoming more and more of a smooth interface for us (e.g. with customer service bots), we may not be as motivated to cultivate these skills. This inevitably leads to their degradation, along with the atrophy of related mental faculties, such as creativity and intuition. After all, if an A.I. can provide us with viable solutions to problems, why would we feel the need to think outside the box in order to find them? And if an A.I. can make connecting with others online very easy, why would someone opt for face-to-face connections instead (unless their job dictates that)?
Bugs in Automated Processes
Automated processes may seem enticing through the abstraction they offer, but they are far from perfect. Even the most refined A.I. system may have some issues under the hood, among its numerous hidden layers. Just because it can automate a process doesn't mean that there are no hidden biases in its functionality, or some (noticeably) wrong conclusions from time to time. This is natural, since every system is bound to fail at times. The problem is that if an A.I. system fails, we may not be able to correct it, and in some cases even spotting the bug may be a hard task, let alone proving it to others.
Lack of Direct Experience of the World (VR and AR)
This is probably a bit futuristic, since if you live in a city outside a tech bubble (such as the West Coast of the US), there are still plenty of opportunities for direct experience. However, as technologies like virtual reality (VR) and augmented reality (AR) become cheaper and more commercially viable, they are bound to become the go-to interface for the world, e.g. through “tourism” apps or virtual “museums.” Although these technologies would be useful, particularly for people who don't have easy access to the rest of the world, there is no doubt that they are bound to be abused, resulting in some serious social problems and bringing about further societal fragmentation.
Blind Faith in A.I. Tech
This is probably the worst danger of A.I., which may seem similar to the first one mentioned, though it is more subtle and more sinister. The idea is that some people become very passionate about the merits of A.I. and quite defensive about their views. Their stance on the matter is eerily similar to that of some religious zealots, though the “prophets” of these A.I. movements may seem level-headed and detached. However, even they often fail to hide their borderline obsession with their ideology, whereby A.I. is deified. It’s one thing to speculate about a future society where A.I. may have an administrative role in managing resources, and a completely different thing to believe that A.I. will enter our lives and solve all our problems, like some nurturing alien god of sorts.
An Intelligent Approach to All This
Not all is doom and gloom, however. Identifying the dangers of A.I. is a good first step towards dealing with them. An intelligent way to do that is first to take responsibility for the whole matter. It’s not A.I.’s fault that these dangers come about. Just like every technology we've developed, A.I. can be used in different ways. If cars cause thousands of people to die every year, it’s not the cars’ fault. Also, just as the car was built to enrich our lives, A.I.’s development has similar motives. So, if we see it as an auxiliary technology that can help us make certain processes more efficient, rather than a panacea, we have a good chance of co-existing with it, without risking our individual and social integrity.
Although it's been over 2 weeks since I finished working on the Data Visualization video and about a month since I completed the Deep Learning one, both of them have only just been made available on Safari (a subscription-based platform for various educational material). So, if you are up for some food for thought on DL and DV, check them out when you have a moment: Deep Learning vid and Data Visualization vid.
Note that these are both overview videos and although in the Data Viz one I include several references to libraries in Python and Julia for creating various plots, the videos are fairly high-level. These are not in-depth tutorials on the topics.
Once I decide to take a break from all the book-writing these days, I'll probably make another video either on AI or on a more conventional DS topic. So, stay tuned...
When it comes to DS education, nowadays there is a lot of emphasis on one of two things: the math aspect of it, or the complex algorithms of deep learning systems. Although all this is essential, particularly if you want to be a future-proof data science professional, there is much more to the field than that. Namely, the engineering mentality is something that you need to cultivate, since at its core, data science is an engineering discipline. I don’t mean that in a software sense, but in the sense of a practicality- and efficiency-oriented approach to building a system.
This is largely due to the scaling dimension of a data science metric or model. Unfortunately most data science “educators” fail to elaborate on this point, since they focus mainly on parroting other people’s work, instead of encouraging students to gain a deeper understanding of the methods and processes being taught. Also, scaling something is the filter that distinguishes a robust algorithm from a mediocre one. As we obtain more and more data, having an algorithm that works well on a small dataset only (or one that requires a great deal of parallelization to yield any benefits) is not sustainable. Of course some people are happy with that, since they have a great deal of resources available, which they are happy to rent out. However, we can often obtain good enough results with fewer resources, through algorithms that scale better. Even if most people don’t share this fox-like approach to data science, it doesn’t make it less relevant. After all, many people associate methods with the frameworks particular companies offer, rather than understand the science behind these methods.
Scaling a method up intelligently is the product of three things:
1. having a deep understanding of a method
2. not relying on an abundance of resources to scale it up
3. being creative about the method, making compromises where necessary, to make it more lightweight
That’s where the engineering mentality comes in. The engineer understands the math, but isn’t concerned about having the perfect solution to a problem. Instead, he cares about having a good enough solution that is reliable and not too costly.
This kind of thinking is what drives the development of modern optimization systems, which are an important part of AI. Artificial Intelligence may involve things like deep learning networks, but there is more to it than that. So, if you want to delve more into this field and its numerous applications in data science, cultivating this engineering mentality is the optimal way to go. Perhaps not the absolute best one, but definitely one that works well and is efficient enough!
I've mentioned both in the DS Modeling Tutorial and in another article of mine the importance of discretization / binning of a continuous variable, as a strategy for turning it into a feature, to be used in a data model. However, how meaningful and information-rich the resulting categorical feature is going to be depends on the thresholds we use. In this post I'd like to share with you a strategy that I've come up with that works well in doing just that.
First of all, we need to make sure we have a potent method for calculating the density of a data point. I'm not talking about probability density though, since the latter is a statistical concept that has more to do with the mathematical form of a distribution than the actual density observed. The actual density is what we would measure if we were to look at the data itself, and although it's quite straightforward, it's not as easy to do at scale. That's why I first developed a very simple (almost simplistic) method for approximating density using a sampling of sorts, rather than looking at each individual element in the variable.
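The sampling scheme isn't spelled out here, so the following Python snippet is only a minimal sketch of one way such an approximation could work, not the actual method. It assumes that the density around a point can be estimated as the share of a random sample falling inside a small window around it, divided by the window's width. The function name `approx_density` and the parameters `radius`, `sample_size` and `seed` are all hypothetical.

```python
import bisect
import random


def approx_density(values, x, radius, sample_size=1000, seed=0):
    """Estimate the observed density around x from a random sample of values.

    Assumption (not from the post): density is the fraction of sampled
    points inside [x - radius, x + radius], per unit length of that window.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    rng = random.Random(seed)
    # Work on a sample so the cost does not grow with the full variable.
    if len(values) > sample_size:
        sample = rng.sample(list(values), sample_size)
    else:
        sample = list(values)
    sample.sort()
    # Count the sampled points inside the window using binary search.
    lo = bisect.bisect_left(sample, x - radius)
    hi = bisect.bisect_right(sample, x + radius)
    return (hi - lo) / (len(sample) * 2 * radius)


# Two well-separated clumps of values: dense near 0.5, empty near 5.
data = [i / 1000 for i in range(1000)] + [10 + i / 1000 for i in range(1000)]
dense = approx_density(data, 0.5, 0.5)
sparse = approx_density(data, 5.0, 0.5)
```

Sorting the sample once and reusing it for many query points would be the natural next step if the density is needed at several locations.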
Afterwards, we just need to figure out the point of least density that's not an extreme of the variable. In other words, we need to identify a local minimum in the density distribution, a fairly easy task that's also computationally cheap. Of course it's good to have a threshold too, to distinguish between this point being an actual low-density point and one that could be due to chance. If the density of that point is below this threshold, we can take it to be a point of dissection for the variable, effectively binarizing it.
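To make this step a bit more concrete, here is a rough Python sketch. It makes several assumptions of its own: density is approximated with a plain equal-width histogram rather than the sampling method, the threshold is expressed as a fraction of the average bin count, and ties between equally sparse bins are broken by taking the middle one. The names `find_split`, `n_bins` and `threshold_ratio` are hypothetical.

```python
def find_split(values, n_bins=20, threshold_ratio=0.5):
    """Return a split point at the sparsest interior region, or None.

    Assumptions (not from the post): density is a histogram count, and a
    region counts as sparse if its count is below threshold_ratio times
    the average count per bin.
    """
    if n_bins < 3:
        raise ValueError("need at least 3 bins to have an interior bin")
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # a constant variable cannot be split
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1

    # Only interior bins qualify, so the extremes are never chosen.
    interior = range(1, n_bins - 1)
    min_count = min(counts[i] for i in interior)
    candidates = [i for i in interior if counts[i] == min_count]
    best = candidates[len(candidates) // 2]  # middle of the tied bins

    # Must be a local minimum and sparse enough not to be due to chance.
    is_local_min = counts[best] <= counts[best - 1] and counts[best] <= counts[best + 1]
    is_sparse = counts[best] < threshold_ratio * (len(values) / n_bins)
    if not (is_local_min and is_sparse):
        return None
    return lo + (best + 0.5) * width  # centre of the chosen bin


# Two clumps with a wide gap in between give a split inside the gap.
two_clumps = [i / 100 for i in range(100)] + [9 + i / 100 for i in range(101)]
split = find_split(two_clumps)
```

An evenly spread variable has no bin that falls below the threshold, so in that case the function returns `None` and the variable is left as is.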
Beyond that, we can repeat the same process recursively, for the two partitions of the variable. This way, we can end up with 3, 4, or even 100 partitions at the end of the process. This is another reason why the aforementioned threshold is very important. After all, not all partitions would be binarizable in a meaningful way. Also, it would be a good idea to have a limit on how many partitions we allow overall, so that we don't end up with a categorical variable having 1000 unique values either!
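The recursion with a cap on the number of partitions could look roughly like the Python sketch below. The splitting rule is passed in as a function, and `widest_gap_split` is just a simple stand-in used to make the example runnable (it splits at the midpoint of the widest gap between neighbouring values); it is not the density-based rule described above. All names and parameters here (`recursive_thresholds`, `max_partitions`, `min_size`, `min_gap`) are hypothetical.

```python
def widest_gap_split(part, min_gap=1.0):
    """Stand-in splitter: midpoint of the widest gap, if it is wide enough."""
    s = sorted(part)
    gaps = [(s[i + 1] - s[i], i) for i in range(len(s) - 1)]
    if not gaps:
        return None
    gap, i = max(gaps)
    if gap < min_gap:
        return None  # no gap wide enough to justify a split
    return (s[i] + s[i + 1]) / 2


def recursive_thresholds(values, find_split, max_partitions=8, min_size=10):
    """Collect split points by splitting each partition in turn."""
    thresholds = []

    def recurse(part):
        # k thresholds give k + 1 partitions, so stop once the cap is hit.
        if len(thresholds) + 1 >= max_partitions or len(part) < min_size:
            return
        split = find_split(part)
        if split is None:
            return  # this partition cannot be split in a meaningful way
        left = [v for v in part if v < split]
        right = [v for v in part if v >= split]
        if not left or not right:
            return
        thresholds.append(split)
        # Depth-first: the left side gets first use of the remaining budget.
        recurse(left)
        recurse(right)

    recurse(list(values))
    return sorted(thresholds)


# Three clumps of values yield two thresholds, i.e. three partitions.
three_clumps = ([i / 100 for i in range(50)]
                + [5 + i / 100 for i in range(50)]
                + [10 + i / 100 for i in range(50)])
cuts = recursive_thresholds(three_clumps, widest_gap_split)
```

Passing the splitter in as an argument keeps the recursion independent of how density is measured, so a density-based rule can be swapped in without touching the rest.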
This optimal discretization / binning process is very simple and robust, resulting in a simpler form of the original variable, one that can be broken down into a set of binary features afterwards, if needed. This can also be useful in identifying potential outliers and being able to use them (as separate values in the new feature) instead of discarding them. The method is made even faster through its implementation in Julia, which once again proved itself as a great DS tool.
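As an illustration of that last step, the short Python sketch below turns a list of thresholds into a categorical feature and then into binary features. It is written in Python purely for illustration and is not the Julia implementation mentioned above; the function names `to_bins` and `to_binary_features` are hypothetical, and a value equal to a threshold is assumed to belong to the partition on its right.

```python
import bisect


def to_bins(values, thresholds):
    """Map each value to the index of the partition it falls in."""
    cuts = sorted(thresholds)
    # bisect_right sends a value equal to a threshold to the right partition.
    return [bisect.bisect_right(cuts, v) for v in values]


def to_binary_features(bin_ids, n_bins):
    """Expand bin indices into one 0/1 column per partition."""
    return [[1 if b == k else 0 for k in range(n_bins)] for b in bin_ids]


bin_ids = to_bins([0.2, 3.0, 7.5, 12.0], [2.5, 7.5])
features = to_binary_features(bin_ids, 3)
```

With the two thresholds above, the four values land in partitions 0, 1, 2 and 2, and each row of `features` has a single 1 in the column of its partition.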
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.