Contrary to the probabilistic approach to data analytics, which models uncertainty through probabilities, usually within a statistical framework, the possibilistic approach focuses on what’s actually there, not what could be there. Although not officially a paradigm (yet), it has what it takes to form a certain mindset, one highly congruent with that of a competent data scientist.
If you haven’t heard of the possibilistic approach to things, that’s normal. Most people have already jumped on the bandwagon of the probabilistic dogma, so anyone seriously thinking of things possibilistically would be considered eccentric at best. After all, the last successful possibilistic systems are often considered obsolete, due to their inherent limitations when it comes to higher-dimensionality datasets. I’m referring to Fuzzy Logic systems, which are part of the GOFAI family of A.I. systems (in these systems, the possibilities are expressed as membership levels, through corresponding membership functions). These systems are still useful, of course, but they are not the go-to choice when building an A.I. solution to most modern data science problems.
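To make the membership-function idea concrete, here is a minimal sketch of a trapezoidal membership function, a staple of Fuzzy Logic systems. The fuzzy set and all the numbers below are hypothetical, chosen purely for illustration:

```python
import numpy as np

def trapezoidal_membership(x, a, b, c, d):
    """Classic trapezoidal membership function: the membership level rises
    linearly from a to b, stays at 1 between b and c, and falls from c to d."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (d - x) / (d - c)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# Membership in the fuzzy set "comfortable room temperature" (degrees Celsius)
temps = np.array([15, 18, 21, 24, 28])
print(trapezoidal_membership(temps, a=16, b=20, c=23, d=27))
```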
Possibilistic reasoning is reasoning that relies on concrete facts and observable relationships in the data at hand. It doesn’t assume anything, nor does it opt for shortcuts by summarizing a variable with a handful of parameters corresponding to a distribution. So, if something is predicted with a possibilistic model, you know all the hows and whys of that prediction. This is the direct opposite of the black-box predictions of most modern A.I. systems.
Working with possibilities isn’t easy, though. Oftentimes it requires a lot of computational resources, and when the data is complex, an abundance of creativity as well. For example, you may need to do some clever dimensionality reduction before you can even start looking at the data, while unbiased sampling may be a prerequisite too, particularly in transduction-related systems. So, if you are looking for a quick-and-easy way of doing things, you may want to stick with MXNet, TensorFlow, or whatever A.I. framework takes your fancy.
If, on the other hand, you are up for a challenge, then you need to start thinking in terms of possibilities, forgetting about probabilities for the time being. Some questions that can help with that are the following:
* How much does each data point contribute to a metric (e.g. one of central tendency or one of spread)?
* Which factors / features influence the similarity between two data points and by how much?
* What do the fundamental components of a dataset look like, if they are defined by both linear and non-linear relationships among the original features?
* How can we generate new data without any knowledge of the shape or form of the original dataset?
* How can we engineer the best possible centroids in a K-means-like clustering framework?
* What, essentially, is an outlier or an inlier, and how does it relate to the rest of the dataset?
For all of these cases, assume that there is no knowledge of the statistical distributions of the corresponding variables. In fact, you are better off disregarding any Stats knowledge whatsoever, as it’s easy to be tempted into a probability-based approach. For a small taste of this mindset, the sketch below tackles the first question of the list directly from the data.
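This is a minimal sketch, with the sample values made up for illustration, assuming nothing about the data’s distribution:

```python
import numpy as np

def point_contributions(x):
    """Share of each data point in the mean absolute deviation (a spread
    metric), computed directly from the data, with no distribution assumed."""
    x = np.asarray(x, dtype=float)
    deviations = np.abs(x - x.mean())      # each point's raw deviation
    return deviations / deviations.sum()   # normalized contribution per point

data = [2.0, 2.1, 1.9, 2.0, 9.5]           # one extreme value among the rest
print(point_contributions(data))            # the last point dominates the spread
```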
Finally, although this way of thinking about data has clear advantages over the probabilistic one, the latter has its uses too. So, I’m not advocating that you shouldn’t learn Stats. In fact, I’d argue that only after you’ve learned Stats quite well will you be able to appreciate the possibilistic approach to data in full. So, if you are looking into A.I., Machine Learning, or both, you may want to consider a possibilistic way of tackling uncertainty, instead of blindly following those who have vested interests in the currently dominant paradigm.
It’s not the programming language, as some people may think. After all, if you know what you are doing, even a suboptimal language can be used without too much of an efficiency compromise. No, the biggest mistake people make, in my experience, is relying too heavily on whatever libraries and methods they find out there. And that’s not even the worst part: if someone relies excessively on predefined processes and methods, the chances of that person’s role getting automated by an A.I. are quite high. So, what can you do?
For starters, you need to understand that both data science and artificial intelligence, like other modern fields, are in a state of flux. This means that what was considered gospel a few years back may be irrelevant in the near future, even if it is somewhat useful right now. Take Expert Systems, for example. These were all the rage when A.I. first emerged as an independent field. Nowadays, however, they are hardly used, and in the near future they may appear more anachronistic than ever. That’s not to say that modern aspects of data science and A.I. will necessarily wane, but if you focus too much on them, at the expense of the objectives they were designed for, you risk becoming obsolete along with them.
Of course, certain things may remain relevant no matter what. Regardless of how data science and A.I. evolve, the k-fold cross-validation method will still be useful. The same goes for certain evaluation metrics. So, how do you discern what is bound to remain relevant from what isn’t? Well, you can’t, unless you try to innovate. If certain methods appear too simple, for example, they may not stick around for much longer, even if they linger in the textbooks. Do these methods already have variants that outperform the original algorithms? Are people developing similar methods to overcome the drawbacks they exhibit? What would you do if you were to improve these methods? Questions like these may be hard to answer, because you won’t find the necessary info on Wikipedia or StackOverflow, but they are certainly worth thinking about, even if an exact answer eludes you.
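As a refresher on that evergreen method, here is a minimal sketch of k-fold cross-validation, using scikit-learn and a made-up dataset purely for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data: 100 points, 4 features, with a simple learnable pattern
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(np.mean(scores))  # average accuracy over the 5 folds
```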
For example, I had always thought that clustering had to be stochastic, because everyone was telling me it is an NP-hard problem that cannot be solved efficiently with a deterministic method. With that mindset, no innovations would ever take place in that area of unsupervised learning, would they? So I questioned the matter and found out that not only are there ways to solve clustering deterministically, but some of these methods are more stable than the stochastic ones. Are they easy? No. But they work. So, just as we tend to opt for mechanized transportation today instead of the (much simpler) horse-and-carriage alternative, perhaps the more sophisticated clustering methods will prevail. But even if they don’t (after all, there is no limit to some people’s distaste for anything new, especially if it’s difficult for them to understand), the fact that I’ve learned about them makes me more flexible should that change take place, and better prepared for other changes of a similar nature in the field.
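To make this a bit less abstract, here is a minimal sketch of one deterministic seeding scheme for k-means-style clustering, the maximin (farthest-point) heuristic. This is just one illustrative approach, not necessarily one of the methods alluded to above:

```python
import numpy as np

def maximin_seeds(X, k):
    """Deterministic k-means seeding: start with the point farthest from the
    data's centroid, then repeatedly pick the point farthest from all seeds
    chosen so far. No randomness, so the result is fully reproducible."""
    seeds = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - X[s], axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(dists)))
    return X[seeds]

X = np.array([[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9], [10, 0]])
print(maximin_seeds(X, 3))  # well-separated seeds, identical on every run
```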
I am not against stochastic methods, by the way, but if an efficient deterministic solution exists for a problem, I see no reason why we should stick with a stochastic approach to it. For optimization-related scenarios, however, especially those involving very complex problems, the stochastic approach may be the only viable option. Bottom line: we need to be flexible about these matters.
To sum up, learning the conventional ways of solving data-related problems, be it through data science methods or A.I. ones, is but the first step. Stopping there, though, would be a grave mistake, since you’d be depriving yourself of the opportunity to delve deeper into the field and explore not only what’s feasible but also what’s possible. Isn’t that what science is about?
There is no doubt that Artificial Intelligence has a number of issues that need to be addressed before its benefits can become more widespread. Also, if it were to become more autonomous, we would need to be able to at least anticipate its decisions and perhaps even understand how they come about. However, none of this has happened yet. Whether that’s due to some innate infeasibility or due to some other factor remains to be discovered.
What we have discovered, though, again and again, is that most A.I. developments take the world by surprise, including the people involved in the field, dedicated scientists and engineers who have spent countless hours working with such systems. A collective understanding of these systems still eludes us, and it’s not the A.I.’s fault.
It’s easy to blame an A.I., or the people behind it, for anything that goes wrong, but remember that various A.I. projects were seen through to completion because we, as their potential users, wanted them out there. Whether we understood the implications of these systems, though, is questionable.
So, the biggest issue with A.I. might be how we relate to it, combined with the fact that we don’t really understand it in depth. The evangelists of the field view it as a panacea of sorts, oftentimes confusing A.I. with ML, or considering the latter a subfield of the former. The technical people involved in A.I., on the other hand, see it as a cool technology that can keep them relevant in the tech market. As for the consumers of A.I., they see it as futuristic tech that may make life more interesting, though it may also change the dynamics of the job market in very disruptive (or even disturbing) ways. Unless we all obtain a clearer understanding of what A.I. is, what it can and cannot do, and how it works (to the extent each person’s technical level allows), A.I. will remain an exotic technology wrapped in a mist of mystique.
That’s not an insurmountable problem, though. Nowadays, knowledge is more accessible than ever, so if someone wants to learn more about A.I., it’s just a matter of committing to the task and putting in the necessary hours. Granted, a few books or videos may be needed too, with whatever cost that entails, but the task remains quite manageable. Besides, one doesn’t need to be an A.I. expert to have sensible expectations of this tech and to discern the brilliance of some of these systems from the BS of many of the futurists.
All in all, the more one knows about this field and the more realistic one’s expectations are, the better the chances of deriving value from A.I. without falling victim to the problems that surround it.
So, the NLP Fundamentals video I made recently went online today (you can find it on the Safari site). Note that since Natural Language Processing is a very broad subject, it is quite hard to do it justice in a single video. For someone needing a good introduction to it, however, this video should be fine. Enjoy!
A few months ago, I wrote a blog post on Artificial Emotional Intelligence (AEI), a kind of A.I. that emulates the EQ aspects of our mental processes. Of course, this technology is still limited to emulating basic aspects of the human emotional spectrum, focusing mainly on comprehending emotion through text data. Nevertheless, it can still add a lot of value to an organization, as in the case of ZimGo Polling, an initiative to predict the outcome of an election before the votes are counted.
BPU Holdings, the company behind this ambitious yet quite down-to-earth initiative, makes use of an advanced NLP system that employs some specialized A.I. components (interestingly, I’m currently in the process of creating a video on NLP for Safari!). The data for this endeavor stems from social media, which ensures abundance as well as freshness, both key factors in making good predictions about trends of this sort.
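For a sense of what emotion comprehension through text can look like at its most basic, here is a toy, lexicon-based sketch. To be clear, this is not BPU Holdings’ actual method, and the word weights are entirely made up; it only illustrates the general idea of mapping words to emotion scores and aggregating them:

```python
# Hypothetical word weights, for illustration only
EMOTION_LEXICON = {
    "love": 1.0, "great": 0.8, "hope": 0.6,
    "angry": -0.9, "fear": -0.7, "bad": -0.5,
}

def emotion_score(text):
    """Average the emotion weights of the recognized words in a text."""
    words = text.lower().split()
    hits = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(emotion_score("i love this candidate and the great ideas"))  # positive
print(emotion_score("people are angry and fear the future"))       # negative
```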
Since part of my job at DSP Ltd. is staying up to date on the latest and greatest trends in data science and A.I., when a representative of BPU Holdings approached me with their AEI ideas, I took the opportunity to learn more about this field and about what they were doing. That’s how the previous blog post I mentioned came about. Last month, I had the opportunity to talk to the people of this company directly, learning more about their work and about AEI’s promise of bringing more value to organizations around the world through this intriguing niche. After looking into the matter a bit, I became convinced that this company may actually be on to something.
The case study presented to me involved the South Korean elections, where this AEI system managed to predict the results with impressive accuracy. Of course, the company doesn’t plan to rest on its laurels, as there are already plans to apply this new approach to data analytics to other areas, such as the US elections. You can read more about this, as well as the company’s offerings, in the attached press release and on its website.
Note that I am not affiliated with this company, so if I appear a bit biased towards it, that’s because I favor the use of A.I. for initiatives like this, rather than for other, more aggressive applications, such as those in the military. After all, if there is one thing I hope has come across from all my postings on this topic, it is that A.I. can be a positive technology, bringing value to everyone, not just to some multinational conglomerates that may not always use it wisely. Also, instead of blindly following this or that A.I. expert on social media, I prefer to take a more active approach to the matter, directly connecting with the people involved and providing them with feedback on the tech as they develop it. That’s why this blog is the first one worldwide to publicly announce this company’s initiative and bring AEI to the conversation table.
What are your thoughts / emotions on it? Feel free to share them with me, either through this blog or via a direct message.
A.I. and ML are often used interchangeably, while many people consider one to be a subset of the other (which one is the bigger set depends on whom you ask). However, things may not be as clear-cut as they seem, since the communities of these two fields are not all that related, and there is a sort of rivalry among the hard-core members of each. Why is that, though, if A.I. and ML are so similar to each other, enough to confuse even data scientists?
First of all, let’s start with some definitions. A.I. is the group of methods, algorithms, and processes that bring about computer systems emulating human intelligence, even if the intelligence they usually exhibit is quite different from our own. These systems often take the form of self-sufficient machines, such as robots, as well as agent programs that roam the Internet, or cyberspace in general. ML, on the other hand, is the group of methods, algorithms, and processes that bring about computer systems that solve some data analytics problem efficiently, through a training procedure (the “learning” part of machine learning). That training can happen with the help of specific outcomes (aka targets), as in supervised learning, or without them, as in unsupervised learning. It can also take the form of feedback on the system’s predictions, as in reinforcement learning, which is like on-the-job training of sorts.
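A minimal sketch of the targets-versus-no-targets distinction, using scikit-learn on made-up data purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs targets
from sklearn.cluster import KMeans                   # unsupervised: data only

# Two synthetic groups of points in 2D
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)                    # the known targets

supervised = LogisticRegression().fit(X, y)            # learns from X and y
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # learns from X alone

print(supervised.predict([[4.0, 4.0]]))              # predicted target
print(unsupervised.labels_[:5])                      # discovered groupings
```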
Clearly, there is a close link between ML and data science, since ML systems are designed for this sort of problem. A.I. systems, on the other hand, may tackle other kinds of problems too (e.g. finding the optimal route given some restrictions). So, there is a part of A.I. that is leveraged in data science and a part of A.I. that has nothing to do with our craft. The part that is used in data science has a large intersection with ML, mainly through network-based systems, such as ANNs. Lately, Deep Learning networks, which are specialized and more sophisticated kinds of ANNs, have become quite popular; they, too, are part of that intersection between A.I. and ML.
Many people who work in A.I. consider it more of a science than ML, and they are right in a way. Most ML methods are heuristics-based and don’t have much theory behind them, while the ones tied to Stats (statistical-ML hybrids) are heavily constrained by the assumptions that Stats theory carries. A.I. methods are generally data-driven too, but they are also related to processes found in nature, so they don’t come out of the blue.
Nevertheless, a professional and pragmatic data scientist doesn’t put too much emphasis on the differences between A.I. and ML methods, caring more about how they can be applied to solve the problems at hand. So, even if these two families of methods are not the same, nor is one a subset of the other, they are both very useful, if not essential, in practical data science.
I understand that making predictions about these things is quite risky, but it’s good to take a stance on the things that matter, instead of playing it safe like many tech “experts” out there do. Of course, it’s easier to parrot the widely accepted views on every hot topic, gathering “likes” and positive comments, but no one has ever offered anything useful to the whole by being lukewarm.
First of all, I’m not making a case against cryptocurrencies as a possibility. In fact, I find them potentially very useful, especially in a country where the conventional currency is plagued by inflation and by the idiotic people managing the economy around it. Cryptocurrencies could be a viable alternative to a problematic fiat currency, should they actually be used in its stead. The reality of cryptocurrencies, however, is very distant from this idealistic scenario. In fact, I have yet to encounter one cryptocurrency that is actually used as a currency. Most of them are some form of speculative investment, like a stock, but without any inherent value. Let that sink in for a bit: cryptocurrencies themselves have no value whatsoever.
Someone may argue that conventional currencies have no inherent value either, and that’s a valid point. However, conventional currencies’ value doesn’t fluctuate wildly over time, since there are mechanisms to keep it somewhat stable. Naturally, there are exceptions, but even an unstable currency is generally more stable than the average cryptocurrency out there. The reason is simple: people who handle cryptocurrencies do so with one particular aim, namely to make money off them. They don’t care if these currencies disappear tomorrow, as long as they cash in first. It doesn’t take a financial genius to understand that this sort of ecosystem is not sustainable. The other reason is a bit more subtle, yet equally important. Most cryptocurrencies require someone to constantly work for them (a process known as mining), or to provide some sort of infrastructure that isn’t cheap to maintain. This translates into a running cost, which may not seem like much individually, but collectively it is a lot, enough to make the whole system unsustainable. This is particularly true in cases like bitcoin, where the computational problems that need to be solved to maintain the blockchain behind the cryptocurrency get progressively more challenging, and therefore more expensive. Once enough people realize that, the fascination with these cryptocurrencies may wane, especially if some regulating mechanism comes into place.
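To see why the cost grows, here is a toy proof-of-work sketch. Real mining (bitcoin’s included) is far more involved than this, but the principle is the same: each extra required zero multiplies the expected work by a constant factor (about 16x here):

```python
import hashlib
import time

def mine(data, difficulty):
    """Toy proof-of-work: find a nonce such that SHA-256(data + nonce)
    starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

for d in range(1, 5):  # each extra zero makes mining ~16 times harder
    start = time.time()
    mine("block-payload", d)
    print(f"difficulty {d}: {time.time() - start:.3f} seconds")
```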
Artificial Intelligence, on the other hand, is a completely different animal. Even its most basic applications add value to whoever invests in them, be it someone tackling big data problems or someone who just wants to optimize their technical infrastructure. In a vast variety of ways, A.I. manages to add value to the people using it, particularly if they have developed it sufficiently, thereby automating certain expensive processes. That’s why people are amazed by it and spend hours speculating on how it can bring about numerous benefits to the world. Even if there are some inevitable pitfalls in this technology, if it is handled maturely, it can be of great benefit to the whole. Besides, as a scientific field it existed and flourished on its own, long before the futurists used it to promote their ideology, or before it became mainstream.
Hopefully it won’t be long before the cryptocurrency craze subsides and the people who waste their time and energy on it focus their efforts on something more sustainable, something that adds value to its environment rather than draining resources and time. Perhaps this could be A.I., or some other similar technology. Whatever the case, the cryptocurrencies that are around today have an expiration date, whether people are willing to accept that or not...
After investigating this topic quite a bit while looking into A.I. matters, I decided to create a video on it. To make it more complete, I included other methods too, such as Statistics-based and heuristics-based ones. Despite the large amount of content I put into this project (the script was over 4000 words), I managed to keep the video at a manageable length (a bit less than half an hour). Check it out on Safari when you have some time!
Recently I had a couple of very insightful conversations with some people, over drinks or coffee. We talked about A.I. systems and how they can pose a threat to society. The funny thing is that none of these people were A.I. experts, yet they had a very mature perspective on the topic. This led me to believe that if non-experts have such concerns about A.I., then perhaps it’s not as niche a topic as it seemed. By the way, the dangers they pinpointed had nothing to do with robots taking over the world in some Hollywood-like scenario; they were far more subtle, just like A.I. itself. Also, they are not about how A.I. might hurt us sometime in the future, but about how its dangers have already started to manifest. So, I thought about this topic some more, going beyond the generic and quite vague warnings that some individuals have shared with the world in interviews. The main dangers I’ve identified through this quest are the following:
Interestingly, all of these have more to do with us, as people, than with the adaptive code that powers these artificial mental processes we call A.I.
Over-reliance on A.I.
Let’s start with the most obvious pitfall: over-reliance on this new tech. In a way, this is already happening to some extent, since many of us use A.I. without even realizing it and have come to depend on it. Pretty much every system that makes a smartphone “smart” is something to watch out for. From virtual assistants to adaptive home screens to social chatbots, these are A.I. systems we may get so used to that we won’t be able to do without them. Personally, I don’t use any of these, but as the various operating systems evolve, they may not leave users a choice when it comes to the use of A.I. in them.
Degradation of Soft Skills
Soft skills may be something many people talk about, and even more have come to value them, especially in the workplace. However, with A.I. becoming more and more of a smooth interface for us (e.g. with customer service bots), we may not be as motivated to cultivate these skills. This inevitably leads to their degradation, along with the atrophy of related mental faculties, such as creativity and intuition. After all, if an A.I. can provide us with viable solutions to our problems, why would we feel the need to think outside the box to find them ourselves? And if an A.I. can make connecting with others online very easy, why would anyone opt for face-to-face connections instead (unless their job dictates it)?
Bugs in Automated Processes
Automated processes may seem enticing thanks to the abstraction they offer, but they are far from perfect. Even the most refined A.I. system may have issues hidden under the hood, among its numerous hidden layers. Just because it can automate a process doesn’t mean there are no hidden biases in its functionality, or the occasional (noticeably) wrong conclusion. This is natural, since every system is bound to fail at times. The problem is that when an A.I. system fails, we may not be able to correct it, while in some cases even perceiving the bug can be hard, let alone proving its existence to others.
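Here is a minimal illustration of how such a bias can hide behind a healthy-looking metric (made-up data, not tied to any particular A.I. product): with 95% of the labels in one class, the model below scores around 95% accuracy while being nearly useless on the rare class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features carry no signal about the rare positive class (5% of labels)
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.05).astype(int)

model = LogisticRegression().fit(X, y)
pred = model.predict(X)
print("accuracy:", (pred == y).mean())                         # looks impressive
print("positives caught:", pred[y == 1].sum(), "of", y.sum())  # typically zero
```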
Lack of Direct Experience of the World (VR and AR)
This is probably a bit futuristic, since if you live in a city outside the tech bubble (e.g. the West Coast of the US), there are still plenty of opportunities for direct experience. However, as technologies like virtual reality (VR) and augmented reality (AR) become cheaper and more commercially viable, they are bound to become the go-to interface to the world, e.g. through “tourism” apps or virtual “museums.” Although these technologies can be useful, particularly for people without easy access to the rest of the world, there is no doubt that they are bound to be abused, resulting in serious social problems and further societal fragmentation.
Blind Faith in A.I. Tech
This is probably the worst danger of A.I. It may seem similar to the first one mentioned, but it is more subtle and more sinister. The idea is that some people become very passionate about the merits of A.I. and quite defensive about their views. Their stance on the matter is eerily similar to that of religious zealots, though the “prophets” of these A.I. movements may seem level-headed and detached. Even they, however, often fail to hide their borderline obsession with their ideology, in which A.I. is deified. It’s one thing to speculate about a future society where A.I. may have an administrative role in managing resources, and a completely different thing to believe that A.I. will enter our lives and solve all our problems, like some nurturing alien god of sorts.
An Intelligent Approach to All This
Not all is doom and gloom, however. Identifying the dangers of A.I. is a good first step towards dealing with them. An intelligent way to do that is to first take responsibility for the whole matter. It’s not A.I.’s fault that these dangers come about. Like every technology we’ve developed, A.I. can be used in different ways. If cars cause thousands of people to die every year, it’s not the cars’ fault. And just as the car was built to enrich our lives, A.I.’s development has similar motives. So, if we see it as an auxiliary technology that can help us make certain processes more efficient, rather than as a panacea, we have a good chance of co-existing with it without risking our individual and social integrity.
Although it's been over two weeks since I finished working on the Data Visualization video and about a month since I completed the Deep Learning one, both of them were just made available on Safari (a subscription-based platform for various educational materials). So, if you are up for some food for thought on DL and DV, check them out when you have a moment: Deep Learning vid and Data Visualization vid.
Note that these are both overview videos; although in the Data Viz one I include several references to Python and Julia libraries for creating various plots, the videos are fairly high-level. They are not in-depth tutorials on these topics.
Once I decide to take a break from all the book-writing these days, I'll probably make another video either on AI or on a more conventional DS topic. So, stay tuned...
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.