Trinary Logic is not something new. It’s been around for decades, though it was more of a mathematical / high-level framework. I should know, as I did my Masters thesis on this subject and how it applies to GIS. I even wrote code implementing the corresponding model I came up with, though in today’s programming world it seems like legacy code... Anyway, bottom line is that Trinary Logic is useful and could have a place in modern Information Systems, including data analytics projects. The question is, could it be applicable to A.I. too?
The answer is, as usual, “it depends.” Trinary Logic on its own is quite limited and unless you are familiar with its 700+ gates, it may be like any novel idea: interesting but not exactly something worth delving into. After all, just like any system of reasoning, Trinary Logic is meaningless without an in-depth understanding of its key contribution to the thorny issue we always tackle through reasoning: handling uncertainty effectively.
Uncertainty, oftentimes modeled as noise or randomness (depending on who you ask), is everywhere. Since we cannot eliminate it without damaging the signal too, we find ways to cope with it. Trinary Logic offers an interesting way of doing that through the 3rd value of its variables, namely the “indifferent” state. Something can be True, False, or Indifferent, the latter being something in-between. These are the states of those intermediate values in the membership functions of fuzzy variables, in Fuzzy Logic. The latter is a well-known and quite established A.I. framework with lots of applications in data science. Do you see where I’m going with this?
So Trinary Logic is a framework for reasoning, much like Fuzzy Logic, but the latter is an A.I. framework too, so Trinary Logic is A.I. also, right? Well, no. Trinary Logic is a mathematical construct, so unless it is applied to A.I. programmatically, and as a well-defined process, it is yet another concept that can’t even fetch an academic publication! But if it were to manifest as a heuristic of sorts and add value to a process in the A.I. sphere, things would be different.
Enter the Trinary Curve, a heuristic (or meta-heuristic, depending on how you use it) that encapsulates Trinary Logic in a simple yet not simplistic way, turning an input signal into something that an A.I. agent can understand and work with. Namely, it can engineer a new variable in the [-1, 1] interval (notice the closed brackets in this case), that enables the corresponding module to have the in-between state of uncertainty more evident. As a result, the A.I. agent is allowed to be unsure about something and examine it more closely, given the right architecture, instead of working with what it has and hope for the best. Note that the Trinary Curve can be customized, while its output can be normalized to a different interval (always closed) if needed. The Trinary Curve is differentiatable throughout the space it is defined, while it’s easy to use programmatically (at least in Julia).
Perhaps the Trinary Curve is a novelty and an A.I. system can evolve adequately without it. However, it is something worth considering, instead of just experimenting with the countable parameters of existing A.I. systems solely. After all, Trinary Logic is compatible with existing A.I. frameworks so if it’s not utilized, it’s primarily because of some people’s unwillingness to think outside the box, and that’s something that doesn’t have any uncertainty about it...
This week I'm away, as I prepare for my talk at the Consumer Identity World EU 2018 conference in Amsterdam (the same conference takes place in a couple of other places, but I'll be attending just the one in Europe). So, if you are in the Dutch capital, feel free to check it out. More information on my talk here. Cheers!
Dichotomy: a binary separation of a set into two mutually exclusive subsets
Data Science: the interdisciplinary field for analyzing data, building models, and bringing about insights and/or data products, which add value to an organization. Data science makes use of various frameworks and methodologies, including (but not limited) to Stats, ML, and A.I.
After getting these pesky definitions out of the way, in an effort to mitigate the chances of misunderstandings, let’s get to the gist of this fairly controversial topic. For starters, all this information here is for educational purposes and shouldn’t be taken as gospel since in data science there is plenty of room for experimentation and someone adept in it doesn’t need to abide to this taxonomy or any rules deriving from it.
The inaccurate dichotomy issues in data science, however, can be quite problematic for newcomers to the field as well as for managers involved in data related processes. After all, in order to learn about this field a considerable amount of time is required, something that is not within the temporal budget of most people involved in data science, particularly those who are starting off now. So, let’s get some misconceptions out of the way so that your understanding of the field is not contaminated by the garbage that roams the web, especially the social media, when it comes to data science.
Namely, there are (mis-)infographics out there that state that Stats and ML are mutually exclusive, or that there is no overlap between non-AI methods and ML. In other words, ML is part of AI, something that is considered blasphemy in the ML community. The reason is simple: ML as a field was developed independently of AI and has its own applications. AI can greatly facilitate ML through its various network-based models (among other systems), but ML stands on its own. After all, many ML models are not AI related, even if AI can be used to improve them in various ways. So, there is an overlap between ML and AI, but there are non-AI models that are under the ML umbrella.
Same goes with Statistics. This proud sub-field of Mathematics has been the main framework for data analytics for a long time before ML started to appear, revolting against the model-based approach dictated by Stats. However, things aren’t that clear-cut. Even if the majority of Stats models are model-based, there are also models that are hybrid, having elements of Stats and ML. Take Bayesian Networks for example, or some variants of the Naive Bayes model. Although these models are inherently Statistical, they have enough elements of ML that they can be considered ML models too. In other words, they lie on the nexus of the two sets of methods.
What about Stats and AI? Well, Variational AutoEncoders (VAEs) are an AI-based model for dimensionality reduction and data generation. So, there is no doubt that it lies within the AI set. However, if you look under the hood you’ll see that it makes use of Stats for the figuring out what the data generated by it would be like. Specifically, it makes use of distributions, a fundamentally statistical concept, for the understanding and the generation of the data involved. So, it wouldn’t be far-fetched to put VAEs in the Stats set too.
From all this I hope it becomes clear that the taxonomy of data science models isn’t that rigid as it may seem. If there was a time when this rigid separation of models made sense, this time is now gone as hybrid systems are becoming more and more popular, while at the same time the ML field expands in various directions outside AI. So, I’d recommend you take those (mis-)infographics with a pinch of salt. After all, most likely they were created by some overworked employee (perhaps an intern) with a limited understanding of data science.
Interestingly, the video throughput on Safari has increased lately so we don't have to wait too long before a video gets approved and published. This little guy, for example, I just finished on Thursday and it's already online at the Safari platform. It's by no means an exhaustive survey of the ML field, which is much larger than many people think and it doesn't include A.I. methods only. This video is more of an overview of ML and how it relates to other aspects of Data Science, such as Statistics, A.I., and various applications. So, if you are new to Data Science or want to get a comprehensive overview of the topic to supplement your studies of the subject, feel free to check it out!
With all the plethora of material out there for data science education, it is easy to get overwhelmed and even confused about what to study and how much time, money, and effort to put into it. Enter evaluation of data science material, a concise strategy for tackling this issue. In this 24 minute video, I talk about the various aspects of data science material, criteria for evaluating it, the matter of resources required to delve into this material, and some useful things to have in mind in your data science education efforts. Whether you are a newcomer to the field or a more seasoned data scientist, you have something to learn about data science (I know I do!) and this video can hopefully aid you in that. You can find it on Safari.
Note that in order to be able to view this video in its entirety, you'll need a subscription for the Safari platform. Also, it's important to remember that this video can offer you a framework for evaluating the data science material; you'll still need to find that material though and put the effort to study it, in order to make the most of it. The video can only help you organize your efforts more efficiently. Enjoy!
When a machine learning predictive analytics systems makes predictions about something chances are that we have some idea of what drove it to make these predictions. Oftentimes, we even know how confident the system is about each prediction, something that helps us become confident about it too. However, most A.I. systems (including all modern AIs used in data science) fail in giving us any insight as to how they arrived at their results. This is known as the black box problem and it’s one of the most challenging issues of network-based AIs.
Although things are hopeless for this kind of systems due to their inherent complexity and lack of any order behind their predictive magic, it doesn’t mean that all AIs need to be under the same umbrella. Besides, the A.I. space is mostly unexplored even if often seems to be a fully mapped terrain. Without discounting the immense progress that has been made in network-based systems and their potential, let’s explore the possibility of a different kind of A.I. that is more transparent.
Unfortunately, I cannot be very transparent about this matter as the tech is close-source, while the whole framework it is based on is so far beyond conventional data analytics that most people would have a hard time making sense of it all. So, I’ll keep it high-level enough so that everyone can get the gist of it.
The Rationale heuristic is basically a way analyzing a certain similarity metric to its core components, figuring out how much each one of them contributes to the end result. The similarity metric is non-linear and custom-made, with various versions of it to accommodate different data geometries. As for the components, if they are the original features, then we can have a way to directly link the outputs (decisions) with the inputs, for each data point predicted. By examining each input-output relationship and applying some linear transformation to the measures we obtain, we end up with a vector for each data point, whose components add up to 1.
Naturally, the similarity metric needs to be versatile enough to be usable in different scenarios, while also able to handle high dimensionality. In other words, we need a new method that allows us to process high-dimensional data, without having to dumb it down through a series of meta-features (something all network-based AIs do in one way or another). Of course, no-one is stopping you from using this method with meta-features, but then interpretability goes out the window since these features may not have any inherent meaning attached to them. Unless of course you have generated the meta-features yourself and know how everything is connected.
“But wait,” you may say, “how can an A.I. make predictions with just a single layer of abstraction, so as to enable interpretability through the Rationale heuristic?” Well, if we start thinking laterally about it we can also try to make A.I. systems that emulate this kind of thinking, exhibiting a kind of intuition, if you will. So, if we do all that, then we wouldn’t need to ask this question at all and start asking more meaningful questions such as: what constitutes the most useful data for the A.I. and how can I distill the original dataset to provide that? Because the answer to this question would render any other layer-related questions meaningless.
As someone once said, "Knowledge is having the right answer; Intelligence is asking the right question." So, if we are to make truly intelligence systems, we might want to try acting as intelligent beings ourselves...
In many data science courses, these peculiar data points in a dataset often go by the term “anomalies” and are considered to be inherently bad. In fact, it is suggested by many that they be removed before the data modeling stage. Now, for obvious reasons I cannot contradict that approach partly because I myself have taken that stance when covering basic data engineering topics, but also because there is merit in this treatment of outliers and inliers. After all, they are just too weird to be left as they are, right?
Well, it depends. In all the cases when they are removed, it’s usually because we are going to use some run-of-the-mill model that is just too rudimentary to do anything clever with the data it’s given. So, if there are anomalous data points in the training set, it’s likely to over-fit or at the very least under-perform. This would not happen so often though in an A.I. model, which is one of the reasons why the data engineering stage is so closely linked to the data modeling one. Also, sometimes the signal we are looking for lies in those particular anomalous elements, so getting rid of them isn’t that wise then.
Regardless of all that though, we need to differentiate between these two kinds of anomalies. The outliers can be easily smoothed out, if we were to adopt a possibilistic way of handing the day, instead of the crude statistical metrics we are used to using. Smoothing outliers is also a good way to retain more signal in the dataset (especially if it’s a small sample that we are working with), something that translates into better-performing models.
Inliers though are harder to process. Oftentimes removing them is the best strategy, but they need to be looked at holistically, not just in individual variables. Also, even if they distort the signal at times, they may not be that harmful when doing dimensionality reduction, so keeping them in the dataset may be a good idea. Nevertheless, it’s good to make a note of these anomalous elements, as they may have a particular significance once the data is processed by a model we build. Perhaps we can use them as fringe cases in a classification model, for example, to do some more rigorous testing to it.
To sum up, outliers and inliers are interesting data points in a dataset and whether they are more noise than signal depends on the problem we are trying to solve. When tackled in a multi-dimensional manner, they can be better identified, when when processed, certain care needs to be taken. After all, just because certain data analytics methods aren’t well-equipped to handle them, we shouldn’t change our data to suit the corresponding models / metrics. Often we have more to gain by shifting our perspective and adapting our ways to the data at hand. The possibilistic approach to data may be a great asset in all that. Should you wish to learn more about outlier and inliers, you can check out my presentation video on this topic in the Safari platform.
Contrary to the probabilistic approach to data analytics, which relies on probabilities and ways to model them, usually through a statistical framework, the possibilistic approach focuses on what’s actually there, not what could be there, in an effort to model uncertainty. Although not officially a paradigm (yet), it has what it takes to form a certain mindset, highly congruent with that of a competent data scientist.
If you haven’t heard of the possibilistic approach to things, that’s normal. Most people have already jumped on the bandwagon of the probabilistic dogma, so someone seriously thinking of things possibilistically would be considered eccentric at best. After all, the last successful possibilistic systems are often considered obsolete, due to their inherent limitations when it came to higher dimensionality datasets. I’m referring to the Fuzzy Logic systems, which are part of the the GOFAI family of A.I. systems (in these systems the possibilities are expressed as membership levels, through corresponding functions). These systems are still useful, of course, but not the go-to choice when it comes to building an AI solution to most modern data science problems.
Possibilistic reasoning is that which relies on concrete facts and observable relationships in the data at hand. It doesn’t assume anything, nor does it opt for shortcuts by summarizing a variable with a handful of parameters corresponding to a distribution. So, if something is predicted with a possibilistic model, you know all the how’s and why’s of that prediction. This is directly opposite to the black-box predictions of most modern AI systems.
Working with possibilities isn’t easy though. Oftentimes it requires a lot of computational resources, while an abundance of creativity is also needed, when the data is complex. For example, you may need to do some clever dimensionality reduction before you can start looking at the data, while unbiased sampling may be a prerequisite also, particularly in transduction-related systems. So, if you are looking for a quick-and-easy way of doing things, you may want to stick with MXNet, TensorFlow, or whatever A.I. framework takes your fancy.
If on the other hand you are up for a challenge, then you need to start thinking in terms of possibilities, forgetting about probabilities for the time being. Some questions that may help in that are the following:
* How much does each data point contribute to a metric (e.g. one of central tendency or one of spread)?
* Which factors / features influence the similarity between two data points and by how much?
* What do the fundamental components of a dataset look like, if they are defined by both linear and non-linear relationships among the original features?
* How can we generate new data without any knowledge of the shape or form of the original dataset?
* How can we engineer the best possible centroids in a K-means-like clustering framework?
* What is an outlier or inlier essentially and how does it relate to the rest of the dataset?
For all of these cases, assume that there is no knowledge of the statistical distributions of the corresponding variables. In fact, you are better off disregarding any knowledge of Stats whatsoever, as it’s easy to be tempted to use a probability-based approach.
Finally, although this new way of thinking about data is fairly superior to the probabilistic one, the latter has its uses too. So, I’m not advocating that you shouldn’t learn Stats. In fact, I’d argue that only after you’ve learned Stats quite well, will you be able to appreciate the possibilistic approach to data in full. So, if you are looking into A.I., Machine Learning, or both, you may want to consider a possibilistic way of tackling uncertainty, instead of blindly following those who have vested interests in the currently dominant paradigm.
Although I’ve talked about dimensionality reduction for data science in the corresponding video on Safari, covering various angles of the topic, I was never fully content with the methodologies out there. After all, all the good ones are fairly sophisticated, while all the easier ones are quite limited. Could there be a different (better) way of performing dimensionality reduction in a dataset? If so, what issue would such a method tackle?
First of all, conventional dimensionality reduction methods tend to come from Statistics. That’s great if the dataset is fairly simple, but methods like PCA focus on the linear relationships among the features, which although it’s a good place to start, it doesn’t cover all the bases. For example, what if features F1 and F2 have a non-linear relationship? Will PCA be able to spot that? Probably not, unless there is a strong linear component to it. Also, if F1 and F2 follow some strange distribution, the PCA method won’t work very well either.
What's more, what if you want to have meta-features that are independent to each other, yet still explain a lot of variance? Clearly PCA won’t always give you this sort of results, since for complex datasets the PCs will end up being tangled themselves. Also, ICA, a method designed for independent components, is not as easy to use since it’s hard to figure out exactly where to stop when it comes to selecting meta-features.
In addition, what’s the deal with outliers in the features? Surely they affect the end result, by changing the whole landscape of the features, breaking the whole scale equilibrium at times. Well, that’s one of the weak point of PCA and similar dimensionality reduction methods, since they require some data engineering before they can do their magic.
Finally, how much does each one of the original features contribute to the meta-features you end up with after using PCA? That’s a question that few people can answer although the answer is right there in front of them. Also, such a piece of information may be useful in evaluating the original features or providing some explanation of how much they are worth in terms of predictive potential, after the meta-features are used in a model.
All of these issues and more can be tackled by using a new approach to dimensionality reduction, one that is based on a new paradigm (the same one that can tackled the clustering issues mentioned in the previous post). Also, even though the new approach doesn’t use a network architecture, it can still be considered a type of A.I. as there is some kind of optimization involved. As for the specifics of the new approach, that’s something to be discussed in another post, when the time is right...
I was never particularly fond of this unsupervised learning methodology that’s under the umbrella of machine learning. It’s not that I didn’t see value in it, but the methods that were available for it when I started delving into it were rudimentary at best and fairly crude. In fact, if I were to do a PhD now, I’d choose a clustering-related topic since there is so much room for improvement that even a simple idea for improving the most popular clustering methods out there is bound to improve them!
However, the fact that data science researchers and machine learning engineers in particular haven’t spent much time looking into clustering doesn’t make clustering a bad methodology. In fact, I’d argue that it’s one of the most insightful ones and it plays an important role in many data science projects, particularly in the data exploration stage.
The key issues with clustering are:
1. The whole set of distance metrics used
2. The fact that the vast majority of clustering methods yield a (slightly) different result every time they are run
3. The need of an external parameter (K) in most clustering methods used in practice, in order to define how many clusters there are
4. The fact that it’s very shallow in its results
There may be more issues with clustering, but these are the most important ones I’ve found. So, if we were to rethink clustering and do it better, we’d need to address each one of these issues. Namely:
1. A new set of distance metrics would be needed, metrics that are not influenced by the dimensional “noise” so much, in the case of many dimensions in the dataset.
2. The option for a deterministic clustering method, one that would optimize the centroid seed before starting the whole clustering process
3. An optimization process would be in place so as to find the best number of clusters. This should include the possibility of a single cluster, in the case where there isn’t enough diversity in the dataset.
4. A multi-level clustering option needs to be available, much like hierarchical clustering but in reverse, i.e. start with the main clusters in the dataset and gradually dig deeper into levels of sub-clusters.
Now, all this may sound simple but it’s not as easy to put into practice. Apart from an in-depth understanding of data science, a quite refined programming ability is needed too, so that the implementation of this clustering approach can be efficient and scalable. Perhaps all this is not even possible with the conventional data analytics framework, but there is not a single doubt in my mind that it is possible in general, and if a high-performance language is used (e.g. Julia), it is even practically feasible.
Naturally, a clustering framework like this one would require a certain level of A.I. to be used. This doesn’t have to be an ANN though, since A.I. can take many forms, not just network-based ones. Whatever the case, conventional statistics-based methods may be largely inadequate, while the very basic machine learning methods for clustering may not be sufficient either.
This illustrates something that many data science practitioners have forgotten: that data science methods evolve, just like other aspects of the craft. New tools may be intriguing, but equally intriguing are the conventional methodological tools, especially if we were to rethink them from a more advanced perspective. This can be beneficial in many ways, such as opening new avenues of data analytics and even synthesizing new data. This, however, is a story for another time...
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.