We sometimes find ourselves in situations where no matter what we do, and what model we use, there just isn't anything useful coming out of our analysis. In times like these we wonder if an A.I. system would magically solve the problem. However, it may be the case that there just isn't any signal in the data that we are harvesting.
Of course, this whole thing sounds like a cop-out. It’s easy to say that there is no signal there and throw in the towel. However, giving up too quickly is probably worse than not finding a signal, because doing so may rule out ever finding something useful in that data. That’s why deciding that there isn’t any signal worth extracting from the data is a tricky thing to do. We must make this decision only after thoroughly examining the data, trying out a variety of feature combinations as well as meta-features, and also experimenting with various models. If after doing all this we still end up with mediocre results that are hard to distinguish from chance, then there probably isn’t anything there, and we can proceed to another project.
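One way to make "hard to distinguish from chance" more concrete is a permutation test: shuffle the labels many times and see how often a shuffled version scores at least as well as the real one. The snippet below is only a minimal sketch of that idea, using the Python standard library. The function names (`accuracy`, `permutation_p_value`), the choice of accuracy as the metric, and the toy labels and predictions are all assumptions made for illustration, not part of any particular project.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones.

    Returns the observed accuracy and a p-value estimate. A large p-value
    suggests the model's score is hard to distinguish from chance.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)  # copy, so the caller's labels stay untouched
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations,
    # which keeps the estimate strictly above zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (hypothetical): balanced labels and predictions with no relation to them.
toy_rng = random.Random(42)
labels = [0, 1] * 50
predictions = [toy_rng.randint(0, 1) for _ in labels]

score, p = permutation_p_value(labels, predictions)
print(f"accuracy = {score:.2f}, permutation p-value = {p:.3f}")
```

If the p-value stays large across the different feature sets and models we try, that supports the "no signal" conclusion. A small p-value, on the other hand, hints that something is there, even if the raw score looks unimpressive.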
However, just because there isn’t a strong enough signal in the data at hand, it doesn’t mean the whole idea is worthless. Maybe there is potential in that idea, but we need to pursue it via:
1. more and/or cleaner data like the data we have
2. different kinds of data, to be processed in tandem with the existing data
3. some other application based on that data
The 3rd point is particularly important. Say that we have transaction data, for example, and we want to predict fraud. The data we have is fine, but it is unable to predict anything worthwhile when it comes to fraud. We can still salvage some of the data science work we’ve done, though, and use it for predicting something else (e.g. some metric for evaluating the efficiency of a transaction, or the general reliability of the network used for these transactions). The fact that we cannot predict fraud very well doesn’t make the data useless in general.
So, if the data doesn't turn into any viable insights or data products, that’s fine. Not all science experiments end in successful conclusions. We only hear about the success stories in the scientific literature, but for every successful experiment behind these stories there are several other ones that were unsuccessful. As long as we are not daunted by the results and continue working the data, there is always success on the horizon. This success may come about in a somewhat different project though, based on that data. That’s something worth keeping in mind, since it’s really the mindset we have that’s our best asset, even better than our data and our tools.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.