Before someone says “yes, of course; you just need to apply a nonlinear transformation to one of the variables!”, let me rephrase: can we measure a nonlinear relationship between two variables without any transformations whatsoever? In other words, is there a heuristic metric that can help us establish whether two variables are linked in some fashion, without any data engineering on our part? The answer is “yes, of course” again. However, the relationship has to be monotonic for this to work; that is, there needs to be a one-to-one relationship between the values of the two variables. Otherwise, the relationship may not register as strong, even if it exists, due to the nature of nonlinearity.
So, if we have two variables x and y, and y is something like x^10 + exp(x), the relationship is clearly nonlinear but also monotonic (for positive values of x). The Pearson correlation of the two variables in this case is not particularly strong (for the variables tested, it was about 0.67). Measured with a different correlation metric, however, such as a custom-built one I’ve recently developed, the relationship appears somewhat stronger (around 0.75 for these variables), while Kendall’s rank correlation coefficient captures it fully (1.00 for these variables).
In a different scenario, where z = 1 / x, for example, the metrics diverge even more. Since z decreases as x grows, Pearson’s and Kendall’s coefficients are both negative here, so the figures for them are magnitudes: Pearson’s correlation would be something like 0.16, while Kendall’s coefficient would again be 1.00. The custom-made metric would yield around 0.69. Although the effect is not always this pronounced, in cases like this one a different metric can make the difference between a strong correlation and a not-so-strong one, which affects our decisions about the variables. Bottom line: even if the Pearson correlation coefficient is the most popular method for measuring the relationship between two variables, it’s not the best choice for nonlinear relationships.
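To see the Pearson vs. Kendall gap on the two examples above, here is a minimal sketch in plain Python (no third-party libraries). The sample of x (50 evenly spaced values from 0.1 to 5.0) is an assumption for illustration, not the data behind the figures quoted in this post, so the Pearson values it prints won’t match them exactly; the custom metric is not reproduced here. The helper names `pearson` and `kendall_tau` are hypothetical, and the Kendall function is a simple O(n²) tau-a without tie correction.

```python
import math


def pearson(a, b):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    O(n^2) and with no tie correction, which is fine for the
    tie-free, strictly monotonic samples used here.
    """
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Positive product: the pair moves in the same direction (concordant).
            s = (a[i] - a[j]) * (b[i] - b[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical sample: 50 evenly spaced positive values, 0.1 to 5.0.
x = [0.1 * k for k in range(1, 51)]
y = [xi ** 10 + math.exp(xi) for xi in x]  # increasing for x > 0
z = [1 / xi for xi in x]                   # decreasing for x > 0

for name, other in (("y = x^10 + exp(x)", y), ("z = 1 / x", z)):
    print(f"{name}: Pearson = {pearson(x, other):.2f}, "
          f"Kendall = {kendall_tau(x, other):.2f}")
```

Because Kendall’s tau only compares the ordering within each pair of points, any strictly monotonic relationship yields exactly 1 (increasing) or -1 (decreasing), however curved it is, while Pearson’s r is pulled down by the curvature.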
That’s why different metrics need to be used for evaluating the relationship between two variables, particularly if it’s a nonlinear one.
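The monotonicity requirement mentioned earlier is not a technicality. As a minimal sketch, assuming a hypothetical sample of x placed symmetrically around zero and w = x², both Pearson’s r and Kendall’s tau come out at zero, even though w is completely determined by x. The helper functions are illustrative, not from any library.

```python
import math


def pearson(a, b):
    """Pearson's r computed from centered sums."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def kendall_tau(a, b):
    """Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


# Hypothetical symmetric sample around zero: x = -2.5, -2.4, ..., 2.5.
x = [0.1 * k for k in range(-25, 26)]
w = [xi ** 2 for xi in x]  # fully determined by x, but not monotonic

print(f"Pearson = {pearson(x, w):.3f}, Kendall = {kendall_tau(x, w):.3f}")
```

On the negative half, w falls as x rises; on the positive half, it rises. The concordant and discordant pairs cancel out, so a rank-based metric is no help here either.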
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.
July 2018
