Why Gradient Descent Is Not the Best Optimizer

11/2/2020

Optimization is the methodology that deals with finding the maximum or the minimum value of a function that's usually referred to as the objective (or fitness) function. It often involves the use of derivatives and calculus techniques, but nowadays it has to do with other, more efficient algorithms, many of which are AI-related. What’s more, optimization plays a crucial role in all modern data science models, particularly the more sophisticated ones (e.g. ANNs, SVMs, etc.).

But what is gradient descent and why is it such a popular optimizer? Gradient descent (GD) is a deterministic optimizer that given a starting point (initial guess) it navigates the solution space based on the gradient of that point. The derivatives are calculated either analytically (through the corresponding function) or empirically through limits of the fitness function. It's similar to descending a valley (or climbing a hill in the case of a maximization problem), by moving towards the part of it near your starting point where it's steeper and always adjusting your course accordingly. Due to its simplicity and high performance, gradient descent is one of the most popular optimizers in its category.

Let’s now look at some (somewhat better) alternatives to GD, deterministic, and otherwise. Let's start with the stochastic ones since they are the ones more commonly used these days. The reason is that most optimization problems involve lots of variables and deterministic optimizers can't handle them, or they take too long to find a solution. Common stochastic optimizers used today include Particle Swarm Optimization (PSO), Simulated Annealing, Genetic Algorithms, Ant Colony Optimization, and Bee Colony Optimization. PSO in particular is quite relevant since other optimizers are often variants of PSO (e.g. the Firefly optimizer). All of these methods are adept at handling complex problems, sometimes with constraints too, outperforming GD. As for deterministic optimizers, there are a couple of them I've developed in the past year, one of which (Divide and Conquer Optimizer) is particularly robust for low-complexity problems (up to 5 variables). The best part is that none of the optimizers mentioned here require the calculation of a derivative, which can be a computationally heavy process (in some cases not even possible).

One thing that’s important to keep in mind and which I cannot stress enough is how optimization is key for data science, esp. in AI-based models. It's something so common that it's hard to imagine any sophisticated machine learning model without an optimization process under the hood. Also, optimizers can be quite useful in data engineering tasks particularly those involving feature selection and other problems involving a large solution space.

You can learn more about optimization and AI in general through a book I have co-authored a couple of years back, through Technics Publications. It's titled AI for Data Science: Artificial Intelligence Frameworks and Functionality for Deep Learning, Optimization, and Beyond, and it's accompanied by code notebooks in Python and Julia. Feel free to check it out. Cheers!

0 Comments

FOXY DATA SCIENCE
unconventional insights about data science, A.I., cybersecurity, data analytics, and more