Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.

It is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks.

At the same time, every state-of-the-art Deep Learning library contains implementations of various algorithms to optimize gradient descent (e.g. lasagne’s, caffe’s, and keras’ documentation). These algorithms, however, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by.

Sebastian Ruder gives a thorough introduction into gradient descent, pointing out the three variants of gradient descent (which differ in how much data we use to compute the gradient of the objective function).

## Nominated!

# Siraj Raval

I’d say the gradient descent algorithm has had the biggest impact in deep learning.