RMSProp Optimizer for Neural Networks

Continuing with adaptations of Stochastic Gradient Descent, we reach RMSProp, short for Root Mean Square Propagation. Like AdaGrad, RMSProp calculates an adaptive learning rate per parameter; it simply does so in a different way than AdaGrad.
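
As a minimal sketch of that difference (the variable names rho, epsilon, and cache, along with the example values, are illustrative assumptions and not definitions from this section), the key change is in how the cache of squared gradients is maintained:

```python
import numpy as np

# Illustrative values; rho, epsilon, and the parameter/gradient arrays
# are assumptions for this sketch, not taken from this section.
learning_rate = 0.001
rho = 0.9        # cache memory decay rate
epsilon = 1e-7   # small value to avoid division by zero

params = np.array([0.5, -0.3, 0.8])
grads = np.array([0.1, -0.2, 0.05])
cache = np.zeros_like(params)

# AdaGrad keeps a running sum of squared gradients:
#     cache += grads ** 2
# RMSProp instead keeps an exponentially decaying moving average of them:
cache = rho * cache + (1 - rho) * grads ** 2

# Both optimizers then scale each parameter's update by the square root
# of its own cache entry, giving a per-parameter learning rate:
params += -learning_rate * grads / (np.sqrt(cache) + epsilon)

print(params)
```

Because AdaGrad's cache only ever grows, its updates shrink monotonically over training; RMSProp's moving average lets the per-parameter step size recover when recent gradients become small.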

Optimizers with live results:


Stochastic Gradient Descent:

Optimizer: SGD. Learning Rate: 1.0.

Optimizer: SGD. Learning Rate: 0.5.

Optimizer: SGD. Learning Rate: 1.0. Decay: 1e-2.

Optimizer: SGD. Learning Rate: 1.0. Decay: 1e-3.

Optimizer: SGD. Learning Rate: 1.0. Decay: 1e-3. Momentum: 0.5.

Optimizer: SGD. Learning Rate: 1.0. Decay: 1e-3. Momentum: 0.9.


AdaGrad:

Optimizer: AdaGrad. Decay: 1e-4.


RMSProp:

Optimizer: RMSProp. Decay: 1e-4.

Optimizer: RMSProp. Decay: 1e-5. Rho: 0.999.


Adam:

Optimizer: Adam. Learning Rate: 0.02. Decay: 1e-5.

Optimizer: Adam. Learning Rate: 0.05. Decay: 5e-7.
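
Most of the configurations above pair a starting learning rate with a decay setting. As a rough sketch of how such a decay value shrinks the step size over training (assuming the common 1/t decay schedule; the variable names here are illustrative, not taken from this section):

```python
# Sketch of 1/t learning-rate decay, assuming the common formulation
# lr = initial_lr * (1 / (1 + decay * iterations)); the variable names
# are illustrative, not taken from this section.
initial_learning_rate = 1.0
decay = 1e-3

for iterations in range(0, 5001, 1000):
    current_learning_rate = initial_learning_rate * (1.0 / (1.0 + decay * iterations))
    print(iterations, current_learning_rate)
```

With a decay of 1e-3, for example, the effective learning rate has halved by iteration 1,000 under this schedule.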