We present a way to set the step size of Stochastic Gradient Descent as the solution of
a distance minimization problem. The resulting rule has an intuitive interpretation and
resembles the update rules of well-known optimization algorithms. We also discuss
asymptotic results relating it to the optimal learning rate of Gradient Descent.
In addition, we study two different estimators with applications in
Variational Inference problems, and present approximate results on their variance.
Finally, we combine all of the above to present an optimization algorithm that can be applied
to both mini-batch optimization and Variational Inference problems.