We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyperparameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms are discussed. We present empirical results for a range of convex optimization problems and for training deep neural networks.
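For reference, a minimal sketch of the update rule the abstract describes, with exponentially decayed first- and second-moment estimates and bias correction. The `adam_minimize` helper, step count, and toy quadratic objective are illustrative choices, not from the paper; the default hyperparameters (alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the paper's suggested settings.

```python
import numpy as np

def adam_minimize(theta, grad_fn, steps=1000, alpha=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of the Adam update rule.

    theta   : initial parameter vector (np.ndarray)
    grad_fn : callable returning a (possibly stochastic) gradient at theta
    """
    m = np.zeros_like(theta)  # first-moment (mean) estimate
    v = np.zeros_like(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g        # update biased first-moment estimate
        v = beta2 * v + (1 - beta2) * (g * g)  # update biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy usage (hypothetical): minimize ||x||^2 from noisy gradients.
rng = np.random.default_rng(0)
noisy_grad = lambda x: 2 * x + 0.1 * rng.standard_normal(x.shape)
x_opt = adam_minimize(np.ones(5), noisy_grad, steps=2000)
```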
{
"id": "1e8cdd9d-1eda-4f1d-b44a-9e828d7d95bc",
"title": "Adam: A Method for Stochastic Optimization (2014)",
"slug": "adam-a-method-for-stochastic-optimization",
"video_url": "https://www.youtube.com/watch?v=CcV0EOfmKeU",
"url": "https://arxiv.org/abs/1412.6980",
"resource_category": "research",
"image_url": null,
"thumbnail_url": null
}