Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review related prior work, then introduce a novel, efficient, gradient-based method called long short-term memory (LSTM) that can learn to bridge time lags in excess of 1,000 discrete time steps, even in the presence of noisy, continuous-valued input streams. LSTM achieves this by enforcing constant error flow through "constant error carousels": memory cells built around linear, self-connected units whose self-connection weight is fixed at 1.0, so the backpropagated error neither vanishes nor explodes inside the cell. Multiplicative input and output gates learn to open and close access to this constant error flow; their weights and biases are learned with a variant of gradient descent. No teacher forcing is required. LSTM is an efficient and robust alternative to existing RNN architectures.
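A minimal sketch of one memory-cell update may make the description above concrete. This is not the paper's exact formulation: it follows the 1997 two-gate design (input and output gates; the forget gate was a later extension), substitutes tanh for the paper's squashing functions g and h, and all names (lstm_cell_step, W_i, b_i, and so on) are illustrative assumptions rather than identifiers from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, params):
    """One time step of a single 1997-style LSTM memory block.

    x      -- input vector at this step
    h_prev -- previous block output
    c_prev -- previous internal state (the constant error carousel)
    params -- dict of weight matrices W_* and biases b_* (assumed names)
    """
    z = np.concatenate([x, h_prev])                  # current input plus recurrent output
    i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate
    g = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate cell input
    c = c_prev + i * g     # constant error carousel: fixed self-connection of 1.0, no decay
    h = o * np.tanh(c)     # output gate controls what leaves the cell
    return h, c

# Example usage with random parameters (illustrative only):
rng = np.random.default_rng(0)
n_in, n_cell = 4, 3
dim = n_in + n_cell
params = {k: rng.normal(scale=0.1, size=(n_cell, dim)) for k in ("W_i", "W_o", "W_c")}
params.update({k: np.zeros(n_cell) for k in ("b_i", "b_o", "b_c")})
h, c = np.zeros(n_cell), np.zeros(n_cell)
for x in rng.normal(size=(5, n_in)):   # five time steps of noisy input
    h, c = lstm_cell_step(x, h, c, params)

Because the internal state c is updated purely additively, the gradient path through c across time steps has a constant factor of 1.0, which is what lets errors propagate over very long lags.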
{
"id": "4c1b3dc6-7df1-4a4e-b4e0-65c06384d3c1",
"title": "Long Short-Term Memory (1997)",
"slug": "long-short-term-memory",
"video_url": "https://www.youtube.com/watch?v=DO91xA_QD6Y",
"url": "https://deeplearning.cs.cmu.edu/S23/document/readings/LSTM.pdf",
"resource_category": "research",
"image_url": null,
"thumbnail_url": null
}