Provided by arXiv (arxiv.org)
Tags: Free, Advanced, Text Generation
Audience: Researchers, AI Engineers, Students, Data Scientists
Introduces Long Short-Term Memory (LSTM), a recurrent neural network architecture designed to solve the vanishing gradient problem.

Overview

The Long Short-Term Memory (LSTM) paper by Hochreiter and Schmidhuber introduced a recurrent neural network (RNN) architecture that directly addresses the vanishing and exploding gradient problems of traditional RNNs. LSTM controls the flow of information through memory cells with multiplicative gates: the original architecture uses input and output gates, and the forget gate that is standard today was added in follow-up work by Gers, Schmidhuber, and Cummins. This gating enables the network to learn long-term dependencies in sequential data. The innovation proved pivotal for advances in speech recognition, machine translation, and time-series prediction, laying foundational groundwork for modern sequence modeling and deep learning.
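
To make the gating mechanism concrete, below is a minimal NumPy sketch of a single LSTM cell step in the modern formulation (input, forget, and output gates). The function name, the stacked weight layout, and the toy dimensions are illustrative assumptions for this sketch, not notation from the paper, and the forget gate itself postdates the original architecture.

    # Minimal sketch of one LSTM cell step (modern formulation with a forget gate).
    # Names, shapes, and the toy initialization below are illustrative assumptions.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One forward step: x (input_dim,), h_prev/c_prev (hidden_dim,),
        W (4*hidden_dim, input_dim + hidden_dim), b (4*hidden_dim,)."""
        H = h_prev.shape[0]
        z = W @ np.concatenate([x, h_prev]) + b   # all four pre-activations at once
        i = sigmoid(z[0 * H:1 * H])               # input gate: admit new information
        f = sigmoid(z[1 * H:2 * H])               # forget gate: keep or reset the cell state
        o = sigmoid(z[2 * H:3 * H])               # output gate: expose the cell state
        g = np.tanh(z[3 * H:4 * H])               # candidate cell update
        c = f * c_prev + i * g                    # additive memory-cell update
        h = o * np.tanh(c)                        # hidden state passed to the next step
        return h, c

    # Toy usage: run a short random sequence through one cell.
    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 3, 5
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    for _ in range(10):
        h, c = lstm_step(rng.normal(size=input_dim), h, c, W, b)
    print(h)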

Abstract

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review related prior work, then introduce a new, efficient, gradient-based method called long short-term memory (LSTM) that can learn to bridge time lags in excess of 1000 discrete time steps even in the presence of noisy, continuously valued input streams. Unlike previous RNNs, LSTM enforces constant error flow through special memory cells whose self-recurrent connection has a fixed weight of 1.0 and a linear activation, while multiplicative gate units learn to open and close access to this error flow. The gate weights are trained with an efficient learning algorithm that truncates the gradient where doing so does no harm; no teacher forcing is required. LSTM is an efficient and robust alternative to existing RNN architectures.
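
The claim about bridging more than 1000 time steps rests on the constant error carousel: because the memory cell's internal state is updated additively through a self-connection of fixed weight 1.0, error backpropagated through the cell is neither shrunk nor amplified. A simplified sketch of that argument (gate factors omitted, notation lightly adapted from the paper):

    % Additive update of a memory cell's internal state (gate factors omitted):
    %   s_c(t) = s_c(t-1) + g(net_c(t)) \, y^{in}(t)
    % so the state-to-state derivative is exactly one at every step:
    \[
      \frac{\partial s_c(t)}{\partial s_c(t-1)} = 1
      \qquad\Longrightarrow\qquad
      \frac{\partial s_c(t)}{\partial s_c(t-k)}
        = \prod_{j=1}^{k} \frac{\partial s_c(t-j+1)}{\partial s_c(t-j)} = 1 .
    \]
    % In a conventional recurrent unit the corresponding product is
    % \prod_{j=1}^{k} f'(\mathrm{net}(t-j)) \, w, which decays or grows
    % exponentially in k: the vanishing/exploding gradient problem.

Within the cell the gradient is therefore preserved exactly, and the learned gates decide when information is written into or read out of that protected state.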