Tho Le

A Data Scientist. Looking for knowledge!

NLP - LLMs - Basics

13 Feb 2025 » ai, llms, nlp, nn

Reasoning LLMs

LLMs Explained

Transformers Explained

LSTM Explained

  • Specifically designed to avoid the vanishing and exploding gradient problems.
    • Thanks to a gated structure and the “constant error carousel”, which lets error signals flow backward through many steps without vanishing.
  • The gated structure controls information flow through three gates: forget, input, and output (see the sketch after this list).
  • It separates long-term memory (the cell state c) from short-term memory (the hidden state h).
  • Understand LSTM before learning about Transformers.
  • Limitation: in seq2seq models, the encoder is separate from the decoder and tokens are processed strictly sequentially, so it cannot scale.
  • What makes Transformers better: they overcame the sequential processing limitations of RNNs/LSTMs. This enables MUCH faster training through GPU parallelization.
  • LSTMs are also a bridge between NLP and time-series forecasting.
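
For concreteness, below is a minimal numpy sketch of a single LSTM step, written only to illustrate the points above: the forget, input, and output gates control information flow, the cell state c carries long-term memory, and the hidden state h carries short-term memory. The function names, weight shapes, and toy usage are assumptions made for this example, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, and what to expose."""
    W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c = params
    z = np.concatenate([h_prev, x_t])    # short-term memory joined with the new input

    f = sigmoid(W_f @ z + b_f)           # forget gate: what to drop from the cell state
    i = sigmoid(W_i @ z + b_i)           # input gate: how much new content to write
    o = sigmoid(W_o @ z + b_o)           # output gate: how much of the cell to expose
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate cell content

    c_t = f * c_prev + i * c_tilde       # additive update: the "constant error carousel"
    h_t = o * np.tanh(c_t)               # short-term memory / output at this step
    return h_t, c_t

# Toy usage with random weights (hidden size 4, input size 3), purely illustrative.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
params = [rng.normal(size=(hidden, hidden + inp)) for _ in range(4)] \
       + [np.zeros(hidden) for _ in range(4)]
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inp)):    # walk through a 5-step sequence, one token at a time
    h, c = lstm_cell(x_t, h, c, params)
print(h.shape, c.shape)                  # (4,) (4,)
```

Note how the cell state is updated by element-wise scaling and addition rather than a repeated matrix multiplication; that additive path is what lets error signals survive many steps without vanishing, and the per-step loop is also why LSTMs cannot be parallelized over the sequence the way Transformers can.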

LSTM Resources