Transformer

The neural network architecture that powers almost every modern AI language model.

What it actually means

A transformer is a neural network architecture, introduced in the 2017 paper "Attention Is All You Need", that processes all the tokens (words or word pieces) in a sequence in parallel rather than one at a time. Its attention mechanism scores how relevant each token is to every other token, so each token's representation can draw on context from anywhere in the sequence.
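The attention weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any production model is implemented: it assumes a single attention head, uses the input vectors directly as queries, keys, and values (a real transformer first applies learned projection matrices), and the token list and 2-dimensional embeddings are made up for the example.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: queries, keys, and values are all the
    input vectors themselves (no learned projections).
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token, itself included.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for three tokens.
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. The rows do not depend on each other, which is why all positions can be computed at the same time.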

Real-world analogy

Older recurrent models read a sentence the way a person reads a book: one word at a time, left to right, trying to remember what came before. A transformer is more like a team of editors reviewing the entire manuscript at once, each focused on a different relationship between words: who does what to whom, which adjective belongs to which noun, what "it" refers to three sentences back.

Common misconception

"Transformer" names the underlying architecture, not a complete AI model. GPT, Claude, Gemini, and LLaMA are all transformer-based models. Saying "a transformer" is like saying "a combustion engine": it describes the mechanism, not the car.

Related terms

attention-head, llm, embeddings