Attention Mechanism: The Core of Transformer Models
In this blog post, I will focus on the core principles of transformer models, specifically the self-attention mechanism. To keep the discussion straightforward, I will approach the concepts from the perspective of decoder-only models like GPT.