Topic: Inference

All essays filed under "Inference".

What LLMs Do at Inference: A Deep Dive Under the Hood

Updated: 9 Jan, 2026

A step-by-step, reference-backed explanation of what happens during LLM inference: tokenization, embeddings, prefill & decode phases, KV caching, decoding strategies, bottlenecks and optimizations like quantization, FlashAttention and speculative decoding.

KV Cache Explained - A Deep Dive into Transformer Optimization

Updated: 9 Jan, 2026

A Deep Dive into Transformer Optimization

Top-k vs. Nucleus Sampling - Decoding the Secrets of AI Text Generation

Updated: 9 Jan, 2026

Decoding the Secrets of AI Text Generation

GPU vs TPU - Decoding the Battle of AI Accelerators in 2025

Updated: 9 Jan, 2026

Decoding the Battle of AI Accelerators in 2025
