Senthilkumar Gopal

Musings of a machine learning researcher, engineer and leader


Paged Attention and Chunked Prefill for LLM Inference

This post explains how Paged Attention and Chunked Prefill optimize memory usage and computation in vLLM: key-value caches are organized into dynamically allocated blocks, and input sequences are processed in manageable chunks. It includes a simple walkthrough, with tensor shapes and code, showing how the two techniques work together for LLM inference.
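As a minimal sketch of the idea behind the post (not vLLM's actual API; names such as `kv_pool`, `append_token`, and the block/chunk sizes are illustrative), a paged KV cache maps each sequence's logical token positions to physical blocks drawn from a shared pool, and "chunked prefill" simply means writing a prompt's tokens into that cache a chunk at a time:

```python
# Toy paged KV cache: a shared physical pool of fixed-size blocks,
# plus a per-sequence block table mapping logical positions to blocks.
# All names and sizes are illustrative, not vLLM internals.
import numpy as np

BLOCK_SIZE = 4          # tokens per KV block
NUM_BLOCKS = 8          # size of the shared physical block pool
HEAD_DIM = 2            # tiny head dimension for readability

kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM))
free_blocks = list(range(NUM_BLOCKS))

def append_token(block_table, seq_len, key_vec):
    """Write one token's key into the paged cache, allocating a block on demand."""
    if seq_len % BLOCK_SIZE == 0:            # current block full (or none yet)
        block_table.append(free_blocks.pop(0))
    block = block_table[-1]
    kv_pool[block, seq_len % BLOCK_SIZE] = key_vec
    return seq_len + 1

# "Chunked prefill": sequence A's 6-token prompt is written in chunks
# of BLOCK_SIZE rather than all at once.
table_a, len_a = [], 0
prompt_a = [np.full(HEAD_DIM, t) for t in range(6)]
for start in range(0, len(prompt_a), BLOCK_SIZE):
    for vec in prompt_a[start:start + BLOCK_SIZE]:
        len_a = append_token(table_a, len_a, vec)

# Sequence B shares the same pool but owns disjoint blocks.
table_b, len_b = [], 0
for t in range(3):
    len_b = append_token(table_b, len_b, np.full(HEAD_DIM, 10 + t))

print(table_a)   # [0, 1] -- 6 tokens span two blocks
print(table_b)   # [2]    -- 3 tokens fit in one block
```

Because allocation happens block by block, sequences of different lengths can grow without pre-reserving contiguous memory, which is the core win the post describes.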



  • Thu 09 May 2024
  • Math

Common Terms used in mathematical proofs

This post explains common terms in mathematical writing, such as theorem, lemma, axiom, and proof, using simple examples involving even numbers. It helps beginners understand how these terms structure mathematical reasoning.
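To illustrate how those terms fit together in practice, here is a small Lean 4 sketch in the post's spirit, using even numbers (the names `IsEven`, `zero_even`, and `even_add_two` are illustrative, not from any library):

```lean
-- A *definition* fixes the meaning of a term.
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

-- A small result proved on the way to bigger ones is usually called a *lemma*.
theorem zero_even : IsEven 0 := ⟨0, rfl⟩

-- A *theorem* builds on definitions and lemmas via a *proof*.
theorem even_add_two (n : Nat) (h : IsEven n) : IsEven (n + 2) := by
  obtain ⟨k, hk⟩ := h
  exact ⟨k + 1, by omega⟩
```

The chain mirrors the structure the post describes: definitions supply the vocabulary, lemmas provide reusable stepping stones, and theorems state the results that proofs justify.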



AI Compilers - A Study guide

A growing study guide of AI compilers, ranging from foundational concepts such as graph lowering and systolic arrays to practical tools such as TorchDynamo and Glow.

Aliasing on XLA

This post explores aliasing in XLA: why it matters, the mechanisms through which it is implemented, and future directions for extending aliasing optimizations.
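One concrete way to see XLA's input-output aliasing from user code is JAX's buffer donation. The sketch below (function and variable names are illustrative) donates the first argument so XLA may reuse its buffer for the output instead of allocating a copy; on backends without donation support, JAX falls back to copying and emits a warning:

```python
# Buffer donation in JAX as a user-visible form of XLA aliasing.
import jax
import jax.numpy as jnp

def accumulate(state, update):
    return state + update

# donate_argnums=(0,) tells XLA it may alias `state`'s buffer to the
# output, enabling an in-place-style update without an extra allocation.
accumulate_jit = jax.jit(accumulate, donate_argnums=(0,))

s = jnp.zeros(4)
s = accumulate_jit(s, jnp.ones(4))   # the old `s` buffer is consumed
print(s)  # [1. 1. 1. 1.]
```

Note that a donated buffer must not be reused by the caller afterwards; rebinding `s` to the result, as above, is the idiomatic pattern.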