Senthilkumar Gopal

Musings of a machine learning researcher, engineer, and leader


Paged Attention and Chunked Prefill for LLM Inference

This post explains how Paged Attention and Chunked Prefill optimize memory and computation in vLLM: Paged Attention organizes key-value caches into dynamically allocated blocks, while Chunked Prefill processes input sequences in manageable chunks. It includes a simple walkthrough, with tensor shapes and code, showing how the two techniques combine during LLM inference.
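The core idea behind Paged Attention, organizing the KV cache into fixed-size blocks indexed by a per-sequence block table, can be sketched in a few lines. This is a minimal illustration with hypothetical names (`BlockTable`, `BLOCK_SIZE`), not vLLM's actual implementation:

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; vLLM's default differs)

class BlockTable:
    """Maps a sequence's logical token positions to physical KV-cache blocks."""

    def __init__(self, num_physical_blocks=16):
        self.free_blocks = list(range(num_physical_blocks))  # small pool for illustration
        self.blocks = []       # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the current one fills up,
        # so memory grows on demand rather than being reserved up front.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop(0))
        self.num_tokens += 1

    def physical_slot(self, pos):
        # Translate a token position into (physical block id, offset within block).
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

table = BlockTable()
for _ in range(6):  # cache KV entries for 6 tokens
    table.append_token()

print(table.blocks)           # two physical blocks allocated for 6 tokens
print(table.physical_slot(5)) # last token lives at offset 1 of the second block
```

Because blocks are allocated lazily and addressed indirectly, sequences of different lengths can share one physical pool without pre-reserving contiguous memory, which is what the post walks through in detail.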




AI Compilers - A Study Guide

A growing study guide of AI compilers, spanning foundational concepts like graph lowering and systolic arrays as well as practical tools like TorchDynamo and Glow.

What is the Neuron SDK?

This post introduces the AWS Neuron SDK, which streamlines deep learning and generative AI workloads on AWS Inferentia and Trainium by integrating with frameworks like PyTorch and JAX.

Neuron Glossary

This post serves as a running glossary of Neuron- and HPC-related terms and technologies.