Recent Articles

80 posts · sorted by date
July 21, 2026 10 min · read

21/21 - Below PyTorch: Profiling, Compilation, and CUDA Kernel Optimization

A production-focused guide to what sits below PyTorch: profiling, compilation, and CUDA kernel optimization, covering architecture, capacity math, failure analysis, and operational controls.

July 20, 2026 10 min · read

20/21 - The Ground Beneath AI: Linux, Networking, and Storage

A production-focused guide to the Linux, networking, and storage layers beneath AI systems, covering architecture, capacity math, failure analysis, and operational controls.

July 19, 2026 10 min · read

19/21 - Shipping Models Like Software: CI/CD, MLflow, and Registries

A production-focused guide to shipping models like software with CI/CD, MLflow, and registries, covering architecture, capacity math, failure analysis, and operational controls.

July 18, 2026 10 min · read

18/21 - Assume the Prompt Is Hostile: Security and Guardrails

A production-focused guide to security and guardrails that assume the prompt is hostile, covering architecture, capacity math, failure analysis, and operational controls.

July 17, 2026 10 min · read

17/21 - From Kafka to Tokens: Streaming Data and Online Inference

A production-focused guide to streaming data and online inference, from Kafka to tokens, covering architecture, capacity math, failure analysis, and operational controls.

July 16, 2026 10 min · read

16/21 - Agents Need Infrastructure Too: MCP and Workflow Orchestration

A production-focused guide to MCP and workflow orchestration as infrastructure for agents, covering architecture, capacity math, failure analysis, and operational controls.

July 15, 2026 10 min · read

15/21 - The Router Is Part of the Model: Routing, Hedging, and Fallback

A production-focused guide to routing, hedging, and fallback that treats the router as part of the model, covering architecture, capacity math, failure analysis, and operational controls.

July 14, 2026 10 min · read

14/21 - Benchmarking Without Lying: Evals, Load Tests, and A/B Experiments

A production-focused guide to honest benchmarking with evals, load tests, and A/B experiments, covering architecture, capacity math, failure analysis, and operational controls.

July 13, 2026 10 min · read

13/21 - Can You Debug a Token? Observability for AI Systems

A production-focused guide to observability for AI systems, covering architecture, capacity math, failure analysis, and operational controls.

July 12, 2026 10 min · read

12/21 - Cache the Right Thing: Prompt, Semantic, and Cost-Aware Reuse

A production-focused guide to caching the right thing: prompt, semantic, and cost-aware reuse, covering architecture, capacity math, failure analysis, and operational controls.