Recent Articles
21/21 - Below PyTorch: Profiling, Compilation, and CUDA Kernel Optimization
A production-focused guide to profiling, compilation, and CUDA kernel optimization below PyTorch, covering architecture, capacity math, failure analysis, and operational controls.
20/21 - The Ground Beneath AI: Linux, Networking, and Storage
A production-focused guide to the ground beneath AI (Linux, networking, and storage), covering architecture, capacity math, failure analysis, and operational controls.
19/21 - Shipping Models Like Software: CI/CD, MLflow, and Registries
A production-focused guide to shipping models like software with CI/CD, MLflow, and registries, covering architecture, capacity math, failure analysis, and operational controls.
18/21 - Assume the Prompt Is Hostile: Security and Guardrails
A production-focused guide to security and guardrails built on the assumption that the prompt is hostile, covering architecture, capacity math, failure analysis, and operational controls.
17/21 - From Kafka to Tokens: Streaming Data and Online Inference
A production-focused guide to streaming data and online inference, from Kafka to tokens, covering architecture, capacity math, failure analysis, and operational controls.
16/21 - Agents Need Infrastructure Too: MCP and Workflow Orchestration
A production-focused guide to the infrastructure agents need, including MCP and workflow orchestration, covering architecture, capacity math, failure analysis, and operational controls.
15/21 - The Router Is Part of the Model: Routing, Hedging, and Fallback
A production-focused guide to treating the router as part of the model through routing, hedging, and fallback, covering architecture, capacity math, failure analysis, and operational controls.
14/21 - Benchmarking Without Lying: Evals, Load Tests, and A/B Experiments
A production-focused guide to benchmarking without lying through evals, load tests, and A/B experiments, covering architecture, capacity math, failure analysis, and operational controls.
13/21 - Can You Debug a Token? Observability for AI Systems
A production-focused guide to observability for AI systems, down to the question of whether you can debug a token, covering architecture, capacity math, failure analysis, and operational controls.
12/21 - Cache the Right Thing: Prompt, Semantic, and Cost-Aware Reuse
A production-focused guide to caching the right thing with prompt, semantic, and cost-aware reuse, covering architecture, capacity math, failure analysis, and operational controls.