Blog

Production LLM Systems Tutorial 1: End-to-End Application Design

May 9, 2026

Production LLM Systems Tutorial 2: Latency, Cost, and Quality

May 9, 2026

Production LLM Systems Tutorial 3: Scalable Inference Architecture

May 9, 2026

Production LLM Systems Tutorial 4: RAG and Data Pipelines

May 9, 2026

Production LLM Systems Tutorial 5: Monitoring and Observability

May 9, 2026

Production LLM Systems Tutorial 6: Evaluation and A/B Testing

May 9, 2026

Production LLM Systems Tutorial 7: Security and Prompt Injection

May 9, 2026

Production LLM Systems Tutorial 8: Human-in-the-Loop Workflows

May 9, 2026

Production LLM Systems Tutorial 9: Cost Optimization

May 9, 2026

Production LLM Systems Tutorial 10: Versioning and Disaster Recovery

May 9, 2026

Agents Need Seatbelts: Guardrails and Infinite-Loop Detection for Tool-Using AI

May 6, 2026

Your Token Bill Has a Leak: Cost Monitoring for Hidden LLM Waste

May 6, 2026

Reduce LLM Inference Cost by 60% Without Serving Stale Answers

May 5, 2026

Why Agentic AI Is Bringing CPUs Back Into the Spotlight

May 3, 2026

YC's 2026 Startup Map: AI Has Left the Chatbox

April 29, 2026

Your RAG Demo Passed. Your RAG System Needs a Judge: RAGAS, Humans, and Evidence

April 23, 2026

Agentic AI Needs Smarter Inference: Hints, Priority, and Cache Lifecycle

April 17, 2026

Draft Tokens or Smaller Numbers? Speculative Decoding vs Quantization in Production

April 16, 2026

KV Cache at Fleet Scale: The Memory System Hiding Inside Every LLM Platform

April 9, 2026

The Cache Has Layers: Prompt Caching, Semantic Caching, and When Each One Betrays You

April 2, 2026

Autoscaling LLMs by TTFT and TPOT, Not CPU Utilization

March 27, 2026

From H100 to Blackwell: What Actually Changes for Inference Architects

March 20, 2026

Speculative Decoding in Production: When Draft Tokens Help and When They Hurt

February 27, 2026

From Prefill to Decode: Disaggregated Inference as a Distributed Systems Problem

February 20, 2026

The Rust Case for AI Gateways: Backpressure, Streaming, and Failure Isolation

February 6, 2026

Why Round-Robin Dies in LLM Serving: KV-Aware Routing Explained

January 30, 2026

What AI-Native Talent Looks Like in 2026: A Recruiter's Field Guide

January 23, 2026

TensorRT-LLM vs vLLM vs SGLang: Choosing an Inference Engine for Production

January 16, 2026

Dynamo Is Not an Inference Engine. It Is the Control Plane for Tokens

January 9, 2026

The AI Hiring Playbook for 2026: Skills, Signals, and Fewer Shiny Job Titles

December 19, 2025

Inference Is Not HTTP: The Case for a Purpose-Built Gateway in Rust

December 8, 2025

KV-Aware Routing: How Cache Locality Changes Load Balancing for LLMs

November 21, 2025

Tokenomics for Engineers: Measuring Throughput per Dollar Instead of Tokens per Second

November 7, 2025

Why Agentic Workloads Break Traditional Inference Gateways

October 10, 2025

Rust for Systems Programming: When the Borrow Checker Earns Its Keep

September 15, 2025

Disaggregated Inference on Kubernetes: Routing, Scheduling, and Scaling Beyond One GPU

August 29, 2025

Prefill vs Decode: The Hidden Split That Shapes Every LLM Serving Architecture

August 8, 2025

Inference Is a Memory Problem: KV Cache, HBM, and the Real Cost of Long Context

July 18, 2025

Apache Kafka in Production: Beyond the Quickstart

September 12, 2024

Control Planes for Distributed Systems: A Practitioner's Guide

July 30, 2024

Beyond Goroutines: Production Patterns for Go Concurrency

June 22, 2024

Cloud Agnostic Engineering: The Real Cost of Multi-Cloud Portability

May 15, 2024

Harnessing the Power of Next.js and React: A Comprehensive Guide

April 15, 2024

100 Java Programming and LLD Interview Questions Handbook with Answers

September 25, 2023

A Comprehensive Guide to Workflow as a Service System Design

September 9, 2023

Why You Rarely Need ElasticSearch When You Have PostgreSQL

September 1, 2023

Dragon Fly DB: The Database of Choice for Modern Enterprise Applications 👨🏼‍💻

March 23, 2023

Hexagonal Architecture in GoLang 👨🏼‍💻

March 21, 2023

Mastering feature flagging with ReactJs, Golang, and Github Actions

March 9, 2023

Object-Oriented Programming in Go: Understanding Structs, Methods, and More

February 9, 2023

Multicloud Kubernetes Deployment and Provisioning

February 2, 2023

10 mistakes that should be avoided in GoLang

January 9, 2023

9 Features of Spring Cloud

January 9, 2023

Eliminating Data Loss in AWS Serverless Architectures with the Outbox Pattern

January 9, 2023

Data Engineering Career Path: How to Advance Your Career in the Field

January 1, 2023

The Tech Behind the Metaverse: An Overview for Developers

December 29, 2022

Debugging in Azure Cloud: A Comprehensive Guide 🔍

December 29, 2022

Functional programming in Go for beginners: A tutorial 👨🏼‍💻

December 27, 2022

Maximizing Performance with Concurrency in Go 👨🏼‍💻

December 27, 2022

The Kubernetes Handbook: A Comprehensive guide of 100 Q&A

December 25, 2022

Understanding a fair comparison between Cassandra, MongoDB, and MySQL 👨🏼‍💻

December 23, 2022

Going deeper with Go sockets: Advanced concepts and techniques 🫡

December 23, 2022

The Power of WhatsApp Automation: How Go Can Help You Automate WhatsApp Messaging 🫡

December 23, 2022

Deploying microservices to Azure Kubernetes Service: A tutorial

December 22, 2022

Using GoLang and Elasticsearch to Crawl and Analyze Amazon.com Data 👨🏼‍💻

December 21, 2022

Building a scalable and reliable e-commerce platform with .NET and Azure 🫡

December 21, 2022

GoLang Frameworks: A Detailed Walkthrough of the Most Popular Options 🫵

December 18, 2022

From Black Box to Glass Box: Using Explainable AI and Model Monitoring to Improve AI Performance and Transparency 💪🏻

December 16, 2022

Best practices for securing your Azure Kubernetes Service deployment

December 15, 2022

MLOPS 101: An Introduction to Managing Machine Learning in Production

December 12, 2022

A Beginner's Guide to Machine Learning Algorithms

December 10, 2022

Ambassador Pattern

December 9, 2022

A Beginner's Guide to Multi-Cloud Networking: Key Concepts and Best Practices

December 7, 2022

Maximizing Cloud-Native Success with the Twelve-Factor App Methodology 🫡

December 3, 2022

JWT Authentication using GoLang

December 2, 2022

Azure Databricks for Enthusiasts: A Beginner's Guide

December 1, 2022

Maximize your productivity as an SRE with these 100 Linux commands 💪

December 25, 2021

Cloud Native vs Cloud Agnostic: Weighing the Trade-Offs 🤜 🤛

December 19, 2021

Which Azure Messaging Service is Right for Your Use Case? A Comparison and Code Walkthrough 🤜 🤛

December 5, 2021

Which is better: Azure App Gateway or Nginx configured on Azure VMs? 🤜 🤛

December 5, 2021

Why Your Organization Needs a Service Mesh: The Benefits of Service Mesh Architecture and Istio

December 2, 2020