Agents Need Seatbelts: Guardrails and Infinite-Loop Detection for Tool-Using AI

An agent without guardrails is just a while loop with a credit card.

Tool-using agents are powerful because they can observe, plan, call tools, inspect results, and try again. That same loop is the source of the most boring and expensive failure mode in agent systems:

think -> search -> read -> think -> search -> read -> think -> search

No explosion. No dramatic exception. Just a polite machine spending money while making no progress.

Guardrails are not a single classifier. They are a control system around the loop.

The loop is the product

A production agent loop has state:

  • user goal
  • plan
  • messages
  • tool calls
  • observations
  • memory
  • budget
  • permissions
  • progress signal
  • termination criteria

If the system cannot explain why the next step is allowed, it should not take the step.

[Figure: agent loop with guardrail checkpoints — plan, authorize tool, execute, observe, then budget stop (tokens, steps, time), progress check (new facts or done), and policy check (scope, data, action). Guardrails belong inside the loop: every step must pass permission, budget, and progress checks before it can continue.]
The loop is where safety and cost controls have to live. A one-time input filter is not enough.
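
In code, that means the checks sit inside the step loop itself. A minimal Python sketch, where every name (plan_next_action, authorize, made_progress, the budget fields) is illustrative rather than taken from any specific framework:

import time

def escalate(state, reason):
    # Hand control to a human or supervisor instead of improvising.
    return {"status": "stopped", "reason": reason, "state": state}

def run_agent(state, llm, tools, policy, budget):
    start = time.monotonic()
    for _ in range(budget["max_steps"]):
        action = llm.plan_next_action(state)               # plan
        if action.is_final:
            return action.answer                           # done
        policy.authorize(action.tool, action.args)         # policy: scope, data, action
        observation = tools[action.tool](**action.args)    # execute
        state.record(action, observation)                  # observe
        if not state.made_progress():                      # progress: new facts?
            return escalate(state, "no progress")
        if time.monotonic() - start > budget["max_seconds"]:
            return escalate(state, "time budget exhausted")  # budget stop
    return escalate(state, "step budget exhausted")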

Guardrail layers

A useful agent stack has several guardrails:

Layer         Example control
Input         prompt injection screening, sensitive data detection
Planning      task scope, allowed tools, required approval
Tool call     authorization, schema validation, rate limits
Observation   sanitize tool output, detect malicious instructions
Memory        do not store secrets, tenant isolation
Output        policy checks, citations, PII checks
Loop          budgets, progress checks, recursion limits

OpenAI’s Agents SDK ships guardrails for validating agent inputs and outputs. NVIDIA NeMo Guardrails uses programmable rails to define conversational and action constraints. LangGraph exposes a recursion limit to stop runaway graph execution. AutoGen and CrewAI expose iteration or auto-reply limits. OWASP’s LLM Top 10 calls out prompt injection, sensitive information disclosure, and excessive agency as major application risks.

Different frameworks, same lesson: control the loop.
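
As a concrete instance, LangGraph’s recursion limit can be exercised directly. A minimal sketch, assuming the langgraph package is installed; the deliberately cyclic graph stands in for an agent loop:

from typing import TypedDict

from langgraph.errors import GraphRecursionError
from langgraph.graph import START, StateGraph

class State(TypedDict):
    steps: int

def step(state: State) -> State:
    # Stands in for one plan -> tool -> observe iteration.
    return {"steps": state["steps"] + 1}

builder = StateGraph(State)
builder.add_node("step", step)
builder.add_edge(START, "step")
builder.add_edge("step", "step")  # deliberate cycle: this graph never ends on its own
graph = builder.compile()

try:
    graph.invoke({"steps": 0}, config={"recursion_limit": 10})
except GraphRecursionError:
    print("stopped: recursion limit reached")

The same pattern holds for AutoGen’s max_consecutive_auto_reply or CrewAI’s iteration caps: the framework enforces the bound so the model does not have to.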

Infinite loops are usually “no progress” loops

Most bad loops are not exact repeats. They are semantically repetitive:

search "pricing policy"
read same docs
search "pricing policy enterprise"
read same docs
summarize
decide context missing
search "pricing policy"

Detecting this needs more than a counter.

Signals:

  • repeated tool name with similar arguments
  • same URLs or documents observed repeatedly
  • no new facts added to state
  • plan text repeating
  • answer confidence not improving
  • token spend increasing while task state is unchanged
  • same error returned by tool multiple times

Create a state fingerprint:

import hashlib
import json

def state_fingerprint(goal, plan, tool_name, tool_args, doc_ids, facts):
    # Normalize ordering and whitespace so equivalent states hash identically.
    payload = json.dumps(
        [goal, " ".join(plan.lower().split()), tool_name,
         tool_args, sorted(doc_ids), sorted(facts)],
        sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

If the fingerprint repeats, or changes without adding facts, slow down or stop.
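
A small tripwire built on that fingerprint might look like this; the thresholds are illustrative, not recommendations:

from collections import Counter

class LoopDetector:
    def __init__(self, max_repeats=2, max_stalls=3):
        self.seen = Counter()      # fingerprint -> times observed
        self.stalls = 0            # consecutive steps with no new facts
        self.max_repeats = max_repeats
        self.max_stalls = max_stalls

    def should_stop(self, fingerprint, new_fact_count):
        self.seen[fingerprint] += 1
        if self.seen[fingerprint] > self.max_repeats:
            return True            # exact repeat of a prior state
        if new_fact_count == 0:
            self.stalls += 1       # state changed, but nothing was learned
        else:
            self.stalls = 0
        return self.stalls >= self.max_stalls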

Budgets are guardrails

Every agent run should have budgets:

  • max steps
  • max tool calls
  • max input tokens
  • max output tokens
  • max total tokens
  • max wall-clock time
  • max retries per tool
  • max repeated action fingerprints
  • max spend

Budgets should be task-aware. A background research agent may get 50 steps. A customer-support answer may get 4. A workflow that changes payment state may require explicit human approval before each tool call.
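
One way to make budgets explicit is a plain object the loop consults on every step. A sketch with illustrative field names and defaults:

import time
from dataclasses import dataclass, field

@dataclass
class RunBudget:
    max_steps: int = 20
    max_total_tokens: int = 100_000
    max_seconds: float = 120.0
    started_at: float = field(default_factory=time.monotonic)
    steps: int = 0
    total_tokens: int = 0

    def charge(self, tokens: int) -> None:
        self.steps += 1
        self.total_tokens += tokens

    def exhausted(self) -> bool:
        return (self.steps >= self.max_steps
                or self.total_tokens >= self.max_total_tokens
                or time.monotonic() - self.started_at >= self.max_seconds)

The research agent and the support agent then differ only in the numbers passed to RunBudget, not in their loop code.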

[Figure: loop detection as a set of tripwires — hard budgets (steps, time, tokens), repeated actions (same tool + args), no-progress checks (no new facts), and a loop controller that decides to continue, stop, ask the user, or escalate; done means an answer or a handoff.]
Stopping is a feature. The agent should know when to ask for help instead of improvising forever.

Tool authorization

Every tool should declare:

  • required permissions
  • allowed input schema
  • side effects
  • rate limit
  • cost estimate
  • approval requirement
  • data classification
  • audit fields

Dangerous tools need stronger controls:

  • payment
  • deletion
  • deployment
  • email sending
  • external API writes
  • database mutation
  • browser automation

The model should not decide alone whether it is allowed to send money, delete data, or email a customer. That decision belongs to policy code.
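
One way to encode that: each tool carries a declarative spec, and a policy function, not the model, gates every call. A sketch with invented names:

from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    required_permissions: frozenset
    has_side_effects: bool
    requires_approval: bool

SEND_PAYMENT = ToolSpec(
    name="send_payment",
    required_permissions=frozenset({"payments:write"}),
    has_side_effects=True,
    requires_approval=True,
)

def authorize(tool: ToolSpec, granted: frozenset, human_approved: bool) -> None:
    # Policy code decides; the model only proposes.
    if not tool.required_permissions <= granted:
        raise PermissionError(f"{tool.name}: missing permissions")
    if tool.requires_approval and not human_approved:
        raise PermissionError(f"{tool.name}: human approval required")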

Production controls

Build the runtime with:

  • step-by-step tracing
  • state snapshots
  • token and cost budgets
  • tool-call audit log
  • per-tool timeout
  • idempotency keys
  • cancellation propagation
  • human approval gates
  • emergency kill switch
  • replay harness for failed runs

If you cannot replay an agent failure, you cannot debug it. If you cannot bound an agent run, you cannot price it.
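
Two of those controls compose naturally: every tool call carries an idempotency key and leaves an audit record. A sketch; the idempotency_key argument is an assumption about how your tools accept it:

import json
import time
import uuid

def call_tool(tool_fn, args, audit_log):
    # Retries reuse the key already present in args, so a downstream
    # service can deduplicate the side effect.
    key = args.setdefault("idempotency_key", str(uuid.uuid4()))
    entry = {"key": key, "tool": tool_fn.__name__, "ts": time.time()}
    try:
        result = tool_fn(**args)
        entry["result"] = "ok"
        return result
    except Exception as exc:
        entry["result"] = f"error: {exc}"
        raise
    finally:
        audit_log.write(json.dumps(entry, default=str) + "\n")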

The answer is not “more guardrails”

Too many guardrails can make agents useless. The goal is not to block everything. The goal is to make the agent’s freedom explicit:

what can it do?
with whose data?
for how long?
at what cost?
with what approval?
when should it stop?

That is the contract. Once the contract is explicit, the agent becomes a system rather than a surprise.
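
Written down, the contract can be an explicit config object the runtime enforces. All field names and values here are illustrative:

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContract:
    allowed_tools: frozenset          # what can it do?
    data_scopes: frozenset            # with whose data?
    max_seconds: float                # for how long?
    max_spend_usd: float              # at what cost?
    approval_required_for: frozenset  # with what approval?
    stop_conditions: tuple            # when should it stop?

support_agent = AgentContract(
    allowed_tools=frozenset({"search_kb", "draft_reply"}),
    data_scopes=frozenset({"tenant:acme"}),
    max_seconds=60.0,
    max_spend_usd=0.50,
    approval_required_for=frozenset({"send_email"}),
    stop_conditions=("answered", "escalated", "budget_exhausted"),
)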

Sources worth reading