Tag

#LLM

Workflows

AI Evals Are a Bottleneck: The Minimal Harness I'd Wire Into CI Today

Most LLM features ship on vibes — skim three outputs, merge. Here's the 40-line Python eval harness with JSON fixtures and a judge-model gate you can wire into CI today.

May 06, 2026
AI Agents

Prompt Injection in Tool-Calling Agents: The Surface You Are Ignoring

Once your agent calls tools, prompt injection is a live attack path. Here's the exploit in real Python — and the Pydantic fix that belongs in your dispatch layer.

May 05, 2026