Workflows
AI Evals Are a Bottleneck: The Minimal Harness I'd Wire Into CI Today
Most LLM features ship on vibes — skim three outputs, merge. Here's the 40-line Python eval harness with JSON fixtures and a judge-model gate you can wire into CI today.
May 06, 2026