#LLM — CyberDevTech

AI Agents

Benchmarking Open Models on Your Own Tool Schemas Before You Commit

Public leaderboards score tool calling on clean synthetic schemas, not the nested mess your MCP server exposes. Here's the ~50-line Python harness that settles the debate on your own stack.

June 19, 2026

AI Agents

The Pre-Ship Security Checklist for Vibe-Coded PRs

An AI agent resolved an RLS error by making the table publicly readable. CI stayed green. Here's the grep triage and checklist I run before any LLM-generated code ships.

May 22, 2026

Workflows

AI Evals Are a Bottleneck: The Minimal Harness I'd Wire Into CI Today

Most LLM features ship on vibes: skim three outputs, merge. Here's the 40-line Python eval harness with JSON fixtures and a judge-model gate you can wire into CI today.

May 06, 2026

AI Agents

Prompt Injection in Tool-Calling Agents: The Surface You're Ignoring

Once your agent calls tools, prompt injection is a live attack path. Here's the exploit in real Python — and the Pydantic fix that belongs in your dispatch layer.

May 05, 2026