One capstone.
Many standalones.
The capstone covers all 48 skills end-to-end. Each standalone repo takes one skill, fills one clear gap in the open-source landscape, and ships on its own.
rondiver/ai-native-engineering
SOON
The capstone. 48 skills, 10 domains. Every skill has a Competent bar, a Weak bar, a deterministic eval.
How it's built →
48 skills, one anatomy, deterministic eval loops, and a manifest that refuses to let undocumented files exist. The structural argument behind the benchmark — for engineers who want to look under the hood before forking.
Evals
TIER 1
An opinionated, reproducible eval harness: Competent/Weak bars for 48 AI-engineering skills, scored deterministically.
Red-Teaming
TIER 1
Attack kits that actually run, not a checklist of vulnerabilities you could theoretically exploit.
MCP Design
TIER 1
How to design MCP tools that LLMs actually call correctly. Naming, descriptions, schema, anti-patterns.
Observability & Tracing
TIER 1
What to log when every call is non-deterministic. Structured traces you can actually replay and diff.
RAG
TIER 1
Retrieval that works past 10K documents: chunking, hybrid search, reranking, evaluation. Not toy tutorials.