TL;DR

A recent Google whitepaper argues that the AI model accounts for only about 10% of an agent's behavior. The rest comes from the harness (prompts, tools, context, rules, observability) and from how outputs are verified, which the paper treats as the core skills of AI-driven development.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the AI model accounts for only about 10% of a system’s behavior. Instead, the harness and verification processes determine the system’s effectiveness, shifting the focus from model development to configuration, testing, and context management. This insight challenges common assumptions in AI development and underscores where teams should invest effort.

The whitepaper, titled The New SDLC With Vibe Coding, argues that the most significant shift in software engineering is the move from writing code to expressing intent and trusting machines to execute it. It reports that, as of early 2026, approximately 85% of professional developers use AI coding agents regularly, 51% use them daily, and roughly 41% of new code is AI-generated.

Crucially, the authors argue that the core of effective AI systems lies not in the model itself but in the harness—the prompts, tools, rules, and observability layers that surround it. Evidence from benchmarks shows that changing only the harness, with the same model, can significantly improve performance, sometimes by over 13 points on evaluation scores. This demonstrates that configuration and scaffolding are where most failures and improvements originate.
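The claim that the harness, not the model, decides outcomes can be made concrete with a small sketch. Everything in it is an assumption for illustration: the `Harness` class, the `run_agent` function, the `CALL <tool> <arg>` reply convention, and the stubbed `fake_model` are hypothetical and come from neither the whitepaper nor any real agent framework. It shows one narrow point: the same model stub fails or succeeds depending only on what the harness provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: prompt text in, reply text out.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Everything around the model: prompt, rules, tools, and a log."""
    system_prompt: str
    rules: List[str] = field(default_factory=list)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # minimal observability

    def build_prompt(self, task: str) -> str:
        rule_text = "\n".join(f"- {r}" for r in self.rules) or "- none"
        tool_text = ", ".join(sorted(self.tools)) or "none"
        return (
            f"{self.system_prompt}\nRules:\n{rule_text}\n"
            f"Tools: {tool_text}\nTask: {task}"
        )


def run_agent(model: Model, harness: Harness, task: str) -> str:
    """Agent = model + harness. The model is just one call in the loop."""
    prompt = harness.build_prompt(task)
    harness.log.append(f"prompt_chars={len(prompt)}")
    reply = model(prompt)
    # Toy tool protocol (an assumption): the model replies "CALL <tool> <arg>".
    if reply.startswith("CALL "):
        parts = reply.split(" ", 2)
        name = parts[1]
        arg = parts[2] if len(parts) > 2 else ""
        tool = harness.tools.get(name)
        if tool is None:
            harness.log.append(f"missing_tool={name}")
            return f"error: tool '{name}' is not configured"
        harness.log.append(f"tool={name}")
        return tool(arg)
    return reply


def fake_model(prompt: str) -> str:
    # Stub: always asks for the same tool, regardless of the prompt.
    return "CALL word_count the model is only ten percent"


bare = Harness(system_prompt="You are a coding agent.")
equipped = Harness(
    system_prompt="You are a coding agent.",
    rules=["Use tools instead of guessing."],
    tools={"word_count": lambda text: str(len(text.split()))},
)

bare_result = run_agent(fake_model, bare, "Count the words.")
equipped_result = run_agent(fake_model, equipped, "Count the words.")
```

With the model held constant, `bare_result` is a configuration error and `equipped_result` is the tool's answer, and the log records which case occurred.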

The paper also emphasizes that cost efficiency in AI development favors investing in harness and context engineering over chasing the latest model upgrades. While vibe coding appears inexpensive initially, it incurs high operational costs through token wastage, maintenance, and security vulnerabilities, making disciplined, structured approaches more economical in the long run.
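The cost argument is a break-even calculation, sketched below under loudly stated assumptions. The linear cost model, the function names, and every number are hypothetical and are not figures from the paper. The only value borrowed from the article is the ratio: the per-feature cost of the unstructured approach is set to 3× the structured one, the low end of the 3–10× range quoted in the summary.

```python
import math


def cumulative_cost(upfront: float, per_feature: float, features: int) -> float:
    """Assumed linear model: one-time setup cost plus a fixed cost per feature."""
    return upfront + per_feature * features


def break_even_features(
    upfront_a: float, per_feature_a: float,
    upfront_b: float, per_feature_b: float,
) -> int:
    """Smallest feature count at which option B costs no more than option A.

    B is the option with higher upfront cost and lower per-feature cost.
    """
    if per_feature_b >= per_feature_a:
        raise ValueError("option B never catches up: its per-feature cost is not lower")
    if upfront_b <= upfront_a:
        return 0
    return math.ceil((upfront_b - upfront_a) / (per_feature_a - per_feature_b))


# Hypothetical cost units, chosen only to illustrate the shape of the curve.
VIBE = {"upfront": 0.0, "per_feature": 300.0}        # low CapEx, high OpEx
AGENTIC = {"upfront": 2000.0, "per_feature": 100.0}  # high CapEx, low OpEx

crossover = break_even_features(
    VIBE["upfront"], VIBE["per_feature"],
    AGENTIC["upfront"], AGENTIC["per_feature"],
)
vibe_at_20 = cumulative_cost(VIBE["upfront"], VIBE["per_feature"], 20)
agentic_at_20 = cumulative_cost(AGENTIC["upfront"], AGENTIC["per_feature"], 20)
```

Under these made-up inputs the two approaches cost the same at 10 features, and by 20 features the structured approach costs 4000 units against 6000. Real costs are unlikely to be linear, so treat the shape as the takeaway, not the numbers.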

At a glance
Report · When: published early 2026
The development: Google’s new whitepaper on SDLC emphasizes that the AI model itself is only 10% of the system, with harness and verification responsible for the majority of behavior.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
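The split between tests and evals can be sketched as a single merge gate. This is a minimal illustration, not the paper's method: `slugify`, `fake_generate`, the graders, and the 0.6 threshold are all hypothetical placeholders. Tests assert exact outputs from deterministic code. Evals grade free-form output against criteria and pass on an aggregate score.

```python
from typing import Callable, List, Tuple


def slugify(title: str) -> str:
    # Stand-in for AI-generated deterministic code under review.
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Deterministic checks: each input has exactly one correct output."""
    cases = [
        ("Hello World", "hello-world"),
        ("  The New SDLC ", "the-new-sdlc"),
    ]
    return all(slugify(given) == expected for given, expected in cases)


Grader = Callable[[str], bool]


def run_evals(
    generate: Callable[[str], str],
    cases: List[Tuple[str, Grader]],
    threshold: float,
) -> Tuple[float, bool]:
    """Evals: grade free-form output and pass on an aggregate pass rate."""
    passed = sum(1 for prompt, grader in cases if grader(generate(prompt)))
    rate = passed / len(cases)
    return rate, rate >= threshold


def ci_gate(tests_ok: bool, evals_ok: bool) -> str:
    """Both kinds of verification must pass before a change merges."""
    return "merge" if tests_ok and evals_ok else "block"


def fake_generate(prompt: str) -> str:
    # Stub for a model call. A real eval would sample actual model output.
    return "The harness, not the model, drives most agent behavior."


EVAL_CASES: List[Tuple[str, Grader]] = [
    ("Summarize the paper", lambda out: "harness" in out.lower()),
    ("Summarize in under 12 words", lambda out: len(out.split()) <= 12),
    ("Mention the CI gate", lambda out: "ci" in out.lower().split()),
]

pass_rate, evals_ok = run_evals(fake_generate, EVAL_CASES, threshold=0.6)
decision = ci_gate(run_tests(), evals_ok)
```

The threshold is a design choice. Evals tolerate some failures because the output is not deterministic, while a single failing test blocks the merge outright.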
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)
The ~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).
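Model routing, the second lever named above, can be sketched as a small dispatch function. The tier names, prices, keyword list, and 8000-token cutoff are all hypothetical; a production router would more likely use a classifier or historical success rates than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, arbitrary currency units


CHEAP = ModelTier("small-model", 0.10)
STRONG = ModelTier("large-model", 2.00)

# Assumed heuristic: these words signal work that needs the stronger model.
HARD_KEYWORDS = ("refactor", "architecture", "security", "migration")
MAX_CHEAP_CONTEXT_TOKENS = 8000  # assumed cutoff, not a real model limit


def route(task: str, context_tokens: int) -> ModelTier:
    """Send easy, small-context work to the cheap tier; escalate the rest."""
    text = task.lower()
    if context_tokens > MAX_CHEAP_CONTEXT_TOKENS:
        return STRONG
    if any(keyword in text for keyword in HARD_KEYWORDS):
        return STRONG
    return CHEAP


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    return tier.cost_per_1k_tokens * tokens / 1000


easy = route("Rename a variable in utils.py", context_tokens=500)
hard = route("Refactor the auth module", context_tokens=500)
large = route("Fix a typo", context_tokens=20000)
```

With these placeholder prices, a 2000-token task costs about 0.2 units on the cheap tier and 4.0 on the strong one, which is why routing the easy majority of work matters for operating cost.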
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Why Focus on Harness and Verification Matters

This shift changes where AI teams and organizations should spend effort. If roughly 90% of an agent's behavior depends on the harness, then investment in configuration, testing, and context is likely to do more for robustness and cost than a model-centric strategy. The paper's framing encourages building durable, configurable systems that get the most from whichever model sits inside them while keeping risks and costs under control.

For practitioners, mastering harness design and verification processes becomes essential, transforming the skill set needed in AI development from model tuning to system architecture and context engineering. This approach promises more predictable, secure, and cost-effective AI deployment.

Background on the Evolution of AI Development Practices

Historically, AI development focused heavily on training and improving models, with the assumption that the model’s quality dictated system performance. However, recent trends show that the rapid proliferation of AI tools has shifted the emphasis towards configuration, prompt engineering, and system integration. The whitepaper builds on this evolution, emphasizing that the real challenge is managing the entire system environment around the model.

In 2025, industry discussion centered on vibe coding (quick, loosely structured prompting) and its limitations. The focus is now moving toward agentic engineering, where AI operates within a framework of rules, tests, and guardrails that makes the system more reliable and manageable.

This development aligns with reports that AI-generated code now constitutes 41% of new software, underscoring the need for disciplined, scalable approaches rather than ad hoc prompt tuning.

“The biggest shift in software engineering isn’t a new language or framework; it’s moving from writing code to expressing intent and trusting machines to execute it.”

— Addy Osmani

Unresolved Questions About Implementation and Adoption

While the whitepaper emphasizes the importance of harness and verification, it remains unclear how quickly organizations will adopt these principles at scale. Specific strategies for transitioning from vibe coding to agentic engineering are still being developed, and the relative costs of re-engineering existing systems versus building new ones are not yet fully understood. Additionally, the impact on smaller teams or those with limited resources is still being evaluated.

Next Steps for AI Development and Industry Adoption

Organizations are likely to begin prioritizing system configuration, testing, and context engineering as core skills. Expect further research and case studies demonstrating how harness design improves performance and reduces costs. Industry leaders may also develop tools and frameworks to facilitate this shift, making disciplined system engineering more accessible. Monitoring how these practices influence AI reliability and security will be key in the coming months.

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper states that the majority of an AI system’s behavior depends on how the model is integrated, configured, and verified through prompts, tools, and guardrails, not just the model itself.

What is a harness in AI systems?

The harness is the infrastructure surrounding the model (prompts, rules, tools, observability, and configuration) that guides and controls the model’s behavior.

How does this shift affect AI development costs?

Focusing on harness and verification can lower long-term costs by reducing token wastage, improving security, and increasing system reliability, despite higher upfront investment in system design.

What skills should AI teams prioritize now?

Teams should focus on system architecture, context engineering, verification, and configuration management rather than solely model tuning or prompt engineering.

Will this change how AI products are built?

Yes, organizations will likely adopt more disciplined, modular approaches, emphasizing system robustness and cost efficiency over chasing the latest model upgrades.

Source: ThorstenMeyerAI.com
