I started this line of thought in an earlier post.
Overview
High-accuracy agentic systems are not built by exposing APIs as tools. They are engineered by starting from the job to be done (JTBD), designing a minimal set of task-shaped tools, and using verification to drive iterative improvement.
This process turns probabilistic execution into a system that measurably improves over time.
1) Define the Job to Be Done
A JTBD must specify not just what to do, but what “done” means.
- The job should imply clear success criteria.
- Completion must be verifiable, not assumed.
If “done” cannot be checked, the job is underspecified.
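One way to make this concrete is to pair the job statement with an explicit "done" check from the start. A minimal sketch, where the job description and the success predicate are assumptions chosen for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Job:
    """A job to be done, paired with an explicit check for 'done'."""
    description: str
    is_done: Callable[[dict], bool]  # verifiable success criterion

# Hypothetical example: a migration job counts as done only when the
# new column exists AND no rows were lost.
migration_job = Job(
    description="Add an 'email' column to users without losing rows",
    is_done=lambda state: (
        "email" in state["columns"]
        and state["rows_after"] == state["rows_before"]
    ),
)
```

If you cannot write the `is_done` predicate, the job is underspecified, and that is worth discovering before any tools are built.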
2) Decompose the Job into Verifiable Sub-Jobs
Break the JTBD into a small number of meaningful sub-jobs, each with:
- a purpose
- an expected outcome
- a way to verify success
This decomposition defines the logical shape of the system.
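The three required properties of a sub-job map directly onto a data structure. A sketch, with a hypothetical schema-migration decomposition as the example:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubJob:
    purpose: str                     # why this step exists
    expected_outcome: str            # what should be true afterwards
    verify: Callable[[dict], bool]   # how to check success

# Hypothetical decomposition of a schema-migration job.
sub_jobs = [
    SubJob(
        purpose="Stage the schema change in an isolated environment",
        expected_outcome="New column exists in the staging schema",
        verify=lambda s: "email" in s["staging_columns"],
    ),
    SubJob(
        purpose="Confirm no data was lost during staging",
        expected_outcome="Row counts match before and after",
        verify=lambda s: s["rows_before"] == s["rows_after"],
    ),
]

def all_verified(state: dict) -> bool:
    """The whole job is done only when every sub-job verifies."""
    return all(sj.verify(state) for sj in sub_jobs)
```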
3) Design Tools Around Sub-Jobs, Not APIs
Tools should correspond to units of work, not raw endpoints.
Good tools:
- bundle steps that always occur together
- encode best practices and constraints
- reduce the surface area for error
The goal is a minimal tool set that fully completes the job.
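The difference between an endpoint and a task-shaped tool is bundling. A sketch in which every function is a hypothetical stand-in for a real API call:

```python
# Raw endpoints, which an agent would otherwise have to sequence itself.
def create_branch(db: dict) -> dict:
    return {"parent": db, "changes": []}

def apply_sql(branch: dict, sql: str) -> None:
    branch["changes"].append(sql)

def describe_schema(branch: dict) -> list:
    return list(branch["changes"])

# A task-shaped tool bundles the steps that always occur together
# into one unit of work, so the agent cannot skip or reorder them.
def prepare_migration(db: dict, sql: str) -> dict:
    branch = create_branch(db)
    apply_sql(branch, sql)
    return {"branch": branch, "applied": describe_schema(branch)}
```

The agent sees one tool instead of three endpoints, which removes an entire class of sequencing errors.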
4) Engineer Planning and Orchestration
Use a planning prompt that:
- understands the job
- selects an appropriate strategy
- sequences tools deliberately
- respects constraints and safety rules
Planning quality often matters more than model capability.
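Those four requirements can be assembled mechanically into a planning prompt. A sketch; the wording and tool descriptions are illustrative, not a fixed template:

```python
def build_planning_prompt(job: str, tools: dict, constraints: list) -> str:
    """Assemble a planning prompt from the job, tools, and constraints."""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Job to be done: {job}\n\n"
        f"Available tools:\n{tool_lines}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        "Select a strategy, then sequence the tools deliberately. "
        "Do not call a destructive tool before its checks have passed."
    )

prompt = build_planning_prompt(
    job="Perform a safe, verifiable schema migration",
    tools={
        "prepare_database_migration": "Stage the change in a temp branch",
        "complete_database_migration": "Commit once checks pass",
    },
    constraints=["Never modify the main branch directly"],
)
```

Because the prompt is generated from structured inputs, changing a tool description or constraint automatically updates the plan the model sees.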
5) Execute with Guardrails
Tool execution should be:
- constrained
- reviewable when necessary
- safe by default
Execution alone is not success; it is only an attempt.
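A guardrail can be as simple as a gate that destructive tools cannot pass until checks succeed, plus an audit log for review. A minimal sketch, with hypothetical tool names:

```python
class GuardedExecutor:
    """Runs tools; destructive ones are blocked until checks pass."""

    def __init__(self):
        self.checks_passed = False
        self.audit_log = []  # every attempt is recorded for review

    def run(self, tool_name: str, fn, *, destructive: bool = False):
        if destructive and not self.checks_passed:
            self.audit_log.append((tool_name, "blocked"))
            raise PermissionError(f"{tool_name} blocked: checks not passed")
        self.audit_log.append((tool_name, "executed"))
        return fn()
```

Safe-by-default here means the destructive path fails closed: forgetting to verify produces a blocked call, not a damaged database.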
6) Verify Outcomes Against the Job
Verification is mandatory and explicit.
For each sub-job, verify:
- correctness
- completeness
- compatibility with downstream use
Verification turns execution into accountable work and produces signals the system can learn from.
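For verification to produce learning signals, it should return a structured record per sub-job rather than a bare boolean. A sketch; the three check labels mirror the list above, and the lambdas are placeholders for real checks:

```python
def verify_sub_job(name: str, checks: dict) -> dict:
    """Run named checks for a sub-job and return a structured report."""
    results = {label: bool(check()) for label, check in checks.items()}
    return {"sub_job": name, "passed": all(results.values()), "checks": results}

# Hypothetical report: the work is correct and complete, but it broke
# something downstream, and the report says exactly which check failed.
report = verify_sub_job(
    "stage_migration",
    {
        "correctness": lambda: True,     # e.g. schema matches the spec
        "completeness": lambda: True,    # e.g. all tables migrated
        "downstream_ok": lambda: False,  # e.g. a dependent view broke
    },
)
```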
7) Use Verification Signals to Iterate
Verification results enable systematic improvement across:
- planning prompts
- tool definitions
- tool instructions
- orchestration order
- tool count (merge or remove tools)
Iteration is driven by evidence, not intuition.
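Evidence-driven iteration starts with aggregating those per-run reports into failure counts per component. A sketch, assuming reports shaped like the structured records above:

```python
from collections import Counter

def failure_hotspots(reports: list) -> Counter:
    """Count failed checks per (sub_job, check) across many runs."""
    counts = Counter()
    for report in reports:
        for check, passed in report["checks"].items():
            if not passed:
                counts[(report["sub_job"], check)] += 1
    return counts
```

The most common failure tells you what to change first: a prompt, a tool definition, an ordering, or the tool set itself.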
8) Fine-Tune Only After the System Is Stable
Fine-tuning is appropriate when:
- the structure is sound
- errors are repeatable
- verification data is reliable
Fine-tuning reduces residual error; it does not fix poor system design.
Core Principle
Execution completes the job once.
Verification enables the system to do it better next time.
That feedback loop — from JTBD to tools to execution to verification and back — is what differentiates a robust agentic system from a brittle one.
Applied directly to Neon
1) Start with the Job to Be Done (JTBD)
Always begin by stating the outcome in plain language, focusing on what success must prove:
“Perform a safe, verifiable schema migration.”
This shapes all subsequent design decisions.
2) Collapse Low-Level APIs Into Task-Oriented Tools
Don’t give an LLM a catalog of generic endpoints. Instead, expose a small set of clearly named, job-centric tools.
Neon’s concrete example for schema migration uses just four tools:
| Tool | Purpose |
|---|---|
| `prepare_database_migration` | Stage schema change safely in a temp branch |
| `run_sql` / `run_sql_transaction` | Verify work on temp branch |
| `describe_table_schema` / `get_database_tables` | Inspect structure for correctness |
| `complete_database_migration` | Commit and clean up when checks pass |
This small surface (instead of 100+ generic API calls) dramatically increases accuracy by reducing choice overload.
Design principles for tools:
- Verb-first, goal-focused names (e.g., `prepare_…`, not `POST /v1/db/…`).
- Encoded guardrails (temporary branches, sandbox staging).
- Clear, outcome-oriented descriptions that implicitly teach workflow order.
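The naming principle is checkable. A small sketch: the tool names come from the Neon example above, while the allowed-verb list is an assumption for illustration:

```python
# Verbs we expect task-shaped tool names to start with (an assumption).
ACTION_VERBS = ("prepare", "run", "describe", "get", "complete")

# Registry of the Neon-style migration tools; descriptions paraphrase
# the table above and implicitly teach the workflow order.
TOOLS = {
    "prepare_database_migration": "Stage schema change safely in a temp branch",
    "run_sql": "Verify work on the temp branch before completing",
    "describe_table_schema": "Inspect structure for correctness",
    "complete_database_migration": "Commit and clean up when checks pass",
}

def is_verb_first(name: str) -> bool:
    """True if the tool name leads with an action verb."""
    return name.split("_", 1)[0] in ACTION_VERBS
```

An endpoint-shaped name like `v1_db_migrate` fails the check, which is exactly the point.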
3) Build Planning/Orchestration That Reflects the Job
A good planning prompt should map high-level intent to tool sequences with minimal ambiguity.
By keeping the toolkit small and intentional:
- the agent’s decision space shrinks,
- the correct sequence emerges naturally,
- “tool roulette” disappears.
This is the essence of Your API is not an MCP — building an MCP server that shapes the assistant’s interaction to the job, not the API surface.
4) Execute with Explicit Safety and Stage Gates
Design tools so that harmful actions cannot occur before checks are complete. For Neon’s migration flow, that means:
- stage changes in a throw-away environment,
- run verification queries,
- only complete the full migration after checks pass.
These built-in safety guardrails reduce silent failures and avoid catastrophic states.
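The stage gates can be modeled as a small state machine in which completion is unreachable until staging and verification have both happened. The method names mirror Neon's tools; the implementation is an illustrative sketch, not Neon's code:

```python
class MigrationFlow:
    """Stage-gated migration: initial -> staged -> verified -> committed."""

    def __init__(self):
        self.stage = "initial"

    def prepare_database_migration(self):
        self.stage = "staged"  # work happens in a throw-away branch

    def run_verification(self, checks_pass: bool):
        if self.stage != "staged":
            raise RuntimeError("nothing staged to verify")
        # Failed checks keep us in 'staged' so the fix can be re-verified.
        self.stage = "verified" if checks_pass else "staged"

    def complete_database_migration(self):
        if self.stage != "verified":
            raise RuntimeError("cannot complete: checks have not passed")
        self.stage = "committed"
```

Because the commit step inspects the state, no planning mistake can reach the destructive action early; the worst case is an error, not a bad migration.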
5) Verification Is the Essential Feedback Loop
Verification is why iterative improvement is possible:
- it yields concrete error signals,
- it converts execution outcomes into evidence,
- it gives you something to optimize against.
For schema migration, verification looks like:
- running validation SQL (`run_sql`),
- inspecting schema (`describe_table_schema`),
- checking counts/constraints before commit.
This makes success observable and failures inspectable, which in turn means:
- planning prompt tweaks can be measured,
- tool definitions can be refined,
- orchestration orders can be compared,
- training data for fine-tuning can be generated.
Without verification, you have no learning signal.
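Those checks are ordinary queries. A runnable sketch using in-memory SQLite as a stand-in for the Postgres temp branch that Neon's `run_sql` tool would query; the table and migration are invented for illustration:

```python
import sqlite3

# Stand-in for the temporary branch: a throw-away in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",)])
rows_before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# The staged migration: add a column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Verification queries: the new column exists, and no rows were lost.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
rows_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Each query result is a concrete, comparable signal: if `rows_after != rows_before`, the run failed, and you know exactly which check caught it.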
6) Iterative Refinement Driven by Verification
Explicit verification generates the metrics that make iteration possible.
Use those signals to:
- merge tools used together frequently,
- remove rarely needed tools,
- clarify documentation so the agent learns faster,
- refine planning prompts to avoid predictable failure points.
Iterations shrink the decision space and improve accuracy by progressively removing ambiguity.
7) Fine-Tune Only After Structure Stabilizes
After a reliable baseline (small tool set + consistent verification) exists, fine-tuning models makes sense:
- specialize models for tool selection accuracy,
- reduce hallucination around tool usage,
- handle domain-specific language patterns,
- shorten planning loops and improve decisiveness.
But fine-tuning alone can’t fix poor tooling or underspecified JTBDs — the verification-driven structure does.
Core Insight
Execution completes the job once.
Verification connects one run to the next.
A small, task-focused toolset makes those links visible.
This is exactly the pattern Neon ships in their MCP workflow, and why the article emphasized:
- small, well-named tools,
- encoded guardrails,
- verifiable transitions,
- and job-level sequencing taught via documentation and labels.
Quick Practical Checklist
- State the JTBD with verifiable success criteria.
- Define ≤10 task-oriented tools that cover the job end-to-end.
- Label tools with verb-first, outcome-focused names.
- Build planning prompts that naturally select and sequence tools.
- Execute with built-in safeties (sandboxes, checks).
- Verify outcomes explicitly and early.
- Use verification signals to refine tools, prompts, and orchestration.
- Fine-tune only once the core system is stable.