Chaining Agents Needs a Spine

A useful chain begins with a practical question: what should the next agent know that the previous one did not settle? If the answer is nothing, you do not need another agent. You need a better first pass. A planner is useful when it can turn a fuzzy request into a specific target, name the file or artifact that must change, and define what success looks like. A builder is useful when it can act against that narrowed target. A verifier is useful when it can compare the result against the original promise and decide whether the work is actually done. If those conditions are absent, chaining agents is usually worse than giving one competent agent a sharper brief.

The easiest way to see this is in ordinary coding work. Suppose you want an agent chain to fix a flaky test. The first agent should not write a motivational essay about reliability. It should identify which test is failing, what dependency looks unstable, and what evidence would prove the fix worked. The second agent should make the change and run the test. The third agent should check whether the flake is actually gone, whether the patch created a new failure, and whether the result satisfies the original goal. That is a chain with compression. Each handoff reduces the size of the unknown.

Now compare that with the bad version. The first agent says the flake may involve timing, state leakage, or environment differences. The second proposes three possible fixes but does not commit to one. The third says more testing is needed. Nothing here is false, but nothing is finished either. You have not built a chain. You have built a permission structure for indecision.

That is the first practical rule: never hand off a larger problem than the one you received. If an agent produces more options, more ambiguity, or more unanswered questions than it inherited, the chain is degrading the task. The second rule is just as important: every chain needs a single completion condition that survives all handoffs. If the planner thinks success means a clean patch, the builder thinks success means a passing test, and the verifier thinks success means a plausible explanation, then the chain has no spine. It has opinions.
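
Both rules can be checked mechanically at each handoff. What follows is a minimal sketch, not a prescribed implementation: the Handoff record, the check_handoff function, and the example strings are all hypothetical names invented for illustration, and counting open questions is only a crude stand-in for "size of the problem".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What one stage passes to the next (hypothetical structure)."""
    stage: str
    completion_condition: str        # should be identical at every stage
    open_questions: tuple[str, ...]  # what this stage left unsettled


def check_handoff(received: Handoff, produced: Handoff) -> list[str]:
    """Return rule violations for one handoff; an empty list means it is acceptable."""
    violations = []
    # Rule 1: never hand off a larger problem than the one you received.
    # Assumption: the number of open questions approximates problem size.
    if len(produced.open_questions) > len(received.open_questions):
        violations.append("handoff grew the problem")
    # Rule 2: one completion condition must survive every handoff.
    if produced.completion_condition != received.completion_condition:
        violations.append("completion condition changed")
    return violations


request = Handoff(
    stage="request",
    completion_condition="the flaky test passes on repeated runs",
    open_questions=("which test?", "which dependency?", "what evidence?"),
)
good = Handoff("planner", request.completion_condition, ("which dependency?",))
bad = Handoff(
    "planner",
    "the patch looks clean",
    ("timing?", "state leakage?", "environment?", "which fix?"),
)

print(check_handoff(request, good))  # []
print(check_handoff(request, bad))
```

The good handoff settles two of the three questions and keeps the promise intact. The bad one returns both violations: it inherited three unknowns, passed on four, and quietly swapped the definition of done.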

That is where the Ralph loop earns its place. Most real work does not yield on the first pass: the first pass exposes a hidden dependency, the second clears a false assumption, and only later does the artifact finally match the promise. The loop keeps the promise fixed and sends the work back through the chain until reality agrees. This is what turns agent chaining into something useful in practice. Without the loop, the chain can stop anywhere. With the loop, every stage remains answerable to the same finish line.

This gives you a clear way to use chaining without getting lost in it. Start with one agent unless the task naturally splits into distinct forms of work. Add a planner only when scoping is the hard part. Add a verifier only when false completion is a real risk. Give each stage a deliverable that can be inspected in one sentence. Make the verifier capable of saying no. Then wrap the whole thing in a loop that sends failed work back for revision instead of treating critique as the end of the process.

In practice, the best small chain is usually three stages. First, sharpen the task until the target is explicit. Second, change the artifact. Third, verify against a fixed completion condition. If verification fails, the work goes back around with the failure attached, not as vague criticism but as a concrete defect to remove. That pattern is simple enough to run daily, and strict enough to prevent the common failure mode where every agent sounds smart while the work itself stays unfinished.
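
The three stages and the trip back around can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: Target, Defect, run_chain, and the stub agents are hypothetical names, the stubs stand in for real agent calls, and the max_rounds cap is an added safeguard that the pattern itself does not require.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Target:
    artifact: str              # the file or artifact that must change
    completion_condition: str  # fixed once, never rewritten by later stages


@dataclass(frozen=True)
class Defect:
    description: str  # a concrete failure to remove, not general criticism


Planner = Callable[[str], Target]
Builder = Callable[[Target, Optional[Defect]], str]
Verifier = Callable[[Target, str], Optional[Defect]]


def run_chain(request: str, plan: Planner, build: Builder, verify: Verifier,
              max_rounds: int = 5) -> tuple[bool, str, int]:
    """Plan once, then build and verify until the verifier stops saying no."""
    target = plan(request)               # stage 1: sharpen the task
    defect: Optional[Defect] = None
    result = ""
    # Assumption: cap the rounds so a chain that never converges still halts.
    for round_number in range(1, max_rounds + 1):
        result = build(target, defect)   # stage 2: change the artifact
        defect = verify(target, result)  # stage 3: check the fixed condition
        if defect is None:
            return True, result, round_number
    return False, result, max_rounds


# Stub agents for illustration only; real stages would call a model or run tests.
def plan(request: str) -> Target:
    return Target("tests/test_checkout.py", "test passes 20 consecutive runs")


def build(target: Target, defect: Optional[Defect]) -> str:
    return "patch-v1" if defect is None else "patch-v2"


def verify(target: Target, result: str) -> Optional[Defect]:
    if result == "patch-v1":
        return Defect("fails on run 7: shared temp directory is not cleaned")
    return None


done, result, rounds = run_chain("fix the flaky checkout test", plan, build, verify)
print(done, result, rounds)  # True patch-v2 2
```

The planner runs once and the target never changes afterward, which is what keeps the completion condition fixed. The builder receives the previous defect as an input, so a failed round narrows the next attempt instead of restarting it.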

The mistake is to think chaining agents is about adding intelligence through sheer quantity. It is really about structuring reduction. Each stage should leave less room for error than the one before it, and the loop should keep applying pressure until the artifact meets the standard you named at the beginning. That is actionable, because you can test it immediately on real work. Pick one task, define one completion condition, give each agent a narrower job than the last, and see whether the final verifier can kill the task or close it.

That is the whole lesson. Chain agents only when the handoffs compress the problem, and use the Ralph loop to keep the chain honest until the job is actually done.
