
Floor vs Ceiling: Different Models for Different Jobs


I talk a lot about the floor versus the ceiling when it comes to LLMs and agents. The ceiling is the maximum capability when you push these models to the edge of what they can do: complex architectures, novel scientific problems, anything that requires real reasoning. The floor is the everyday stuff, the entry-level human tasks that just need to get done reliably.

For customer service, you want floor models. Cheap, fast, stable. For cutting-edge research or gnarly architectural decisions, you want ceiling models. Expensive, slow, but actually smart.

What I've realized lately is that coding agent workflows should be using both. And most of them aren't.

The TDD Sweet Spot

For me, the best approach with current agents has been strong test-driven development. I architect the ticket, I design the tests, and then I let the agent implement the code. Each of these steps has different requirements, and they probably shouldn't all be using the same model.

Architecting the ticket: This is ceiling territory. You want a model that can think hard about the problem space, understand the existing codebase, and put together a coherent plan. Something like Codex that can reason through tradeoffs and edge cases.

Writing tests: Also ceiling territory. Tests define the acceptance criteria. They're the contract. If the tests are wrong or incomplete, everything downstream is garbage. You want a smart model here too.

Implementing the code: This is where it flips. Once you have a solid plan and good tests, implementation becomes a floor task. You're not asking the model to invent anything novel; you're asking it to write code that passes the tests. Junior to mid-level execution. The requirements are simple: don't make dumb mistakes, and don't add slop to the codebase.

At this point, you could use Haiku. You could use Composer-1. You could use whatever is fastest and cheapest. The hard thinking already happened. Now you just need reliable execution.
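The routing itself is almost trivially simple, which is part of the point. A minimal sketch in Python, with the model identifiers as hypothetical stand-ins (swap in whatever ceiling and floor models your provider offers):

```python
# Hypothetical model identifiers; the actual names depend on your provider.
CEILING_MODEL = "codex"  # smart, slow, expensive
FLOOR_MODEL = "haiku"    # fast, cheap, reliable

# Map each workflow phase to a model tier.
PHASE_MODELS = {
    "plan": CEILING_MODEL,       # architect the ticket
    "tests": CEILING_MODEL,      # write the acceptance tests
    "implement": FLOOR_MODEL,    # execute against the tests
}

def model_for(phase: str) -> str:
    """Pick a model based on which phase of the workflow we're in."""
    return PHASE_MODELS[phase]
```

The table is the whole mechanism: the expensive thinking is front-loaded into planning and tests, so the implement phase can be dispatched to whatever is cheapest that day without touching the rest of the workflow.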

Why This Matters

![A neoclassical oil painting reimagined for a far-future setting: at the top, a singular wise figure wearing a laurel crown made of glowing circuitry pours luminous data from a translucent golden vessel into an ornate fountain that blends marble with chrome and holographic elements. Below, the fountain overflows with light and feeds a procession of identical android workers in classical tunics marching outward, each carrying futuristic tools. Renaissance composition with dramatic lighting, rich burgundy and gold tones mixed with cyan technological accents, columns of polished metal and projected light in background.](../../img/floor-ceiling-funnel.png){ width="450" }

If you can get this workflow automated, you can actually churn through tickets. Get the ticket, plan it out sharply with a smart model, agree on acceptance criteria with well-written tests, then hand it off to a fast model that just executes.

The problem is that most coding agents treat everything the same. Planning mode exists in Cursor and Claude Code, but it's kind of a second-class citizen. A nice-to-have. In reality, if you're working with agents, spec-driven and test-driven development isn't optional. It's the only development you should be doing.

What I'd Build

If I could design a coding agent from scratch, here's what I'd do:

1. Make planning interactive and include testing.

The planning phase needs to be highly interactive. You're not just writing a markdown plan, you're also writing the tests. These are tightly coupled. The plan describes what you're building and why. The tests describe what success looks like. Both should be editable. Both should require human sign-off before execution starts.

So the agent would write tests in a test file and the plan in a plan file, and you'd iterate on both until you're happy. Only then does execution begin.
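That iteration loop is the heart of the design, and it's worth sketching. A minimal version, where `draft` is a hypothetical stand-in for the planning agent and `review` stands in for the human (both names are mine, not any real agent API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Proposal:
    plan_md: str    # the markdown plan file
    tests_src: str  # the test file contents

def iterate_until_signed_off(
    draft: Callable[[str], Proposal],            # planning agent: feedback -> revised proposal
    review: Callable[[Proposal], Optional[str]], # human: feedback string, or None to approve
) -> Proposal:
    """Iterate on the plan and tests together until the human signs off.

    Execution never starts until this returns: the sign-off is the gate.
    """
    feedback = ""
    while True:
        proposal = draft(feedback)
        feedback = review(proposal)
        if feedback is None:
            return proposal
```

The key property is that the plan and the tests travel as one object, so you can't approve one without the other.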

2. Separate prompts for planning and execution.

The planning agent and the execution agent need different prompts. The planning agent's job is to understand the codebase and design a good solution. It needs context about architecture, patterns, and constraints.

The execution agent's job is different. These models are already heavily tuned to write code. You don't need to tell them to write code. What you need is guidance on how to write code without adding slop. Don't introduce tech debt. Don't add unnecessary abstractions. Don't break existing patterns. Keep it clean.

That's a fundamentally different prompt than "understand this complex system and figure out what to build."
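To make the contrast concrete, here's one way the two prompts might read. The wording is purely illustrative, not taken from any real agent product:

```python
# Hypothetical system prompts illustrating the planning/execution split.
PLANNING_PROMPT = (
    "You are the planning agent. Understand the codebase: its architecture, "
    "patterns, and constraints. Produce a plan and a set of acceptance tests. "
    "Reason through tradeoffs and edge cases before committing to a design."
)

EXECUTION_PROMPT = (
    "You are the execution agent. A plan and tests already exist; your only "
    "job is to make the tests pass. Do not introduce tech debt, unnecessary "
    "abstractions, or new patterns. Keep the diff minimal and clean."
)

def system_prompt(phase: str) -> str:
    """Planning and tests share the exploratory prompt; implementation gets the guardrails."""
    return PLANNING_PROMPT if phase in ("plan", "tests") else EXECUTION_PROMPT
```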

3. Different models for each phase.

Use Codex or Opus or whatever's smartest for planning and tests. Use Sonnet or Haiku or whatever's fastest for execution. Match the model to the task.

The Planning Gap

Right now, the planning phase in most agents is underdeveloped. It's something you can turn on, but it's not the default workflow. It's not deeply integrated with testing. It's not designed for iteration.

But this is where the leverage is. If you nail the plan and the tests, execution becomes almost trivial. If you skip planning or phone it in, you're asking a floor model to do ceiling work, and you'll get floor results.

The whole point of TDD is that once you've agreed on the tests and acceptance criteria, the implementation is just... implementation. "However you solve this is up to you, I don't care. Just make it pass." That's a fundamentally different kind of task than "figure out what to build."
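That "just make it pass" handoff can be sketched as a retry loop: the floor model gets the test failures, tries again, and the loop only exits on green. The callables here are hypothetical stand-ins for the coding model and the test runner:

```python
from typing import Callable

def execute_until_green(
    implement: Callable[[str], str],  # floor model: failure report -> candidate code
    run_tests: Callable[[str], str],  # test runner: code -> failure report ("" when green)
    max_attempts: int = 5,
) -> str:
    """Loop the cheap model against the agreed-upon tests until they pass."""
    report = ""
    for _ in range(max_attempts):
        code = implement(report)
        report = run_tests(code)
        if not report:  # empty report means all tests passed
            return code
    # The floor model couldn't close it out; this is where a real system
    # would escalate to a ceiling model or a human rather than keep burning tokens.
    raise RuntimeError("tests still failing after max attempts")
```

Note that the loop has no opinion about how the code is written; the tests are the only judge, which is exactly the contract TDD establishes.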

Most agent workflows don't acknowledge this distinction. They should.

