Ship SDKs, not MCPs

If you're running a SaaS, you're probably thinking about how AI agents will interact with your product. The current hype cycle says build an MCP server. If you're a little more tapped in, you've probably moved on to shipping a CLI, like Google's Workspace agent tools. Both are fine. But I think the real play is the SDK.

Context bloat

MCPs stuff all their tool definitions into the agent's context window, so tokens get spent describing your API before the agent starts working. Tool search patterns from Anthropic and Codex help with this, and CLIs get it naturally (list --help only shows the docs for list). But in practice all three approaches work better with a skill on top: an agent instruction document that teaches the agent what tools exist, when to use them, and what patterns to follow. An SDK skill carries example code instead of tool names or CLI flags, but the concept is the same. This part is mostly a wash.

Scripts beat loops

ReAct vs Programmatic tool calling

The real reason to prefer SDKs is programmatic tool calling (Cloudflare calls it code mode).

Standard agent behavior is a ReAct loop: call a tool, read the result, think, call another tool, think again. Every tool call is a full round trip through the model. I built an eval comparing ReAct to programmatic tool calling: 96% vs 66% accuracy on the same model, fewer errors, lower latency.
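To make the round-trip cost concrete, here is a minimal sketch that counts model turns for a paginated search-and-label task. The function names and the workload (250 matches, pages of 100) are illustrative assumptions, not figures from the eval, and the count of two turns for the programmatic case assumes one turn to write the script and one to read its output.

```python
import math


def react_round_trips(num_matches: int, page_size: int) -> int:
    """Model turns in a ReAct loop for search-then-label.

    Assumes one turn per search page, one turn per label call,
    and one final turn to report back to the user.
    """
    search_calls = max(1, math.ceil(num_matches / page_size))
    label_calls = num_matches
    return search_calls + label_calls + 1


def programmatic_round_trips() -> int:
    """Model turns when the agent writes one script.

    Assumes one turn to generate the script and one turn to read
    the script's output and answer.
    """
    return 2


print(react_round_trips(num_matches=250, page_size=100))  # 3 + 250 + 1 = 254
print(programmatic_round_trips())  # 2
```

The ReAct count grows linearly with the number of matches, while the script's count stays flat no matter how many emails it touches.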

With programmatic tool calling, the agent writes a script that handles the entire task in one shot. Consider: find all emails mentioning "Friday" and tag them.

With an MCP or CLI, the agent has to:

  • Search all emails, handling pagination across multiple tool calls
  • Read every result that comes back (this isn't GraphQL; you don't control what's returned)
  • Invoke the label tool once per match

Every one of those steps is a round trip through the model.

The agent doesn't actually need to see the emails for this task. It only sees them because the MCP is the only primitive it has. With a script:

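# `gmail` stands in for an SDK client; the method names are illustrative.
pages = gmail.search(query="Friday", page_size=100)

for page in pages:  # the SDK iterator handles pagination
    for email in page.items:
        # Re-check locally so only emails that really mention Friday get labeled.
        if "friday" in email.subject.lower() or "friday" in email.body.lower():
            gmail.labels.apply(email_id=email.id, label="Friday")

print("done")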

One generation, one execution. And you get things MCPs can't give you: regex filtering, additional libraries, real control flow.
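As a sketch of what "regex filtering" and "real control flow" can look like in practice, here is a self-contained version of the same task. FakeGmail, Email, and their methods are hypothetical in-memory stand-ins invented for this example, not a real Gmail SDK; only the re and dataclasses usage is standard library.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Email:
    id: str
    subject: str
    body: str
    labels: set = field(default_factory=set)


class FakeGmail:
    """Hypothetical in-memory stand-in for an SDK client."""

    def __init__(self, emails, page_size=2):
        self._emails = {e.id: e for e in emails}
        self._page_size = page_size

    def search_pages(self):
        """Yield the mailbox as lists of emails, one page at a time."""
        items = list(self._emails.values())
        for start in range(0, len(items), self._page_size):
            yield items[start:start + self._page_size]

    def apply_label(self, email_id, label):
        self._emails[email_id].labels.add(label)


# Match "Friday" or "Fri" as whole words, case-insensitively,
# without matching words like "fries".
FRIDAY = re.compile(r"\bfri(day)?\b", re.IGNORECASE)


def label_friday_emails(client):
    """Label every email that mentions Friday; return the labeled ids."""
    labeled = []
    for page in client.search_pages():
        for email in page:
            if FRIDAY.search(email.subject) or FRIDAY.search(email.body):
                client.apply_label(email.id, "Friday")
                labeled.append(email.id)
    return labeled


inbox = [
    Email("1", "Lunch on Friday?", "See you then."),
    Email("2", "Invoice", "Due Fri at noon."),
    Email("3", "Menu", "We ordered fries."),
    Email("4", "Standup", "Moved to Monday."),
]
print(label_friday_emails(FakeGmail(inbox)))  # ['1', '2']
```

The filtering runs inside the script, so none of the email bodies ever pass through the model's context.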

An SDK is a library, designed to be called from code. MCPs and CLIs were designed to be called from conversation turns.

Code as documentation

When your skill includes example code, it teaches the agent the SDK and demonstrates the workflow patterns you expect. A big example script covering ten use cases? The agent pulls out parts B and C and writes its own version. With MCPs, your skill has to describe workflows in prose: "first call this tool, then call that tool, then check this field." Code is a better medium for expressing workflows than English.

Scripts can also be persisted. The agent saves a search script, saves a labeling script, composes them for new tasks next week. Each solved problem leaves behind a reusable artifact. MCP interactions are stateless; every session starts from scratch.
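A minimal sketch of that reuse, assuming the agent saves each solved step as a small Python file and loads it again later. The file names, function names, and the dictionary-shaped emails are all hypothetical; a temporary directory stands in for wherever the agent would keep its scripts.

```python
import importlib.util
import tempfile
from pathlib import Path

# Two scripts the agent might have saved in earlier sessions (hypothetical).
SEARCH_SCRIPT = '''
def find(emails, keyword):
    """Return emails whose subject contains the keyword (case-insensitive)."""
    return [e for e in emails if keyword.lower() in e["subject"].lower()]
'''

LABEL_SCRIPT = '''
def tag(emails, label):
    """Add a label to every email in the list; return how many were tagged."""
    for e in emails:
        e.setdefault("labels", []).append(label)
    return len(emails)
'''


def load_script(path: Path):
    """Import a saved script file as a module."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as workdir:
    scripts = Path(workdir)
    (scripts / "search_emails.py").write_text(SEARCH_SCRIPT)
    (scripts / "label_emails.py").write_text(LABEL_SCRIPT)

    # A new task next week: compose the two saved pieces.
    search = load_script(scripts / "search_emails.py")
    label = load_script(scripts / "label_emails.py")

    inbox = [
        {"subject": "Q3 invoice"},
        {"subject": "Lunch"},
        {"subject": "Invoice overdue"},
    ]
    tagged = label.tag(search.find(inbox, "invoice"), "Finance")
    print(tagged)  # 2
```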

The trade-offs

The biggest practical issue is permissions. CLIs and MCPs let you set granular approvals per command or per tool. With scripts, you're approving or declining the whole thing. Anthropic's programmatic tool calling API solves this at the API level by pausing the script at each function call and emitting it as a tool call you can approve or deny. But in something like Claude Code, where the agent writes scripts and executes them, you're still approving the whole script at once.
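The pause-at-each-call idea can be approximated locally by wrapping the SDK client so every method call passes through an approval callback. This is a sketch of the concept only, not Anthropic's API: GatedClient, ApprovalDenied, FakeMail, and the policy function are hypothetical names made up for illustration.

```python
from typing import Any, Callable


class ApprovalDenied(Exception):
    """Raised when the approval callback rejects a call."""


class GatedClient:
    """Wrap a client so every method call is approved before it runs."""

    def __init__(self, client: Any, approve: Callable[[str, dict], bool]):
        self._client = client
        self._approve = approve

    def __getattr__(self, name: str):
        # Only reached for attributes not set on the wrapper itself.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def gated(**kwargs):
            if not self._approve(name, kwargs):
                raise ApprovalDenied(f"{name} denied with {kwargs}")
            return target(**kwargs)

        return gated


class FakeMail:
    """Hypothetical SDK client used only for this example."""

    def __init__(self):
        self.labeled = []
        self.deleted = []

    def apply_label(self, email_id, label):
        self.labeled.append((email_id, label))

    def delete(self, email_id):
        self.deleted.append(email_id)


def policy(name: str, kwargs: dict) -> bool:
    """Example policy: allow everything except deletes."""
    return name != "delete"


mail = FakeMail()
client = GatedClient(mail, policy)

client.apply_label(email_id="1", label="Friday")  # approved
try:
    client.delete(email_id="1")  # rejected by the policy
except ApprovalDenied as exc:
    print(exc)
```

A wrapper like this restores per-call granularity inside a single script, at the cost of having to trust that the script only reaches the SDK through the wrapped client.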

The other issue is interactivity. If I want my agent to open my browser, look at Gmail, and act on what it finds, that's inherently reactive. Writing a script assumes you know the state of the world in advance. Browser automation, exploratory analysis, debugging: these belong in a ReAct loop.

Ship both

The practical recommendation is: ship the SDK with a skill, and also expose the same functionality through a CLI.

The skill tells the agent when to use each. For quick, interactive, low-stakes actions (fetch a Jira ticket, check a status), the CLI works well. The agent runs a command, gets the result, moves on. For anything that involves multiple steps, data processing, or bulk operations, the skill directs the agent to write a script using the SDK.

You needed a good SDK to build the CLI anyway. The SDK is the foundation; the CLI is a surface on top of it. Might as well expose both and let the agent pick the right tool for the job.
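A sketch of that layering, assuming a tiny ticket-tracking SDK. The get_ticket and list_tickets functions, the in-memory data, and the "tickets" command name are hypothetical; argparse and json are standard library. The CLI adds argument parsing and output formatting and nothing else.

```python
import argparse
import json

# --- SDK layer (hypothetical): plain functions meant to be called from code ---
_TICKETS = {
    "PROJ-1": {"key": "PROJ-1", "status": "open"},
    "PROJ-2": {"key": "PROJ-2", "status": "done"},
}


def get_ticket(key: str) -> dict:
    return _TICKETS[key]


def list_tickets(status=None) -> list:
    tickets = list(_TICKETS.values())
    if status is not None:
        tickets = [t for t in tickets if t["status"] == status]
    return tickets


# --- CLI layer: a thin surface that only parses arguments and prints JSON ---
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tickets")
    sub = parser.add_subparsers(dest="command", required=True)
    get = sub.add_parser("get", help="Fetch one ticket by key")
    get.add_argument("key")
    ls = sub.add_parser("list", help="List tickets, optionally by status")
    ls.add_argument("--status")
    return parser


def main(argv) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "get":
        result = get_ticket(args.key)
    else:
        result = list_tickets(status=args.status)
    return json.dumps(result, sort_keys=True)


# Quick interactive use goes through the CLI surface...
print(main(["get", "PROJ-1"]))
# ...while a script can call the SDK functions directly.
print(len(list_tickets(status="done")))
```

Because each subcommand has its own parser, asking for help on one subcommand only shows that subcommand's arguments, which is the context-saving property of CLIs mentioned earlier.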