The Year of the AI PC

2025 was supposed to be "the year of the agents". We did see real agent use cases pushed to production at enterprises and startups, and actually being useful. These are usually very simple tool-loop agents that devs plug into APIs, allowing LLMs to use tools to fetch info (RAG) or to take actions. A ton of agents popped up in 2025, but not a ton of great ones. You would think this was due to model capabilities, but what Claude Code taught us is that the harness, or the architecture of the agent, is just as important as the model, if not more.
If you haven't been using Claude Code, I highly recommend you give it a try, even if you're not a programmer. It's magical.
It's so magical, in fact, that I think 2026 will largely be defined by the innovations coming from the Claude Code team. I think George Hotz sums it up perfectly in his blog about long-awaited models that are able to use computers:
"Turns out the idea wasn't a desktop emulator with a keyboard and mouse, it was just a command line."
It's so obvious now that agents would master the terminal before mastering the web browser, Windows, or macOS. What that means practically is that we have agents that can do just about anything that can be done on a computer through a terminal, which is a lot more than you might think.
The AI PC
Under the hood, Claude Code still uses the traditional tool-loop, but the tools are all developer tools. It can not only read and write files but also run terminal commands. It turns out that something like 90% of computer tasks can be achieved with those two actions.
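To make the shape of that loop concrete, here's a minimal sketch. This is not Claude Code's actual implementation; the tool names and the `call_llm` helper are hypothetical stand-ins for whatever model API you use:

```python
# Minimal sketch of a developer-tool loop. Hypothetical helpers, not
# Claude Code's actual implementation.
import subprocess
from pathlib import Path

def run_tool(name: str, args: dict) -> str:
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return "ok"
    if name == "bash":
        result = subprocess.run(args["command"], shell=True,
                                capture_output=True, text=True)
        return result.stdout + result.stderr
    raise ValueError(f"unknown tool: {name}")

def agent_loop(task: str, call_llm) -> str:
    # call_llm is a stand-in for your model API of choice: it takes the
    # message history and returns either a tool call or a final answer.
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_llm(messages)
        if reply["type"] == "tool_call":
            messages.append({"role": "assistant", "content": str(reply)})
            output = run_tool(reply["name"], reply["args"])
            messages.append({"role": "tool", "content": output})
        else:
            return reply["content"]  # model decided it's done
```

Read, write, bash, repeat. Everything else in this post is built on top of that loop.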
Idk if non-developers can really grasp how powerful this is. MCPs saw an explosion in 2025, but they are little toys compared to the capabilities of an agent with access to a codebase. It's the difference between letting your agent access your Jira, versus your agent building you a custom app that combines your Jira, calendar, email, and todo list in one. For developers, it's like the difference between only being able to hit APIs via curl versus having a persistent codebase where you can install libs, write reusable scripts, store env vars, and spin up multiple terminals.
This is when things get fun.
Imagine a personal homepage, like Notion, but fully customizable and controllable by an agent. Under the hood, it would just be Claude Code on a server. Some examples:
- Todo list: The agent manages a file in the codebase and adds a clickable link on your personal homepage. You can interact with the list, add todos and check things off, but so can your agent.
- Calendar widget: Tell the agent you want one. It installs the calendar libraries, finds a nice React component, and adds it to your homepage.
- Email/Sentry/Analytics analysis: It installs the SDKs and writes custom scripts. Next week, when you ask for the same analysis, it reuses those scripts. Ask for something similar, and it already has the plumbing from last time.
Your agent will be able to maintain "state" by organizing a file system around your needs.
This will start to look vaguely like continual learning, the concept of AI that learns from experience. With Agent Skills, your agent can write down everything it "learned" during a session and remember it for the next interaction. Picture this applied to tasks like managing your emails, scheduling meetings, or any other task you wish your agent would "learn" to do better:
```mermaid
flowchart LR
    A[New Task] --> B{Skill Exists?}
    B -->|Yes| C[Use Existing Setup]
    B -->|No| D[Find Library]
    D --> E[Install]
    E --> F[Read Docs]
    F --> G[Write Scripts]
    G <--> H[Review Results]
    H --> I[Document Skill]
    I -.-> B
```
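The "Document Skill" step is where the learning sticks. An Agent Skill is just a folder with a SKILL.md file: YAML frontmatter telling the agent when to load it, followed by instructions. The skill below is invented for illustration (the scripts and paths are hypothetical), but it shows the shape:

```markdown
---
name: weekly-email-triage
description: Triage and summarize the week's email into priorities. Use when asked to review or clean up the inbox.
---

# Weekly email triage

1. Run `scripts/fetch_inbox.py` to pull unread messages.
2. Group by sender domain; flag anything from billing addresses as urgent.
3. Append the summary to `notes/email-log.md`.

## Learned notes
- Newsletters from vendor X are safe to archive unread.
```

The next time you say "triage my inbox", the agent finds this skill, reuses the scripts, and keeps appending to its learned notes instead of starting from scratch.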
I also bet that this will have a lot more utility in enterprise than personal use. Agents should be onboarded into the same environment as the rest of the team. The personal homepage would instead be an enterprise dashboard that is shared with the team.
Some people are already hacking this kind of thing together. I think Zo took this approach, Poke has early signs of the AI PC, and McKay Wrigley talks about using Claude Code in Obsidian.
I honestly wouldn't build this unless your name starts with a "D" and rhymes with "Mario Amodei".
If you were to build this, I would suggest using the Claude Agent SDK, which powers Claude Code, and then setting up the infra around it. I would also hurry up because Anthropic is clearly working towards something like this already.
Agent SDKs
Most agents we use today are simple tool-loop agents. We load all of the tools (and/or MCPs) into the context window and expect the agent to use the right ones in the right way. This turned out to be a terrible strategy for a ton of reasons, mainly centered around agent performance degrading when there's too much irrelevant junk in the context. The answer to that is a concept that's still struggling to find a name, but we can call it "harness engineering" for now. I'd sum it up as optimizing context as much as possible before invoking the LLM and asking as little of the LLM as possible.
So the goal of harness engineering is to make sure everything relevant, and nothing irrelevant, makes it into the agent's context. Think unused tools/MCPs, unused instructions, unused search results, etc. Building a competitive agent in 2026 will require these measures. Unless you're building an extremely simple agent with just a few simple tools, you will need to do some form of harness engineering. Some strong primitives for this are:
- Agent Skills - By Anthropic
- Tool Search Tool - By Anthropic
- Programmatic Tool Calling (My Favorite; see the sketch after this list) - By Anthropic
- Sub-Agents - Not invented by Anthropic, but perfected by Anthropic*
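To make the Programmatic Tool Calling idea concrete: instead of the model emitting one tool-call JSON at a time and paying context for every intermediate result, it writes code that calls tools as ordinary functions inside a sandbox, and only the final printed output re-enters the context window. A rough illustration of the pattern; the `get_jira_issues` and `get_calendar` wrappers are hypothetical stubs, not Anthropic's actual API:

```python
# These stubs stand in for harness-provided tool wrappers (hypothetical).
def get_jira_issues(assignee: str, status: str) -> list[dict]:
    return [{"key": "ENG-42", "title": "Fix login bug", "due_date": "2026-01-05"}]

def get_calendar(days: int) -> list[dict]:
    return [{"date": "2026-01-05", "title": "Planning"}]

# --- the kind of script the model emits ---
issues = get_jira_issues(assignee="me", status="open")
events = get_calendar(days=7)

# Cross-reference in code instead of round-tripping through the LLM;
# the raw issue and event lists never touch the context window.
busy_days = {e["date"] for e in events}
at_risk = [i for i in issues if i["due_date"] in busy_days]

print(f"{len(at_risk)} open issues are due on meeting-heavy days:")
for issue in at_risk:
    print(f"- {issue['key']}: {issue['title']} (due {issue['due_date']})")
```

Only those few printed lines go back to the model, no matter how big the underlying Jira and calendar payloads were.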
I'm not gonna say that Anthropic has a monopoly on harness engineering. But if I had to build the best harness I could, combining 3 years as a Senior AI Engineer (whatever that is) with the latest research and primitives, I would just end up rebuilding the Claude Agent SDK.
It will make less and less sense in the future to build a custom harness versus using an Agent SDK, analogous to rolling your own frontend framework instead of using React. Afaik, the Agent SDK only supports Anthropic models, so there is an opportunity for the LangChains of the world to build some kind of universal harness, but until then, Anthropic has the market cornered.
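For a sense of how little code this takes, here's a minimal sketch using the Python Claude Agent SDK. The package and option names (`claude_agent_sdk`, `query`, `ClaudeAgentOptions`) match the SDK as I know it, but treat this as a sketch and check the current docs, since the API is moving fast:

```python
# Minimal Claude Agent SDK sketch (pip install claude-agent-sdk).
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Write", "Bash"],  # the developer-tool trio
        cwd="/path/to/your/workspace",            # hypothetical workspace
    )
    # query() streams messages as the agent reads, writes, and runs commands.
    async for message in query(
        prompt="Collect the TODO comments in this repo into TODO.md",
        options=options,
    ):
        print(message)

asyncio.run(main())
```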
GenUI, or the concept of dynamic GUIs that update based on the agent's steps and outputs, will be huge in 2026. A good harness outputs a ton of useful data throughout the process: which tools have been used, their results, reasoning, to-do lists, planning docs, sub-agent activity, questions for the user, and a lot more. There's so much low-hanging fruit around the UI that will house these agents. The "Jarvis" UI is coming!
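As a toy illustration of the GenUI plumbing: the event names below are invented, but any good harness emits structured messages like these, and the UI work is mostly mapping them to widgets:

```python
# Hypothetical sketch: routing structured harness events to UI updates.
# The event shapes are invented; prints stand in for real widgets.
def render_event(event: dict) -> None:
    kind = event.get("type")
    if kind == "tool_use":
        print(f"[spinner] running {event['name']}...")
    elif kind == "todo_update":
        for item in event["items"]:
            print(f"[checklist] {'x' if item['done'] else ' '} {item['text']}")
    elif kind == "question":
        print(f"[modal] agent asks: {event['text']}")
    else:
        print(f"[log] {event}")

if __name__ == "__main__":
    render_event({"type": "tool_use", "name": "Bash"})
    render_event({"type": "todo_update",
                  "items": [{"done": True, "text": "Run tests"}]})
```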
More Tests!
Vibe-coding's last bottleneck is good testing. The biggest leap in coding agent utility in 2025 was giving them feedback loops. The simplest version of this is to ask the agent to not just make a fix, but to test that the fix was successful.

It's obvious why this is a good thing, but the effects it has on development are less obvious:
- Testing allows the agent to try multiple approaches to solve the problem, stopping only when it has verifiably completed the task. This is where the rise in long-running tasks came from.
- Tests abstract the solution away from the code. This is the most important one imo. If I can decide on exact expected behaviors, and a way to verify that those behaviors are happening, then I don't really care what the underlying code looks like. It could be "slop" in the worst case, but it doesn't matter because it's working well! High-level languages abstract away machine code. In 2026, tests will abstract away the code itself.
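Here's what that abstraction looks like in practice: a couple of behavior-level tests against a hypothetical `create_invoice` function. If these pass, I don't care how the agent implemented it:

```python
# Behavior-level tests: they pin what the code must do, not how.
# billing.create_invoice is a hypothetical function under test.
import pytest
from billing import create_invoice

def test_invoice_total_includes_tax():
    invoice = create_invoice(items=[{"price": 100, "qty": 2}], tax_rate=0.1)
    assert invoice.total == 220  # 200 subtotal + 10% tax

def test_empty_invoice_rejected():
    with pytest.raises(ValueError):
        create_invoice(items=[], tax_rate=0.1)
```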
Right now, a non-developer with Claude Code could probably produce a very high quality app by simply vibe-coding and then testing the app thoroughly before shipping it. But manual UAT is not easy and not fun.
That's why I expect a big rise in agent-focused testing frameworks. Testing backend changes is pretty straightforward, but UI design and end-to-end functionality are still hard for agents to verify. All of the buttons might work, but if the buttons show up in a strange place, your agent won't notice, and the tests will still pass.
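For example, a Playwright check can confirm the button works, but catching "the button renders in a weird place" still needs a screenshot that someone, or some model, actually looks at. A sketch, with a hypothetical local app URL and selector:

```python
# End-to-end sketch with Playwright (pip install playwright).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000")    # hypothetical local app
    page.click("text=Add todo")           # functional check: passes or fails
    page.screenshot(path="homepage.png")  # visual check: needs eyes on it
    browser.close()
```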
Total Skills Takeover
I don't want this to turn into a Skills appreciation post, but suffice it to say, Skills will take over.
There will be an update to the primitive that allows for remote/filepath Skills, which will unlock even more use cases. Imagine if every library installed in your project came with a doc for how your agent should use it, and the agent could read that doc whenever it needs help. The doc would be pinned to the installed version, so your agent always uses that version correctly.
MCPs could still be useful, but they should be hidden behind Skills so that they are dynamically fetched into the context only when they're needed. But honestly, an API doc and curl could replace MCPs entirely.
Lastly, Skills have proven to actually counter slop and improve performance. The Claude frontend design skill instructs the agent to avoid building generic AI-slop UIs. When OpenAI announced an improvement in building spreadsheets in the jump from GPT-5.1 to GPT-5.2, this was at least partially due to them quietly implementing spreadsheet skills.
With Skills, we don't need to train an LLM to know how to use Excel, Word, and PowerPoint all at once, because it can learn those skills on the fly. We won't need to wait for labs to improve models in certain areas, or for updated knowledge cutoff dates.
Turing Test v2
The original version of the Turing Test came and went with the release of GPT-4. The average person, unfamiliar with the subtleties of how LLMs tend to respond, would have no idea they were not talking to a human. Then in 2025, our senses adjusted, and it seems like people can now tell whether something was made by a human or not.
So the second version of the Turing Test will be to fool high-taste testers. In 2026, we will cross the uncanny valley in domains like writing, image generation, and code. Even people who are sensitive to AI-generated content will not be able to tell the difference. The clearest example is image generation. I consider myself very tapped in, and I already can't tell whether some images are AI or not.
This will be received differently across different mediums. I would not be opposed to reading AI generated writing. Gwern has been experimenting with AI poetry for a while now with great results. If it's good writing, I'll read it. A voice note from someone I admire, turned into a blog post by AI, would still be worth reading to me. Now I'm not sure how I'd feel about watching AI videos for entertainment, or reading social media comments written by AI.
But the point is that you won't need to make those decisions by the end of 2026, since you won't be able to tell the difference.
Conclusion
2026 will be the year we put up or shut up. The funding that poured into AI in 2025 was absurd, but outside of dev tools, it produced very few agents worth using every day. Claude Code is the exception. It's so useful that it almost justifies the entire hype cycle on its own.
A ton of agent startups will be steamrolled by the AI PC. The winners in 2026 will be the ones who strap a great harness to a great model and build a great UI on top of it. The losers will still be stuffing 50 tools into the system prompt or even worse... going multi-agent.