

I Hate Making Slideshows

I hate making slideshows. It may or may not have something to do with how bad I am at making them.

Unfortunately, AI has not made this any easier.

So I decided to try my hand at building something better.

What's Out There

ChatGPT spits out plain text wrapped in .pptx files. Claude's new native slideshow maker produces boring HTML with cookie-cutter colors and zero personality. Both are technically PowerPoints, sure, but they aren't getting us 80% of the way.

The core problem is that PowerPoint generation requires tons of boilerplate. By the time the model sets up the file structure, it's out of tokens and creative capacity.

[Image: a beautiful slide]

My first thought was to build a workflow where the agent creates a detailed presentation plan, then builds slides based on that plan. This isn't a terrible idea, but it kicks the can down the road: the user still has to design a slideshow using only text. It also doesn't solve the core problem, which is abstracting slide design into something text-based that an LLM could work with in the first place.


Next, I considered making my own JSON-based slide description schema. I could design a structured output schema that maps to certain slide components and design elements, then try to get an LLM to adhere to it. JSON would be tough, though, because it's pretty limited, and I would ultimately be building a new programming language for slide design on top of JSON. That triggered the next thought: is there already a programming language for designing beautiful slides?

There is! There are actually a few of them, but the one I landed on is called Slidev.

Slidev is a markdown-based syntax for creating presentations. You write markdown, and it generates beautiful interactive slideshows. It's open source and has components and themes from the community. It supports Vue components, HTML, CSS, Mermaid diagrams, and click transitions, and it can export to PDF, PPTX, and PNG.
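
To make that concrete, here is a minimal sketch of what Slidev syntax looks like. Two slides, nothing fancy; the theme and layout names are common built-ins, not anything specific to this project:

```md
---
theme: default
title: Example Deck
---

# Opening Slide

Plain Markdown renders as slide content.

---
layout: two-cols
---

# Two Columns

Left-column content goes here.

::right::

Right-column content goes here.
```

Slides are separated by `---`, and a block of frontmatter right after a separator configures just that slide.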

I tested ChatGPT, Claude, and Grok with prompts requesting slideshows in Slidev syntax. ChatGPT made boring but functional slides again. Claude was more ambitious but had small, fixable syntax errors. Grok had pretty bad syntax errors I didn't spend the time fixing. But the models could kind of handle the syntax since it's similar to markdown—they just weren't great at Slidev-specific features.

I installed Slidev locally and set up a quick-start template with Claude Code to try it out. On the first shot, I got similar results to the web Claude attempt: syntax errors and a boring slideshow.

Setting Up the Feedback Loop

I am a huge fan of providing coding agents with a feedback loop. I figured if the AI could write slides, export them, see the results, and iterate, it would catch its own mistakes.

The plan was:

  1. Write slideshow in Slidev syntax
  2. Export to PDF
  3. Review the PDF
  4. Fix issues and re-export

The export build command would fail due to syntax errors. Claude would fix them, re-export, find more errors, and keep iterating until it worked. But when reviewing the PDF, it claimed everything looked fine even when there was raw HTML rendering instead of proper components.

Claude was just congratulating itself on a job well done, when there was a lot more work to be done.

[Image: Claude complimenting itself]

It was missing issues that were obvious in the rendered slides but not obvious in the code. Empty lines between divs caused HTML not to render, so there was raw HTML in the slideshow. Some slides were blank. A lot of content overflowed and was cut off. Claude also had a tendency to use white text on a white/pastel background, which was not readable. But Claude was not seeing any of these issues, and when I pointed them out, I got a swift "You're absolutely right!".

-_-

I gave Codex CLI a shot, but it was not able to read PDFs natively and fell back to extracting the text, which is not helpful for design review.

So the next hill to climb was the slide review problem. My guess was that Claude's PDF handling treats the whole document as one long vertical image. I figured review would go better if Claude Code could look at each slide individually. So I tried it out. I took a screenshot of a single slide, popped it into the Claude and ChatGPT web apps, and asked for design feedback. They nailed it! They called out the raw HTML and the unreadable white-on-pastel text, and also noticed some formatting issues I hadn't caught myself.

We found a new path forward!

Switching to Images

The first step was to switch to exporting each slide as a separate image. Luckily, Slidev has export command args that allow this out-of-the-box. It generates a folder with a PNG for each slide labeled as {slide-number}.png.
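
The export itself is a single CLI call, roughly along these lines (flags from memory of Slidev's CLI; check `npx slidev export --help` for the exact options):

```bash
# Export one PNG per slide into a folder instead of a single PDF.
# Slidev exports through a headless browser, so Playwright's Chromium
# may need to be installed first: npx playwright install chromium
npx slidev export slides.md --format png --output slides-export
```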

I tested Claude Code with the new images and it was working as expected, but this also allowed me to try Codex CLI again, since we were now dealing with images instead of a PDF.

This worked better. It could spot white text on pastel backgrounds and broken layouts. But Claude was... lazy. When I asked it to review all 11 slides systematically, it would check slides 1, 2, then skip to 6, 8, 10. It tried to trick me and take a shortcut, but luckily I caught it because I don't trust Claude. Sneaky little bastard.

I tried Codex, which supposedly handles longer tasks better. It ran for 30 minutes before I stopped it. After 25 minutes of "processing" with no progress updates, it finally made edits that were worse than the original and also contained errors. It wanted to fix the errors, export, and review again, but I just killed it. I wasn't waiting another hour for it to finish.

So if we have to stick with Claude Code, what are our options? I thought about setting up a custom script that passes each image to a multimodal model and produces a review that Claude Code could act on. Then Claude would just have to run the script and fix whatever it reported. But I don't really want to build all of that for this project. I don't want to add API keys and other dependencies. I'm hoping that anyone can jump into this repo, start up Claude Code, and start building slideshows.

We just needed to solve Claude's laziness issue. Ideally, it wouldn't review slides one by one, but in parallel. I also don't like the idea of the agent that built the slides being the one to review them, because it introduces bias and a conflict of interest.

Enter subagents.

Subagents

Claude Code subagents seemed like the perfect fit.

  • Uses its own system prompt
  • Isolated context prevents conflicts of interest
  • Can be run in parallel
  • Can be easily delegated to and reviewed by the main agent

I used the /agents CLI command to spin up an Image Review Subagent. Claude Code actually made this step really easy. I just described the challenge and the goal and it wrote the system prompt and everything for me. I had to do some final tweaks but it ended up looking great.
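
For anyone curious, a Claude Code subagent is just a Markdown file with YAML frontmatter under `.claude/agents/`. Mine ended up along these lines (paraphrased from memory, not the exact prompt):

```md
---
name: slide-reviewer
description: Reviews a single exported slide image for design and rendering problems.
tools: Read
---

You are a presentation design reviewer. You will be given the path to one
exported slide PNG. Check it for raw, unrendered HTML; text that is hard to
read against the background (e.g. white text on a white or pastel background);
content that overflows or is cut off; and blank slides. Report every issue
with the slide number and a concrete fix. Do not call the slide fine unless
it truly is.
```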

So now, instead of having the main Claude agent review its own work, it would spin up an independent review agent for each slide in parallel. These had fresh context and no attachment to the original design. They'd critique things like white text on a white/pastel background, broken layouts, and more. They caught even more issues I hadn't noticed.

[Image: the review agent in action]

I did a little more tweaking of the CLAUDE.md and the subagent prompt before it got to a place I felt comfortable with.
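
For a sense of where it landed, the workflow instructions in CLAUDE.md boil down to something like this (paraphrased; the repo's actual file is the source of truth):

```md
## Build-and-review loop

1. Write the deck in `slides.md` using Slidev syntax.
2. Export every slide to PNG: `npx slidev export --format png`.
3. Launch one slide-reviewer subagent per exported image, in parallel.
   Never review your own slides in the main context.
4. Apply every fix the reviewers report, re-export, and repeat until
   no reviewer finds remaining issues.
```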

I iterated more on the original Evals presentation I was using as an example before starting from scratch.

It's Alive!

I popped in my blog post about Complex vs Simple Agent Architectures and asked it to build a beautiful slideshow about it. It did pretty great! It even included a Mermaid diagram and image placeholders for me!

I published the unedited slideshow in case you want to see the results for yourself. It's not perfect, but it's much better than what ChatGPT or Claude's native tools produce. With some iteration I'm sure I could get it closer to 90% of the way there!


Takeaways

1. Find the right abstraction. The problem with AI-generated content isn't always model capability—sometimes it's finding the right harness/abstraction. Slidev gave me a syntax that was LLM-able. No need for MCPs or tools or workflows or any of that headache.

2. Feedback loops are essential. Your agent is in "spray-and-pray" mode if you don't give it a way to review its own work.

3. Subjectivity matters. Vision models can see your design, but do they look for what you look for? Do they have good taste?

4. Sometimes multi-agent works. The irony of the solution being multi-agent, after writing a blog post trashing multi-agent systems, is not lost on me. I am not totally against multi-agent systems, but the right tool for the right job matters more here.

Next Steps

I plan to revisit this repo as improvements land in Claude Code, Codex, and Slidev, and as new models are released. I have an AGENTS.md in place just in case Codex wants to start being good.

I'd also like to make it more multi-tenant so you can build multiple slideshows per repo.

I'd also like to see how far I can really push the design skills of these models. No more purple gradients!

Want to Make Your Own Slideshows?

The repo is public here. It's a template repo, so feel free to fork or copy it and start building your own slideshows.

Quick Start Guide
  1. Fork/Clone the repo
  2. Install the dependencies with npm install
  3. Boot up Claude Code
  4. Prompt it to build a slideshow with whatever content you want
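
In terminal form, that's roughly the following (substitute your own fork's URL; the repo link above is the one to follow):

```bash
# Clone your fork (or the copy you created from the template on GitHub)
git clone https://github.com/<your-username>/<your-fork>.git
cd <your-fork>

# Install Slidev and the export dependencies
npm install

# Start Claude Code in the repo, then ask it for a slideshow
claude
```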

Good luck out there! Reach out if you have any questions or need help!

Complex AI Agents

[Image: Model Mafia]

In the world of AI dev, there’s a lot of excitement around multi-agent frameworks—swarms, supervisors, crews, committees, and all the buzzwords that come with them. These systems promise to break down complex tasks into manageable pieces, delegating work to specialized agents that plan, execute, and summarize on your behalf. Picture this: you hand a task to a “supervisor” agent, it spins up a team of smaller agents to tackle subtasks, and then another agent compiles the results into a neat little package. It’s a beautiful vision, almost like a corporate hierarchy with you at the helm. And right now, these architectures and their frameworks are undeniably cool. They’re also solving real problems as benchmarks show that iterative, multi-step workflows can significantly boost performance over single-model approaches.

But these frameworks are a temporary fix, a clever workaround for the limitations of today’s AI models. As models get smarter, faster, and more capable, the need for this intricate scaffolding will fade. We’re building hammers and hunting for nails, when the truth is that the nail (the problem itself) might not even exist in a year. Let me explain why.

Where Are All the Swarms?

Complex agent architectures are brittle. Every step in the process—every agent, every handoff—introduces a potential failure point. Unlike traditional software, where errors can often be isolated and debugged, AI workflows compound mistakes exponentially. If one agent misinterprets a task or hallucinates a detail, the downstream results may not be trustworthy. The more nodes in your graph, the higher the odds of something going wrong. That’s why, despite all the hype, we rarely see swarm-based products thriving in production. They’re high-latency, fragile, and tough to maintain.

Let's use software development as an example, since it is what I am most familiar with. Today's agent workflows often look like this: a search/re-ranking agent scours your code repo for relevant files to include in the context window, a smart planning agent comes up with the approach and breaks it into tasks, one (or several) coding agents write the code, a testing agent writes the tests, and a PR agent submits the pull request (maybe with a PR review agent thrown in for good measure). It's a slick assembly line, but every step exists because current models can't handle the whole job alone.

  • Search and re-ranking: This is only necessary because context windows are too small and it is too expensive to ingest an entire repo. This is also the step most susceptible to failures, because the model that is smart enough to plan the task should also be the one deciding which files are relevant. A context window increase and a price decrease will make this step obsolete.
  • Planning and task breakdown: The main value of this step is that you can have your smartest model give direction to smaller, less capable, but cheaper and faster models. There's no need for a formalized plan when models can do all the planning inside their own reasoning process. The only other reason I can think of to have subtasks here is that a model might not be able to output enough tokens to solve the entire problem in one go. An output token limit increase and a price decrease will make this step obsolete.
  • Testing and PRs: Why separate these? A model that's capable of planning is capable of writing the code to test that plan as long as it fits inside of the output token limit. This step would be replaced by simply returning the test results to the single agent so that it could make decisions based on the results. This is feasible today! But it could be pretty expensive to have an agent loop with the entire codebase as context.
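
As a rough sketch of what that collapsed loop could look like: a single agent that edits, runs the tests, and reacts to the results. The `apply_model_edits` callable below is hypothetical, standing in for whatever model call applies edits with the whole repo in context.

```python
import subprocess
from typing import Callable

def run_tests() -> str:
    """Run the project's test suite and return its combined output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

def single_agent_loop(task: str,
                      apply_model_edits: Callable[[str], None],
                      max_rounds: int = 5) -> None:
    """One agent plans, edits, and reacts to test results, with no handoffs.

    `apply_model_edits` is a hypothetical callable: it receives a prompt
    (with the whole repo in context) and applies whatever edits the model
    decides on.
    """
    feedback = "No tests have been run yet."
    for _ in range(max_rounds):
        apply_model_edits(f"Task: {task}\n\nLatest test results:\n{feedback}")
        feedback = run_tests()
        if "failed" not in feedback and "error" not in feedback.lower():
            break  # tests pass, so no separate testing or PR agent is needed
```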

The root issue isn’t the workflow, and in most cases, it's not even the model intelligence. Limited context windows, high-priced top-tier models, and token output caps force us to chop tasks into bite-sized pieces. But what happens when those limits start to fade? Imagine even a modest 3x-5x improvement in context window size, price, and output token limits. Suddenly, you don’t need all of your tools, frameworks, and subagents.

Tech Debt

And those constraints are eroding fast. Last year, OpenAI's Assistants API launched with built-in RAG, web search, and conversation memory. It didn't gain a ton of traction for RAG—mostly because RAG is not really a one-size-fits-all solution and devs needed control over their pipelines. Back then, RAG was an exacting science: tiny context windows, dumb and expensive models, and high hallucination risks meant you had to fine-tune your RAG pipeline obsessively to get good results. Nowadays that stuff is much less of an issue. Chunking strategy? Throw in a whole document and let the model sort it out. Top K? F*#% it, make it 20 since prices dropped last month. Bigger context windows, lower prices, caching, and better models have made simplicity king again. Problems I've wrestled with in my own agents sometimes vanish overnight with a model update. That's not an edge case; it's a pattern.

The Shelf Life of Agent Architectures

Complex agent architectures don't last. If you build a six-step swarm today, a single model update could obsolete three of those steps by year's end. Then what? AI isn't like traditional software, where architectures endure for decades. Six months in AI is an eternity—updates hit fast, and they hit hard. Why sink time perfecting fickle but beautiful multi-agent masterpieces when the next AI lab release might collapse them into a single prompt? LangChain, Crew, Swarm—all these tools are racing against a convergence point where raw model power outstrips their utility.

I’m not saying agent architectures are useless now—they’re critical for squeezing the most out of today’s tech. But they’re not evergreen. Simplicity is the smarter bet. Lean on the optimism that models will improve (they will), and design systems that don’t overcommit to brittle complexity. In my experience, the best architecture is the one that solves the problem with the fewest moving parts—especially when the parts you’re replacing get smarter every day.

The "Idea Guy" Delusion: Why No One Is Safe from AI

[Image: Knowledge workers]

As AI continues to evolve, many professionals (especially software developers like myself) are coming to terms with the reality that their jobs will eventually be automated. Maybe in two years, maybe in five. But it's happening.

Yet, amidst this shift, a certain group seems oddly confident in their immunity to AI-driven disruption: the idea guys.

These are the people who believe that once AI automates programming and other forms of technical labor, the true value will shift to those who can generate great ideas. But I don’t buy it. Sure, there’s a timeline where this could be true. But in most cases, the idea guy is just as doomed as the software developer, if not more so.

AI Won't Struggle with Ideas

There's a misconception that while AI might be able to code, it won't be able to come up with good ideas. But this doesn't hold up under scrutiny. Idea generation isn't some mystical human trait; it's just a research problem.

If I wanted to generate 15 startup ideas right now, I wouldn’t meditate in a cabin and wait for inspiration. I’d scroll Reddit for 20 minutes and see what people are complaining about. AI can do that faster, better, and across a wider range of sources.

And filtering good ideas? That’s not some sacred human skill either. A good idea guy isn’t someone who magically comes up with better ideas; it’s someone who avoids bad ideas. But AI doesn’t need a filter, since it can pursue every idea in parallel. If it launches 10 projects and one succeeds, is it a genius idea guy?

AI as CEO

AI isn’t just stopping at coding. Software development isn’t just writing code! It's provisioning environments, debugging, testing, scaling, deploying, architecting, and integrating systems. AI is already creeping into these domains, and eventually, it will handle them in ways that don’t require human oversight.

At that point, what’s stopping AI from also iterating on product-market fit? If it can build a full-stack application, why wouldn’t it also build in user feedback loops, run A/B tests, and continuously optimize the product itself? If it can automate deployment, it can automate iteration. If it can iterate, it can validate its own ideas.

Eventually, users themselves will be the ones proposing ideas by leaving feedback, which the AI will then solve for. At that point, what exactly does the human “idea guy” contribute?

But What About Sales and Marketing?

There’s another flawed assumption that AI can build, but it won’t be able to sell. That’s just false. The same AI that can launch products can also launch A/B-tested marketing campaigns, generate optimized ad copy, and personalize sales pitches at a scale humans can’t compete with. Marketers are already prompting AI to generate content, optimize ads, and personalize sales pitches. How far away are we from automating the prompting?

And it’s not just about generative AI—classic machine learning is already better than humans at optimizing recommendations, ads, and conversion rates. These models will only improve. When that happens, an AI-driven product won’t just sell itself—it will continuously optimize its sales approach better than any human could.

Who Actually Survives?

If anyone has a shot at surviving, it’s not the idea guy. Potentially, it’s the entrepreneur who becomes an intern for the AI.

Someone will still be needed to rig up AI systems, configure automations, and handle anything in the physical world—incorporating businesses, making legal decisions, or doing things that require human interaction. But beyond that? Their role will be minimal.

If we ever reach the point where AI can handle full unsupervised software development, then no job is safe. Not developers, not marketers, not CEOs. Not even scientists, doctors, or lawyers. Because an AI that can reason through the entire software lifecycle without human intervention is smart enough to disrupt every knowledge-based profession. Mathematicians are not safe even though LLMs are bad at math, because code lets the models perform extremely difficult calculations; the same will be true for every knowledge-based profession.

Final Thoughts: No One Is Safe

I don’t feel secure in my role as a software developer. But I don’t think idea guys should feel secure, either. If we ever reach the point where AI is developing software without supervision, it will be smart enough to do much more than just code.

At that point, every knowledge worker is at risk—lawyers, scientists, doctors, and executives included. If AI is smart enough to replace programmers, it’s smart enough to replace idea guys, too. And if you’re betting on the latter being the safer role, you’re in for a rude awakening.

Do First, Optimize Later: Breaking the Cycle of Over-Optimization

I've come to a realization: I spend too much time planning and optimizing rather than actually doing. AI and automation have fueled my obsession with optimization, making me believe that if I refine a system enough, I’ll be more productive. But the truth is, optimization is only valuable when applied to something that already exists.

The problem is, I often optimize before I start. I think, “I need to make a to-do list,” but instead of actually making one and using it, I get lost in finding the best way to structure a to-do list, the best app, or the best workflow. Even right now, instead of writing down what I need to do, I’m writing a blog post about how I should be writing things down. This is the exact loop I need to escape.

Optimization feels like progress. It gives me the illusion that I’m working towards something, but in reality, I’m just postponing action. The efficiency of a to-do list doesn’t matter if I’m not using one. The best UX for adding tasks doesn’t matter if I never add tasks. The friction in a system isn’t relevant if I’m not engaging with the system at all.

The real issue isn’t inefficiency—it’s a lack of discipline. I tell myself I’m not doing things because the process isn’t optimized enough, but the truth is simpler: I just haven’t done them. My focus should be on building the habit of doing, not perfecting the process before I even begin.

The New Rule: Action Before Optimization

Going forward, I want to adopt a new mindset—do first, optimize later. If I find that something is difficult or inefficient while actively doing it, then I can optimize. But I won’t let optimization be the barrier to starting in the first place.

I’ll collect real data from actually engaging in the tasks I want to improve. If my to-do list system feels clunky after I’ve been using it consistently, then I’ll refine it. If I struggle to keep up with a workflow, then I’ll tweak it. But I won’t waste time optimizing something that isn’t even in effect yet.

Optimization should be a tool for improvement, not an excuse for inaction. The first step is always to start. Only then does optimization become valuable.