Your AI Agent Doesn't Need To Do Everything. In Fact, It Shouldn't.

Founders describing their AI agent on the first call tend to give it a huge job. It will moderate content, summarise documents, draft follow-ups, update the CRM. One agent, one brain, one prompt. It looks efficient in a diagram. In production it tends to be the opposite.

This is not a new question. It is the scope question that software teams have been getting wrong for decades, now wearing an AI label. The lessons from before AI still apply. Here is what they look like in practice.

The pattern is older than AI

I joined Zintouch in December 2019. The company had existed for about five years, with a handful of launch customers and a working product. The platform was a single large PHP application: one database, one deployment. The CTO had built it that way for a good reason: it was fast to build, and fast was what those first customers needed.

It was the right call for that stage. It became the wrong shape for the next one.

From 2020 onwards, we spent two to three years decomposing that monolith into domains, layers and components, with Docker added so we could run real microservices. One concrete reason: supporting more than one type of hardware. In the monolith, every new hardware variant meant changes that rippled through the whole codebase, and the work cost far more than it should have. Without the decomposition, we would not have been able to support multiple hardware types at all, and we would not have been able to deliver life-critical care alarms reliably as the device count grew. The platform went from hundreds of devices to thousands. The company grew from six people to over fifty.

The monolith was not a mistake. It was the right scope for one phase and the wrong scope for the next. The job of an architect is to notice when the phase has changed.

The AI agent debate is exactly this argument, with new vocabulary. The temptation is to put everything inside one agent because that is the fastest way to a demo. It is also the shape that breaks first.

What narrow scope looks like today

At PAM Connect, the platform I'm building now, content moderation is not one AI call. It runs as separate FastAPI microservices, each with a single specialised model:

A text classifier running KoalaAI's Text-Moderation (DeBERTa-v3, around 100M parameters), which scores text for hate speech, violence and sexual content.
An NSFW image classifier running Marqo's nsfw-image-detection-384, a 15MB ViT-Tiny that does one job well.
A violence image classifier running jaranohaal/vit-base-violence-detection. Same pattern, different concern.

Three services, three Docker containers, three model caches. The backend calls them. The combination logic is plain Python: the strictest verdict wins. There is no large language model in the moderation path at all. No prompt engineering. No "act as a content moderator" instruction. Just narrow classifiers trained for the kind of data they actually see, running CPU-only.
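The "strictest verdict wins" rule can be sketched in a few lines. This is a minimal illustration, not the PAM Connect code: the Verdict enum, its three levels and the combine_verdicts function are names I am assuming for the example.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical verdict levels; a higher value means a stricter outcome."""
    ALLOW = 0
    REVIEW = 1
    REJECT = 2


def combine_verdicts(verdicts: dict[str, Verdict]) -> Verdict:
    """Return the strictest verdict reported by any classifier."""
    if not verdicts:
        # Refuse to decide when no classifier reported anything.
        raise ValueError("at least one classifier verdict is required")
    return max(verdicts.values())


# Example: one verdict per moderation service (service names are illustrative).
results = {
    "text": Verdict.ALLOW,
    "nsfw_image": Verdict.REVIEW,
    "violence_image": Verdict.REJECT,
}
final = combine_verdicts(results)
```

Because the levels are ordered integers, combining is a plain max. Adding a fourth classifier means adding one more entry to the dictionary, not changing the rule.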

The same shape shows up in my open-source claude-code-toolkit. Instead of one general-purpose AI agent for development work, the toolkit ships specialised subagents with explicit tool grants. The infra-maintainer agent has Read, Grep, Glob and Bash. No Write. No Edit. Not by accident, but by configuration. Its job is to analyse and advise on infrastructure, never to modify it directly. A separate agent, devops-automator, has the write permissions, and all changes go through the deployment pipeline. Two narrow agents with different jobs and different access, instead of one agent that can do both.
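The idea of explicit tool grants can be modelled as plain data plus a check. This is a sketch under assumptions, not the toolkit's actual configuration format: the AGENT_TOOLS table and the helper functions are hypothetical, and the full tool list for devops-automator is my guess, since all that is stated above is that it has write permissions.

```python
# Tools that can modify files directly.
WRITE_TOOLS = frozenset({"Write", "Edit"})

# Hypothetical grant table. The infra-maintainer grants match the text above;
# the devops-automator list is an assumption for illustration.
AGENT_TOOLS = {
    "infra-maintainer": frozenset({"Read", "Grep", "Glob", "Bash"}),
    "devops-automator": frozenset({"Read", "Grep", "Glob", "Bash", "Write", "Edit"}),
}


def is_allowed(agent: str, tool: str) -> bool:
    """True only if the tool was explicitly granted to the agent."""
    return tool in AGENT_TOOLS.get(agent, frozenset())


def can_modify_files(agent: str) -> bool:
    """True if the agent holds any file-writing tool."""
    return bool(AGENT_TOOLS.get(agent, frozenset()) & WRITE_TOOLS)
```

An unknown agent gets an empty grant set, so the default is no access. A check like can_modify_files("infra-maintainer") could run in CI to catch a grant that was widened by accident.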

Why narrow wins in production

A narrow component has a job description short enough to fit on a sticky note. That sounds trivial. It is not.

You can write the prompt clearly, or skip the prompt entirely. In my experience, a 200-word prompt for one task outperforms a 2,000-word prompt for ten. Better still: for many narrow tasks you don't need an LLM at all. A 15 MB classifier is faster, cheaper and more predictable than asking a general model to do the same job.

You can test it. Give it 50 example inputs and check the outputs. Track the false positive rate. Track the false negative rate. When the rates move, you know which component to look at.
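Tracking those two rates takes very little code. A minimal sketch, assuming each example has a boolean label (True means the item should be flagged) and a boolean prediction (True means the component flagged it). The function name error_rates is hypothetical.

```python
def error_rates(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for one component."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    # Flagged although it should have passed.
    false_positives = sum(1 for y, p in zip(labels, predictions) if p and not y)
    # Passed although it should have been flagged.
    false_negatives = sum(1 for y, p in zip(labels, predictions) if y and not p)

    negatives = sum(1 for y in labels if not y)
    positives = len(labels) - negatives

    # Report 0.0 when a class is absent, to avoid dividing by zero.
    fpr = false_positives / negatives if negatives else 0.0
    fnr = false_negatives / positives if positives else 0.0
    return fpr, fnr


# Five labelled examples: two should be flagged, three should pass.
labels = [True, True, False, False, False]
predictions = [True, False, True, False, False]
fpr, fnr = error_rates(labels, predictions)
```

Here one of three clean items was flagged and one of two bad items slipped through. Run the same function per component on a fixed example set, and a change in either rate points at exactly one service.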

You can give it the minimum access it needs. The text classifier reads a string and returns scores. It has no database access, no file system access, no API keys to anything else. The infra-maintainer agent can read your servers but cannot change them. If a component goes wrong, it cannot destroy data it never touched.

You can replace it. When a better NSFW model ships, you swap that one container. Nothing else moves. Narrow components are interchangeable parts. A PHP monolith is not.

You can explain it. To users, to your team, to a security auditor. "This service runs Marqo's NSFW classifier. Here is its accuracy on a 20,000-image test set. Here is the threshold for auto-reject." Try saying that about an agent whose job is "handle everything".

What this means if you're starting now

If you're a founder briefing a developer on an AI feature, the question is not how powerful you can make the agent. The question is how small you can make each piece, and how you will keep the pieces independent enough to change later.

Start with one specific task. Pick the right tool for it, and the right tool is often not a large language model. For classification, use a classifier. For embedding lookup, use a vector index. Reach for an LLM where you actually need generation or reasoning. Wrap each component in code that validates its output. Get the failure rate to something you can live with. Then add the next task as a separate component.
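Wrapping a component in validating code can be as simple as the sketch below. It assumes a classifier that returns a dictionary of scores between 0 and 1. The label names in EXPECTED_LABELS and the function validate_scores are hypothetical, chosen only to make the example concrete.

```python
# Hypothetical label set; replace with whatever the component really returns.
EXPECTED_LABELS = ("hate", "violence", "sexual")


def validate_scores(raw: object) -> dict[str, float]:
    """Return a clean score dict, or raise ValueError if the output is malformed."""
    if not isinstance(raw, dict):
        raise ValueError("classifier output must be a dict of scores")

    scores: dict[str, float] = {}
    for label in EXPECTED_LABELS:
        value = raw.get(label)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"missing or non-numeric score for {label!r}")
        # This range check also rejects NaN, because comparisons with NaN are False.
        if not (0.0 <= value <= 1.0):
            raise ValueError(f"score for {label!r} is outside [0, 1]: {value}")
        scores[label] = float(value)
    return scores
```

The caller then decides what a ValueError means for the product, for example sending the item to manual review. Unexpected extra keys are dropped here; whether to reject them instead is a design choice.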

The fast path is the monolith. It always is, in any era of software. It always reaches its ceiling. The work that actually scales is the work of keeping things separable, so that when the phase changes, and it will, the next phase doesn't require rebuilding from scratch.

Summary: founders often build their AI feature as a single agent that does everything. That is the same scope mistake software teams have been making for decades. At Zintouch, from 2020 onwards, I helped split a PHP monolith into components. That was the precondition for growing from hundreds to thousands of devices and from six to fifty employees. At PAM Connect, content moderation now runs as three separate microservices with specialised models (KoalaAI for text, Marqo for NSFW, jaranohaal for violence). There is no LLM in the moderation path. In my open-source claude-code-toolkit the subagents are just as narrow: infra-maintainer is read-only by configuration, while devops-automator has write permissions because all changes must go through the pipeline. Narrow components are testable, replaceable and explainable. The question is not how powerful you can make the whole, but how small you can keep each piece.

Jan Keijzer is owner of Imperial Automation and a freelance Senior Software Engineer. He builds products from zero for founders who have an idea but no technical partner. Python, FastAPI, LLM workflows, AI-augmented development. PhD Nuclear Reactor Physics, TU Delft.