What exactly are AI Agents?

Breaking down where we are with AI Agents and what's coming in 2025

Feb 04, 2025

Right now, at the start of 2025, we have two major trends in AI: reasoning and agents. Recently, a lot of the focus has shifted to reasoning models, especially the controversies surrounding DeepSeek’s open-source R1 model.

However, I think this is a mistake. Not that the models aren’t incredible on their own, setting new benchmarks for artificial intelligence. It’s more that they are “just” an iteration of what we’ve been seeing for the past two years. Agents are a new paradigm altogether. Getting PhD-level answers in chat is one thing. Having a PhD-level coworker is another thing altogether.

But what are agents, exactly? We saw from the press release that Gemini 2.0 was “built for agents”, and after reading my “The Death of SaaS” post, Salesforce has pivoted to “Agentforce”. Apparently, at Davos, delegates jokingly started referring to this year’s event as “agentpalooza”.

I see people talking about agents in almost every context you can imagine, and it’s starting to sound like a replacement for the word AI itself. So even if just for my own sanity, I wanted to try and formalize a useful framework for how to think about agents today and what to expect in 2025 and beyond.

The differences between Chatbots and Agents

The logical place to start is to draw a distinction between agents and the chatbots we’re all familiar with by now. Below is an attempt to visualize how the inner process of an agent might differ from a chatbot.

Notice that in both cases, we start from a user input and end with an AI-generated output. The difference here is in the how, more than the what. They more or less do the same task, but instead of making a single call to the underlying LLM and blurting out an answer, the agent breaks down the user’s request into several steps, which may include using tools repeatedly and making some decisions about how best to deliver the optimal response to the user.
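To make that difference concrete, here is a minimal Python sketch of the two flows. Everything in it is a made-up illustration: fake_llm is a canned stand-in for a real model, calculator is a toy tool, and the "CALL" / "FINAL" reply format is my own assumption, not any lab's actual protocol.

```python
import operator
from typing import Callable, Dict


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: returns canned replies."""
    if "Observation:" in prompt:
        # A tool result is already in the prompt, so wrap it up as the answer.
        observation = prompt.rsplit("Observation:", 1)[1].strip()
        return f"FINAL: The answer is {observation}."
    if "Tools:" in prompt:
        # Tools are on offer, so ask for one instead of guessing.
        return "CALL calculator: 17 * 23"
    # No tools available: blurt out a guess.
    return "FINAL: Probably around 400."


def calculator(expression: str) -> str:
    """Toy tool: evaluates '<int> <op> <int>' for +, - and *."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    return str(ops[op](int(left), int(right)))


def chatbot(user_input: str, llm: Callable[[str], str]) -> str:
    """One call to the model, and the reply goes straight to the user."""
    return llm(user_input).removeprefix("FINAL: ")


def agent(user_input: str, llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Loop: ask the model, run any tool it requests, feed the result back."""
    prompt = f"Tools: {', '.join(tools)}\nTask: {user_input}"
    for _ in range(max_steps):
        reply = llm(prompt)
        if reply.startswith("FINAL: "):
            # The model decided it is finished.
            return reply.removeprefix("FINAL: ")
        # Otherwise the reply has the assumed form "CALL <tool>: <argument>".
        name, argument = reply.removeprefix("CALL ").split(": ", 1)
        observation = tools[name](argument)
        prompt += f"\n{reply}\nObservation: {observation}"
    return "Stopped: step limit reached."


if __name__ == "__main__":
    question = "What is 17 * 23?"
    print("chatbot:", chatbot(question, fake_llm))
    print("agent:  ", agent(question, fake_llm, {"calculator": calculator}))
```

Same input, same kind of output, but the agent took a detour through a tool and decided for itself when it was done. The step limit is there so that a confused model cannot loop forever.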

A more elaborate definition has been put forth by Hugging Face, where they rate the level of agency. Levels 1-3 are what you would see in ChatGPT today.

Introduction to Agents, source: Hugging Face

By these definitions, the above agent diagram would qualify as a Multi-step Agent because it can make multiple inner loops and decide when it’s finished. This, however, doesn’t say anything about how the agents might be used in the real world.

Venture firm NFX has come out with its own five-stage framework for how AI agents might evolve.

The Five Stages of AI Agent Evolution, NFX

For example, the ChatGPT of yesteryear would be seen as a Generalist Chat. The many wrappers built on top of the OpenAI API could be seen as Subject-Matter Experts. Neither really has any agency at all. Step three is where we’re at now, which is to build robust agents that can do stuff for humans. Once you extrapolate what that implies, you quickly get to the last two steps which are innovators and ultimately an entire organization run by AI.

Let me repeat that just so it hits home.

Agents are what takes us from you using AI to AI running your company.

To better understand how that transition is not only possible but inevitable, I thought of three more descriptive and pragmatic categories. The reason I like this framework is that it maps to how you would use these AI systems, and they are complementary.

The end game isn’t that one replaces the other per se, but that all of these will be part of our lives going forward.

Prompt Agents - You ask the AI to complete a task in a chat window.
Workflow Agents - The AI completes tasks semi-autonomously as part of a pre-determined workflow.
Worker Agents - The AI autonomously carries out tasks as part of a role description.

Let me expand and give you some examples of existing and upcoming AI systems that fall into these categories.

Prompt Agents

Almost everything you hear when people talk about agents falls into this category. These are still just chatbots, but they do have some limited agency. The agent is only activated by the user, even if the agent takes some time to complete its task.

Recently, a slew of new product announcements from leading labs fall under this category. These include Claude Computer Use, OpenAI’s Operator, as well as Deep Research in both Google and OpenAI flavors.

Here’s one of the promo videos from OpenAI’s launch of Deep Research, showing a consultant from Bain effectively replacing themselves with AI (source). The irony!

I’ve used Google’s version for a while, and it’s impressive to see how much work the AI is doing from just a simple question, sometimes reading more than 100 web pages in a few minutes to find information, and then putting all that together in a neatly formatted document. Certainly the human equivalent of an afternoon, if not more.

I previously covered Gemini Stream Realtime, which was an eye-opening experience for me. While it has no agency per se, you can easily imagine the AI seamlessly taking over your mouse and keyboard to do the things you’re asking it about, which it can already see on your screen.

When you put these together, and I’m sure they will, these chatbots will feel pretty powerful in the right hands. It just becomes a question of, well, having better questions.

Yet most jobs aren’t about research and producing reports. How does the AI actually interact with other systems and do more stuff?

Workflow Agents

This category is what got me excited about agents in the first place. In the summer of 2024, I saw this video about an intern in Silicon Valley replacing a $16,000 software project in two hours using AI and a simple workflow tool (source). The way to kill SaaS isn’t with the chatbot, but by using the LLM APIs powering chatbots to connect systems and data in new and creative ways.

The Klarna case studies, which also got a lot of attention in 2024, are examples of this. After replacing customer service with superior chatbots, they supposedly, and quite rapidly, pulled out major enterprise systems including Salesforce and Workday by just integrating OpenAI’s API directly into their business operations. In practice, the AI isn’t replicating Salesforce. It’s creating a much simpler, streamlined process. Like just reading through your email to automatically update leads in a database. Why then pay Salesforce when a free open-source database will do?
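Here is a rough Python sketch of what such an email-to-leads workflow could look like. To be clear, this is my own illustration and not how Klarna or anyone else built theirs: extract_lead is a hypothetical stand-in for the LLM call (a regular expression keeps the sketch runnable without an API key), and the leads table layout is an assumption.

```python
import re
import sqlite3
from typing import Iterable, Optional


def extract_lead(email_body: str) -> Optional[dict]:
    """Hypothetical stand-in for an LLM extraction call.

    A real workflow would send the email to a model and ask for structured
    output; the regex below only understands "I'm <Name> from <Company>".
    """
    match = re.search(
        r"I'm (?P<name>[A-Z][a-z]+) from (?P<company>[A-Z]\w+)", email_body
    )
    return match.groupdict() if match else None


def run_workflow(emails: Iterable[str], db: sqlite3.Connection) -> int:
    """Pre-determined workflow: read each email, extract a lead, store it.

    Returns the number of leads in the table after the run.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS leads "
        "(name TEXT, company TEXT, UNIQUE(name, company))"
    )
    for body in emails:
        lead = extract_lead(body)
        if lead is None:
            continue  # Not a sales lead, nothing to record.
        # INSERT OR IGNORE skips leads that are already in the table.
        db.execute(
            "INSERT OR IGNORE INTO leads (name, company) "
            "VALUES (:name, :company)",
            lead,
        )
    db.commit()
    return db.execute("SELECT COUNT(*) FROM leads").fetchone()[0]


if __name__ == "__main__":
    inbox = [
        "Hi, I'm Dana from Acme. Can we get a demo?",
        "Reminder: your invoice is due.",
        "Hi, I'm Dana from Acme. Can we get a demo?",  # duplicate email
    ]
    connection = sqlite3.connect(":memory:")
    print("leads stored:", run_workflow(inbox, connection))
```

The point is that the steps are fixed in advance and the AI only fills in the one part that used to need a human, namely reading the email. Everything else is a plain database insert.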

This is an important step and one I expect to play out big time in 2025 as AI invades the enterprise stack. Major players, including Salesforce, have their own offerings, and following suit, many incumbents will switch jackets from SaaS to Agents. Expect to be spoon-fed their agent-focused marketing at every conference this year.

Still, this isn’t the endgame. Stitching together all these systems is tedious work, and there is a more elegant solution coming.

Worker Agents

One of my big aha moments in 2024 was using Devin. Partially a victim of its own hype, Devin was really one of the first startups to go all-in on agents in early 2024. They took a lot of heat for being that early, that bold, and as expected, the first iteration is never the ultimate expression. But like many things in AI, as the models get better and cheaper, Devin will get better.

What separates Devin from other AI tools you would have used is that it feels different.

Devin onboarding process, source: Devin

When you start using Devin, there is an onboarding process, instead of an empty chat window. I’ve hired a lot of remote developers over the years, and it feels familiar. All you need to do is invite Devin to your Slack and code repositories. After that, you can ask Devin to do stuff directly in Slack. But I want to emphasize, working with Devin doesn’t feel like ChatGPT. Devin is a specialized model built for this specific scenario, so you feel you can just talk to it instead of having to prompt engineer it. Which makes a big difference. Below is a video from Devin’s creators.

Yes, Devin feels pretty autistic. The responses are overly verbose and its follow-up questions can seem extremely pedantic, but I know a few human developers like that. As you might expect, Devin is still limited in its capabilities. But you might be surprised how far that limit goes. Obviously, Devin can code. But Devin can also access remote servers, configure databases, and help you modernize your legacy architecture. When Devin completes a task, it creates a pull request, which a human can review before accepting. Again, Devin just fits into the way software developers already work.

I’m still exploring those limits, and over time they will outpace my ability to define reasonable tasks. There will come a time when Devin just knows what to do, and at most, I just have to give it a thumbs up on Slack. Just let Devin cook.

This isn’t just technical innovation. Devin also ushered in a new way of pricing AI. At the time, $500 a month was considered astronomically high. But the licensing model goes further. It’s not limited by seats like traditional SaaS models. Everyone can use Devin. After all, Devin is on Slack like your other employees! It’s just that Devin has limited computing time per month. Unlike human employees, you can still run Devin 24/7 as long as your credits last and you’re willing to stay up to give it more tasks. They also offer enterprise plans if you want to run more Devins in parallel. Some impressive things will be created by a team of Devins in 2025.

Concretely, what I expect to see in 2025 is “Devin for everything”. Devin for Sales. Devin for HR. Devin for UX Design. Specialized agents that perform not just individual tasks, but more broadly defined roles that for obvious reasons will resemble typical human roles. Dario Amodei calls these “virtual collaborators”, recently hinting that Anthropic will release something in this area soon.

In fact, all the major AI labs have their own agent programs. Here’s how Sam Altman describes the integration of models he sees in the future for GPT-5, which he also refers to as AGI in this video (source).

So this is potentially how we get to the end of the road for agents and AGI. I would put a reasonable probability on this happening as early as 2025. Perhaps as a “baby AGI” initially, stitched together from a variety of AI models and systems as a Frankenstein of sorts. I expect eventually all this will become integrated into the models, much like when we went from text and image models to native multimodal models. That may take some more time to make it all fast and cheap.

So what does this all mean? How is the world different with AI agents zipping around the workplace and the internet?

What AI agents mean for the world

Dario Amodei was careful to clarify that agents will be a “complementary” workforce, not a “replacement”, which he immediately caveated with the statement that market forces may dictate otherwise.

Narrator: The market forces did, in fact, dictate otherwise.

We see signs of this already. Here’s how Y Combinator describes what they’re looking for in startups building AI agents in their upcoming batch.

“The value prop of B2B SaaS was to make human workers incrementally more efficient. The value prop of vertical AI agents is to automate the work entirely. Vertical AI agents that reach human-level performance grow extremely quickly.”
— YC Spring 2025 Request for Startups

Automate the work entirely. They said the quiet part out loud.

It will be no different in the enterprise. Here’s Goldman Sachs talking about their very own AI agent program.

“That’s where the model is going to start to do things like a Goldman employee, not only say things like a Goldman employee.” — Marco Argenti, CIO at Goldman Sachs

He then goes on to say things the HR department would approve about the value of humans and all, but we can read between the lines here.

During 2025, we will find out what the famous new jobs are as AI agents permeate the workforce in all shapes and forms, replacing humans and existing software systems en masse.

What’s your take on AI agents?
