
The Age of Agents: Building AI Systems that Work in the Real World
Tech giants and startup founders alike are racing to create AI agents that can revolutionize how we work, code, and interact with technology. But behind the glossy demos and viral videos lies a more complex reality: building reliable AI systems that actually work is extraordinarily difficult.
The Hype vs. Reality Gap in AI Agents
When Apple recently scaled back its Apple Intelligence initiative due to “hallucinations”—instances where AI generated fabricated information in summarizations—it highlighted a truth that industry insiders have known for months: the gap between AI demos and production-ready systems remains stubbornly wide.
“Building effective and reliable AI agents is really hard,” says Dave Abar, founder of Data Lumina. “Most of these demos break down when you put them into a product and let a lot of people use it.”
The technology landscape is littered with examples. Amazon’s Alexa continues struggling with unreliable AI-driven responses. Even well-funded startups showcase impressive demos that often crumble under real-world conditions. This pattern has created a growing disconnect between what’s promised and what’s delivered.
I’ve spent the last six months speaking with AI engineers and entrepreneurs building agent systems. The consistent message? Nearly everyone is learning the hard way that there’s a world of difference between a flashy demo that works for a handful of cherry-picked examples and a system that reliably serves thousands or millions of users.
Barry Zhang, an AI engineer at Anthropic who recently spoke at the AI Engineer Summit, put it bluntly: “Don’t rush to build agents for every problem.” His experience echoes throughout the industry—AI agents excel at specific tasks but are far from the universal solution they’re often portrayed to be.
This tension between hype and reality isn’t just academic. Companies are investing billions in AI capabilities while struggling to translate that investment into reliable products. The question isn’t whether AI agents will transform business—they already are—but how to build systems that actually live up to their promise.
What Makes an AI Agent Different from a Simple Workflow?
Before diving into how to build effective agents, it’s worth clarifying what exactly constitutes an “agent” in the first place—a term that’s been stretched to include almost any system with an LLM API call.
According to Anthropic’s definition, which has gained traction in the industry, AI agents are distinct from simpler workflows in a fundamental way: autonomy. While workflows follow predetermined paths with LLM calls at specific points, agents dynamically determine their own actions based on environmental feedback, operating in a loop until they achieve their goal or hit a stopping point.
Think of it this way: a workflow is like following a recipe with exact measurements and steps. An agent is more like a chef who tastes the dish as they cook, adjusting ingredients and techniques based on what they observe.
This distinction matters because it shapes how we build, evaluate, and deploy these systems. Anthropic’s blog post “Building Effective Agents” distinguishes workflows with fixed control flows from agents that decide their own paths. This autonomy is both the source of agents’ power and their complexity.
Dave Abar explains: “For many applications, optimizing single LLM calls with retrieval and in-context examples is usually enough.” Starting with simpler workflows often delivers more reliable results than jumping straight to agentic systems.
The confusion around terminology isn’t merely semantic. Companies marketing simple workflows as “agents” have created inflated expectations about what these systems can do. Understanding the true capabilities and limitations of different approaches is essential for building systems that work.
The Three Core Principles for Building Effective AI Agents
At the AI Engineer Summit, Barry outlined three fundamental principles for building effective AI agents that have become a north star for many in the industry: don’t build agents for everything, keep it simple, and think like your agents.
1. Don’t Build Agents for Everything
Not every problem requires an agent’s complexity. Barry suggests a practical checklist to determine if a use case warrants an agent:
- Is the task complex and high-value?
- Does it involve ambiguity?
- Can the agent’s output be verified?
- Do models already handle parts of the task well?
Coding serves as a prime example of an appropriate agent use case. Translating a design document into a pull request is inherently complex and ambiguous. Good code delivers immense value, modern models excel at parts of the coding process, and coding errors can be verified through unit tests and CI pipelines.
This explains why coding agents like GitHub Copilot and Devin have gained traction while many other agent applications struggle to deliver consistent value. The lesson? Be selective about where you deploy agents, focusing on problems where their capabilities align with genuine needs.
For startup founders, this means being honest about whether your problem truly requires an agent or could be solved more reliably with a simpler approach. The urge to add “AI agent” to your pitch deck should be tempered by a clear-eyed assessment of whether the complexity is justified.
2. Keep It Simple
When building agents, Barry advocates for a minimalist approach that maximizes iteration speed. He breaks agents down to their essence: “a model using tools in a loop,” defined by three components:
- System prompt: Instructions telling the agent what to do
- Tools: Functions the agent can call to interact with the world
- Execution loop: The process for running the agent until completion
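Barry’s “model using tools in a loop” can be sketched in a few lines of Python. This is a conceptual sketch, not any particular framework’s API: `call_model` is a stand-in for a real LLM call, injected as a plain function so the loop itself stays framework-agnostic.

```python
# Minimal agent skeleton: a model using tools in a loop.
# `call_model` stands in for an LLM call that returns either a tool
# request or a final answer; `tools` maps tool names to functions.

def run_agent(call_model, tools, system_prompt, task, max_steps=10):
    """Run the model in a loop until it produces a final answer."""
    history = [("system", system_prompt), ("user", task)]
    for _ in range(max_steps):
        action = call_model(history)        # model decides what to do next
        if action["type"] == "final":       # stopping condition reached
            return action["content"]
        tool = tools[action["tool"]]        # look up the requested tool
        result = tool(**action["args"])     # execute it against the world
        history.append(("tool", result))    # feed the observation back
    raise RuntimeError("agent exceeded max_steps without finishing")
```

Everything else—prompts, tools, stopping conditions—is configuration around this loop, which is why agents for very different use cases can share the same backbone.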
This framework strips away unnecessary complexity, allowing developers to iterate rapidly. Barry shared that despite building agents for vastly different use cases, they shared nearly identical backbones and code, differing only in their tools and prompts.
“Complexity upfront kills agility,” Barry warned. His team focuses on nailing these core components before adding optimizations. Later refinements might include caching trajectories to cut costs for coding agents or parallelizing tool calls for search agents.
This principle echoes throughout the industry. Andrej Karpathy, former Director of AI at Tesla, promotes a similar philosophy with his concept of “vibe coding”—using English prompts to build software without formal coding skills. This approach democratizes development but requires embracing simplicity over premature optimization.
For engineering teams, this means resisting the urge to build complex agent architectures before proving their value with simpler implementations. The most successful agent projects start small and grow in complexity only when necessary.
3. Think Like Your Agents
Perhaps the most profound insight Barry shared was the importance of adopting your agent’s perspective. He admitted to designing agents from a human perspective, only to be baffled by their mistakes.
To bridge this gap, he urged developers to step into the agent’s context window—the 10,000 to 20,000 tokens of text that represent everything the model knows at a given moment. This mental exercise reveals how limited and disorienting an agent’s perspective can be.
Barry invited the audience to imagine being a computer-use agent: “Armed only with a static screenshot and a poorly written task description, you attempt a click, then wait blindly for three to five seconds as the model processes. When the next screenshot appears, the action might have worked—or crashed the system.”
This “mildly uncomfortable” exercise reveals critical context gaps. For computer-use agents, for example, knowing the screen resolution is vital for accurate clicks, and providing recommended actions and guardrails prevents unnecessary exploration.
Since agents communicate in human language, developers can interrogate them directly. Feeding a system prompt into a model like Claude and asking, “Is this clear? Can you follow it?” reveals ambiguities. Similarly, analyzing an agent’s trajectory and asking, “Why did you make this choice?” can pinpoint where context falls short.
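One lightweight way to apply this interrogation technique is to wrap the agent’s own system prompt in a critique request and send that back to a model. The sketch below only builds the meta-prompt; the template and function name are illustrative, and the final call would go through whatever LLM client you already use.

```python
# Build a meta-prompt that asks a model to audit an agent's instructions
# from the agent's own point of view. Template wording is illustrative.

CRITIQUE_TEMPLATE = """You will receive a system prompt intended for an AI agent.
Read it as if you were that agent.

1. Is every instruction clear enough to follow?
2. What context is missing that you would need?
3. Which instructions could conflict with each other?

System prompt:
---
{prompt}
---"""

def build_critique_request(system_prompt: str) -> str:
    """Wrap an agent's system prompt in a clarity-audit request."""
    return CRITIQUE_TEMPLATE.format(prompt=system_prompt)

# Send build_critique_request(my_prompt) to a model such as Claude
# via your client library, then read its answer for ambiguities.
```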
This principle applies beyond technical implementation to the broader design of AI systems. Karpathy describes LLMs as “people spirits”—stochastic simulations of human behavior with superhuman memory but notable cognitive deficits, including hallucinations, uneven intelligence, and limited self-knowledge.
Understanding these limitations is crucial for designing systems that leverage AI’s strengths while accounting for its weaknesses. The most effective agent builders develop an intuitive feel for how their models “think” and anticipate potential failure modes.
The Psychological Profile of AI Agents: Understanding Their Limitations
Building effective agents requires understanding their psychology—the inherent strengths and weaknesses that shape their behavior. Andrej Karpathy offers a compelling framework for thinking about LLMs as “stochastic simulations of people,” trained on vast internet text to mimic human-like behavior.
Their superpowers include encyclopedic knowledge far surpassing any human’s memory. Yet they suffer from significant cognitive deficits:
- Hallucinations: LLMs invent plausible but incorrect facts, a limitation that torpedoed Apple’s summarization feature and continues to plague systems like Amazon’s Alexa.
- Uneven intelligence: They excel in some domains while failing at tasks humans find trivial, creating a “jagged” intelligence profile that’s difficult to predict.
- Limited self-knowledge: Models sometimes insist on errors like “9.11 is greater than 9.9” or miscounting letters in words, revealing a blindness to their own mistakes.
- Anterograde amnesia: Unlike human colleagues who build organizational context over time, LLMs cannot form new long-term memories; they rely on finite context windows, requiring explicit programming of their working memory.
- Security vulnerabilities: Prompt injection and other attacks can hijack agent behavior in unexpected ways.
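Because an agent’s working memory is just its context window, builders usually manage that memory explicitly. A common pattern, sketched here with a crude whitespace word count standing in for a real tokenizer, keeps the system prompt plus the most recent turns that fit a token budget:

```python
def trim_history(messages, budget):
    """Keep the system message plus the newest messages within budget.

    `messages` is a list of (role, text) pairs. Token counts are
    approximated by whitespace splitting; a real system would use
    the model's own tokenizer.
    """
    def tokens(text):
        return len(text.split())

    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    used = sum(tokens(t) for _, t in system)

    kept = []
    for role, text in reversed(rest):        # walk newest-first
        cost = tokens(text)
        if used + cost > budget:
            break                            # older turns are dropped
        kept.append((role, text))
        used += cost
    return system + list(reversed(kept))     # restore chronological order
```

Real systems layer summarization or retrieval on top of this, but the core constraint is the same: anything not explicitly placed back into the window is forgotten.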
Karpathy likens LLMs to the protagonists of “Memento” or “50 First Dates”—characters with profound amnesia who must navigate the world with limited context. This analogy helps developers grasp the fundamental constraints of these systems and design around them.
Understanding these limitations explains why companies like Apple and Amazon have struggled with AI features. It’s not for lack of resources or talent—it’s the inherent challenge of building reliable systems with tools that have fundamental cognitive gaps.
The most successful applications embrace these limitations rather than fighting them. They keep humans in the loop, design for verification, and create what Karpathy calls “partial autonomy apps,” which integrate LLMs to augment human work while maintaining human oversight.
Cursor, an AI-powered code editor, exemplifies this approach. Rather than replacing developers, it offers an “autonomy slider” where users control the AI’s scope, from minor suggestions to larger refactoring. This keeps fallible systems in check while leveraging their strengths.
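The autonomy-slider idea reduces to a simple gate: below a chosen level, an AI-proposed change is only suggested; above it, it is applied. The names below are illustrative, not Cursor’s actual API.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 1      # show the proposal, human applies it manually
    CONFIRM = 2      # apply only after explicit human approval
    AUTO = 3         # apply immediately, no human in the loop

def handle_proposal(proposal, level, approve=lambda p: False):
    """Decide what happens to an AI-proposed edit at a given autonomy level."""
    if level == Autonomy.AUTO:
        return ("applied", proposal)
    if level == Autonomy.CONFIRM and approve(proposal):
        return ("applied", proposal)
    return ("suggested", proposal)
```

The point of the gate is that fallibility is priced in: the less you trust the model on a given task, the lower you set the level.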
The Third Wave: Vertical AI Agents as the New Frontier
Industry analysts predict we’re entering the third wave of AI agents—vertical AI agents designed for specific industries and use cases. This mirrors the evolution of traditional software, which progressed from horizontal platforms to vertical SaaS solutions.
Just as the SaaS boom created dozens of billion-dollar companies targeting specific industries, vertical AI agents could produce more than 300 companies each worth a billion dollars, according to some analysts. The key difference? Traditional SaaS required significant human labor to operate; vertical AI agents eliminate much of this cost, offering enterprises a compelling return on investment.
For developers and startups, this convergence of Software 3.0, agent engineering, and vertical AI presents a once-in-a-generation opportunity. The barriers to entry are lower than ever, with tools like Cursor and natural language programming enabling solopreneurs to build sophisticated, industry-specific solutions without massive teams or budgets.
However, the window to capitalize on this opportunity is narrowing. As industries recognize the value of vertical AI agents, competition will intensify. Startups and developers who act now—mastering agent engineering skills and targeting niche markets—stand to reap enormous rewards.
This shift echoes the transition from mainframe computing to personal computers. Karpathy describes today’s cloud-based AI systems as akin to 1960s mainframes, with users “time-sharing” access to powerful models. But unlike the 1960s, when technology was restricted to governments and corporations, today’s AI tools are instantly available to billions via software.
This inversion of technology diffusion—where consumers adopt AI faster than institutions—creates unprecedented opportunities for innovation. The most successful builders in this new era will be those who understand both the technical foundations and the human needs their agents serve.
Practical Frameworks for Building AI Systems That Work
For developers looking to build reliable AI systems, several practical frameworks have emerged from industry leaders.
Abar outlines several workflow patterns for different levels of complexity:
- Single-call workflows: The simplest pattern, using one LLM call with a well-crafted prompt.
- Sequential workflows: Multiple LLM calls in sequence, each handling a specific subtask.
- Branching workflows: Decision points determine which path to follow based on LLM outputs or external inputs.
- Retrieval-augmented generation (RAG): Enhancing prompts with relevant information fetched from external sources.
- Agentic systems: LLMs operating in a loop, making decisions based on environmental feedback.
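To make the branching pattern concrete, here is a minimal sketch in which a classification step decides which handler processes a request. The classifier would normally be an LLM call; here it is injected as a plain function, and all names are illustrative.

```python
def branching_workflow(request, classify, handlers, fallback):
    """Route a request to a handler chosen by a classification step.

    `classify` stands in for an LLM call that labels the request;
    `handlers` maps labels to functions that do the actual work;
    `fallback` handles any label with no registered handler.
    """
    label = classify(request)
    handler = handlers.get(label, fallback)
    return handler(request)
```

Sequential workflows are the same idea without the branch: each handler’s output feeds the next call. The deciding factor is whether the path through the system is known in advance.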
The key is matching the workflow to the problem. Starting simple and adding complexity only when necessary reduces costs and increases reliability.
Barry’s three-component framework—system prompt, tools, and execution loop—offers a complementary approach. This minimalist structure focuses on the essential elements, allowing for rapid iteration and testing.
For those building agents, Abar offers five key tips:
- Start with a simple workflow: Choose the simplest pattern that addresses your needs.
- Test extensively: Identify edge cases and failure modes through rigorous testing.
- Implement guardrails: Use verification steps to catch errors before they reach users.
- Collect user feedback: Learn from real-world interactions to improve performance.
- Monitor and iterate: Track key metrics and continuously refine your system.
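The guardrails tip often takes the shape of a verify-and-retry wrapper: generate, check the output against a validator, and retry or fail loudly rather than passing unverified output to users. This is a sketch under the assumption that `generate` (an LLM call) and `is_valid` (any programmatic check, such as schema validation or a unit test) are supplied by the caller.

```python
def generate_with_guardrail(generate, is_valid, max_retries=3):
    """Retry generation until the output passes validation.

    `generate` takes the attempt number (useful for adjusting the
    prompt on retries) and stands in for an LLM call; `is_valid`
    is any check you can run on its output.
    """
    last_output = None
    for attempt in range(max_retries):
        output = generate(attempt)
        if is_valid(output):
            return output
        last_output = output
    raise ValueError(f"no valid output after {max_retries} attempts: {last_output!r}")
```

Failing loudly matters: a raised error can be routed to a human, while silently shipping an invalid output is exactly the failure mode that sank several of the products discussed above.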
These frameworks aren’t just theoretical—they’re battle-tested approaches from builders who have experienced the challenges of deploying AI systems at scale. By following these patterns, developers can avoid common pitfalls and build more reliable agents.
The Future of Human-AI Collaboration
The most successful AI applications aren’t those that attempt to replace humans entirely but those that create effective human-AI collaborations. Karpathy advocates for an “Iron Man suit” approach—augmentation over full automation.
Like Tony Stark’s suit, which enhances human capabilities while allowing autonomous action, LLM apps should empower users with fast, auditable workflows. Karpathy warns against overzealous agents producing unwieldy outputs, like 10,000-line code diffs, which overwhelm human verifiers. Instead, developers should focus on small, incremental changes, keeping the AI “on a leash” to ensure reliability.
This vision of collaboration shapes how the most effective AI systems are designed. Rather than pursuing full autonomy, they follow the “partial autonomy” pattern described earlier, integrating LLMs to augment human work while keeping humans in the loop.
Perplexity, for example, applies these principles to search and research, orchestrating multiple LLM calls, presenting sources for human auditing, and offering varying levels of autonomy (quick search vs. deep research). Effective LLM apps share key traits: they manage context, orchestrate multiple models, provide graphical interfaces for easy auditing, and include autonomy sliders.
This collaborative approach acknowledges the complementary strengths of humans and AI. While LLMs excel at generating, summarizing, and processing vast information, humans remain superior at judgment, creativity, and critical thinking. The most powerful systems leverage both.
The future Barry envisions isn’t one where agents replace humans but where they become partners, handling routine tasks while elevating human capabilities. This vision aligns with Karpathy’s prediction that the “2020s will be the decade of agents”—a gradual evolution rather than an overnight revolution.
Building for Agents: Redesigning Digital Infrastructure
As AI agents become more sophisticated, our digital infrastructure needs to evolve to accommodate them. Karpathy envisions a future where digital systems are designed for AI agents, a new class of “human-like” consumers and manipulators of information.
Unlike humans using graphical interfaces or computers using APIs, agents need interfaces tailored to their capabilities. Karpathy proposes simple solutions, like an “llm.txt” file on websites, written in markdown to guide LLMs on a domain’s purpose, avoiding error-prone HTML parsing.
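An `llm.txt` file of the kind Karpathy describes might look like the following. The format is a proposal rather than a standard, and the contents here are entirely illustrative:

```markdown
# Example Store

Example Store sells refurbished laptops. This file summarizes the site
for LLM agents so they do not need to parse our HTML.

## Key pages
- /catalog — full product list, one product per line
- /api/stock — JSON stock lookup for a single product

## Notes for agents
- Prices are in USD and include tax.
- To check availability, prefer the /api/stock endpoint over scraping.
```

The appeal is economic as much as technical: a few hundred tokens of curated markdown is cheaper and less error-prone for an agent to consume than a full rendered page.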
Documentation must also evolve. Companies like Vercel and Stripe are already offering LLM-friendly markdown docs, replacing human-centric instructions like “click this” with machine-executable commands like curl requests.
These innovations bridge the gap between human-designed systems and agent-friendly interfaces. While LLMs may soon navigate GUIs autonomously, Karpathy argues for meeting them halfway to reduce errors and costs.
This shift has profound implications for developers and businesses. The most forward-thinking companies are already adapting their digital infrastructure to be “agent-ready,” anticipating a future where AI agents become primary consumers of their services.
For example, tools like Gitingest concatenate a repository’s files into a single LLM-readable text, while DeepWiki generates repository-specific documentation for LLMs. These tools represent early examples of how digital infrastructure is being redesigned for the age of agents.
Open Questions and Challenges in the Agent Landscape
Despite rapid progress, significant challenges remain in building effective AI agents. Barry highlighted three open questions facing the AI community:
- How do we reduce the cost and latency of agent “thinking” time? The long pauses while an agent works create a disjointed experience that users find burdensome and frustrating.
- How can we improve the mental model for developers? Today’s debugging tools are primitive, making it difficult to understand why agents make certain decisions.
- Can agents represent complex systems clearly enough for humans to audit? As systems grow more complex, ensuring transparency becomes crucial for trust and reliability.
These questions reflect the inherent tensions in agent design. As systems become more autonomous, they also become more opaque. Finding ways to maintain human oversight while increasing agent capabilities remains a central challenge.
Security concerns also loom large. In my conversations with AI engineers, prompt injection attacks—where malicious inputs manipulate agent behavior—emerged as a significant worry. As agents gain more capabilities and access, securing them against exploitation becomes critical.
The economic impact of agent automation raises additional questions. While agents promise increased productivity, they also threaten to displace certain jobs. Finding ways to distribute the benefits of this technology equitably will be crucial for its sustainable adoption.
These challenges demand ongoing research and collaboration across the AI community. The companies and developers who succeed will be those who address these open questions while delivering practical solutions to real-world problems.
Getting Started: Practical Steps for Aspiring Agent Builders
For those looking to enter the world of AI agent development, several practical paths have emerged:
- Start with a real problem: Y Combinator’s advice to “build something people want” applies doubly to AI agents. Focus on solving genuine pain points rather than chasing technological novelty.
- Master the core components: Understand the fundamental building blocks of agents—system prompts, tools, and execution loops—before diving into more complex architectures.
- Experiment with no-code tools: Platforms like n8n democratize access to agent building, allowing those without extensive coding experience to create functional systems.
- Leverage open-source frameworks: Tools like LangChain provide infrastructure for agent development, with over 70 million monthly downloads validating their utility.
- Focus on vertical applications: Target specific industries and use cases where AI agents can deliver immediate value, rather than attempting to build general-purpose assistants.
These paths aren’t mutually exclusive, and many successful agent builders combine approaches based on their skills and goals. The key is to start building and iterating, learning from both successes and failures.
Resources like HubSpot’s free AI Agents Playbook offer structured guidance for beginners, while communities around tools like LangChain provide forums for sharing best practices and troubleshooting common issues.
For those with coding experience, frameworks like OpenAI’s Agents SDK offer powerful tools for building sophisticated systems. These frameworks handle many of the technical complexities, allowing developers to focus on solving domain-specific problems.
The Road Ahead: Navigating the AI Agent Revolution
As we stand at the dawn of the agent era, the path forward combines both exciting possibilities and sobering challenges. The builders who succeed will be those who understand the fundamental principles of agent design while remaining grounded in real-world applications.
Karpathy’s vision of Software 3.0 offers a compelling framework for understanding this shift. We’re witnessing a transformation where programming is democratized through natural language, enabling anyone with clear ideas to create software. This “vibe coding” approach collapses much of the traditional 5-10 year learning curve for software development, opening the field to billions of new creators.
At the same time, the challenges highlighted by Apple and Amazon’s struggles remind us that building reliable AI systems requires more than just technological enthusiasm. It demands rigorous testing, thoughtful design, and a clear understanding of both AI capabilities and limitations.
The vertical AI agent revolution promises to create hundreds of billion-dollar companies targeting specific industries. From healthcare to finance, education to manufacturing, every sector stands to be transformed by specialized agents that automate complex processes while delivering measurable value.
For developers, the message is clear: master the skills of agent engineering, understand the psychology of your agents, and focus on solving real problems. For businesses, the imperative is to identify areas where agents can deliver value while maintaining necessary human oversight.
The companies that thrive will be those that embrace what Barry and Karpathy advocate—starting simple, iterating rapidly, and building systems that augment human capabilities rather than attempting to replace them entirely. The “Iron Man suit” approach, enhancing human potential while maintaining human judgment, offers the most promising path forward.
As the AI landscape continues to evolve, one certainty remains: we are only at the beginning of the agent revolution. The systems being built today represent the 1960s of this new computing paradigm—powerful but primitive compared to what will come. For those willing to master the fundamentals and solve genuine problems, the opportunities ahead are limitless.
Barry’s closing words at the AI Engineer Summit capture this moment perfectly: “I can’t wait to build it with all of you.” That collaborative spirit—humans and AI working together to create systems that enhance human capability—will define the successful applications of the agent era.
This post contains affiliate links. If you purchase through these links, I may earn a commission at no extra cost to you.