Sayash Kapoor critically examines the current state of AI agents, their shortcomings, and best practices for robust evaluation.
Overview
Sayash Kapoor, co-author of AI Snake Oil, delivers a critical analysis of the current landscape of AI agents, questioning whether 2025 truly marks their breakthrough. Covering domains from software engineering to web automation, the talk dissects why existing agents like the Rabbit R1 and the Humane AI Pin fall short of their promised performance. It highlights common pitfalls and offers insights into improving agent evaluation, guiding AI engineers and researchers on how to build AI agents that deliver real-world utility. Recorded live at the Agent Engineering Session Day of the AI Engineer Summit 2025.