Build a multimodal AI agent in Python with MongoDB and Google Gemini to process mixed-media content.
Overview
Dive into building multimodal AI agents from scratch in this hands-on workshop. Apoorva Joshi from MongoDB guides you through creating agents that can process mixed-media content, from analyzing charts and diagrams to extracting insights from documents with embedded visuals. You will learn to use MongoDB as both a vector database and a memory store, combined with Google's Gemini for multimodal reasoning. The session gives you practical experience with multimodal data processing pipelines and agent orchestration patterns, all implemented directly in Python. It is aimed at developers and AI engineers who want to integrate multimodal AI capabilities into their applications.