Paying Tribute to Anthropic, Gemini: iMean AI Ushers in the Post-training Era for WebAgents
In the current wave of large language models (LLMs) and AI agents, every technological advance is another step toward AGI (Artificial General Intelligence). One month ago, Anthropic and Google launched their pre-trained foundation models for GUI agents, and OpenAI is expected to release a similar pre-trained model capable of controlling computers in January 2025. Take Google as an example: during the recently concluded NeurIPS conference, Google released its next-generation foundation model, Gemini 2, and used it to introduce a WebAgent called Project Mariner. Unlike LLMs that are limited to language-based conversation, Gemini 2 can recognize visual content and code within the browser, plan tasks, and interact with the browser as a human would, which is one of its major updates. This makes it a typical WebAgent foundation model.