AgentCPM-GUI
AgentCPM-GUI is an open-source, on-device LLM agent model developed by THUNLP and ModelBest. Built on MiniCPM-V and containing 8 billion parameters, it takes smartphone screenshots as input and autonomously executes user-specified tasks in Android apps, supporting both Chinese- and English-language interfaces.
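The screenshot-to-action cycle described above can be pictured as a simple observe-act loop. The sketch below is an illustration under stated assumptions, not AgentCPM-GUI's actual inference code. `Device`, `run_task`, `FakeDevice`, and the `policy` callable are all hypothetical stand-ins (in practice the policy would wrap a call to the model). Treating any action that contains a `STATUS` key as terminal is also an assumed convention.

```python
from typing import Callable, Protocol


# Hypothetical device interface; not part of AgentCPM-GUI's released code.
class Device(Protocol):
    def screenshot(self) -> bytes: ...
    def execute(self, action: dict) -> None: ...


def run_task(
    instruction: str,
    device: Device,
    policy: Callable[[str, bytes], dict],  # stand-in for a call to the model
    max_steps: int = 20,
) -> list:
    """Observe-act loop: one screenshot in, one action out, until a terminal action."""
    history = []
    for _ in range(max_steps):
        image = device.screenshot()
        action = policy(instruction, image)
        history.append(action)
        if "STATUS" in action:  # assumed terminal-action key: stop without executing
            break
        device.execute(action)
    return history


# Usage with a fake device and a scripted policy in place of the real model.
class FakeDevice:
    def __init__(self) -> None:
        self.executed = []

    def screenshot(self) -> bytes:
        return b""  # placeholder; a real device would return image bytes

    def execute(self, action: dict) -> None:
        self.executed.append(action)


script = iter([{"POINT": [500, 300]}, {"TYPE": "hello"}, {"STATUS": "finish"}])
device = FakeDevice()
steps = run_task("Search for hello", device, lambda instr, img: next(script))
print(len(steps), device.executed)
```

Capping the loop with `max_steps` guards against a policy that never emits a terminal action, which matters when every step costs an on-device inference.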
Key Features
- High-Quality GUI Grounding: Pre-trained on a large-scale bilingual Android dataset to precisely localize and understand GUI elements such as buttons and icons.
- Chinese-App Operation: Fine-tuned to operate more than 30 popular Chinese apps, such as Amap and bilibili, making it the first open-source GUI agent fine-tuned specifically for Chinese apps.
- Enhanced Planning & Reasoning: Uses Reinforcement Fine-Tuning (RFT) so the model "thinks" (reasons explicitly) before acting, which significantly improves success rates on complex tasks.
- Compact Action-Space Design: Uses an optimized action space expressed in a concise JSON format, reducing the average action length to 9.7 tokens for efficient on-device inference.
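To make the compact JSON action space concrete, here is a minimal Python sketch that turns one action string into a device command. The key names (`POINT`, `to`, `TYPE`, `PRESS`, `STATUS`) and the assumption that coordinates are normalized to a 0 to 1000 grid are illustrative assumptions, not a documented schema, so check the model's own documentation before relying on them. `parse_action` and `to_pixels` are hypothetical helpers.

```python
import json

# Assumption: the model emits coordinates on a 0-1000 grid, independent of
# the physical screen resolution. Key names below are illustrative only.
NORM = 1000


def to_pixels(point: list, width: int, height: int) -> tuple:
    """Scale a normalized [x, y] pair to pixel coordinates on a width x height screen."""
    x, y = point
    return round(x / NORM * width), round(y / NORM * height)


def parse_action(raw: str, width: int, height: int) -> tuple:
    """Decode one compact JSON action into a (kind, ...) tuple a device driver could run."""
    action = json.loads(raw)
    if "POINT" in action:
        target = to_pixels(action["POINT"], width, height)
        if "to" in action:  # a point plus a direction is read as a swipe
            return ("swipe", target, action["to"])
        return ("tap", target)
    if "TYPE" in action:  # enter text into the focused field
        return ("type", action["TYPE"])
    if "PRESS" in action:  # system key, e.g. HOME or BACK
        return ("press", action["PRESS"])
    if "STATUS" in action:  # task finished or abandoned
        return ("status", action["STATUS"])
    raise ValueError(f"unrecognized action: {raw}")


print(parse_action('{"POINT":[500,250]}', 1080, 2400))
print(parse_action('{"POINT":[500,500],"to":"up"}', 1080, 2400))
```

Normalized coordinates keep each action short and resolution-independent: on a 1080 x 2400 screen, `{"POINT":[500,250]}` maps to a tap at pixel (540, 600).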
Use Cases
- Task Automation: Automates interactions with Android apps, ideal for users seeking to streamline repetitive tasks.
- Accessibility: Assists users with disabilities by navigating and operating app interfaces autonomously.
- Developer Testing: Supports developers in testing app usability and GUI interactions through automated actions.
AgentCPM-GUI targets developers, tech enthusiasts, and businesses that want to integrate AI-driven automation into Android environments, combining bilingual (Chinese and English) support with the ability to reason before acting.