JavaScript in-page GUI agent. Control web interfaces with natural language.
Page Agent is a TypeScript library from Alibaba that embeds a GUI automation agent directly inside a webpage using plain JavaScript, requiring no browser extension, headless browser, or Python runtime. It works by parsing the DOM as text rather than taking screenshots, which means it can operate with standard text-based LLMs rather than multimodal models. Developers integrate it via a single script tag or the `page-agent` npm package, then call `agent.execute()` with natural language instructions like "Click the login button." It supports bring-your-own-LLM configuration, including models such as Qwen via Alibaba's DashScope API. An optional Chrome extension extends its reach across multiple browser tabs for multi-page workflows, and a beta MCP server allows external agent clients, including Claude Desktop, to drive the browser over MCP. Primary beneficiaries include SaaS developers building in-product AI copilots, teams automating complex form workflows in ERP or CRM systems, and accessibility tooling authors who want natural language control over any web interface.
- ✓ Open-source license (MIT)
- ✓ Actively maintained (<30d)
- ✓ Healthy fork ratio
- ✓ Clear description
- ✓ Topics declared
- ✓ Documented (README)
```bash
git clone https://github.com/alibaba/page-agent
```

```json
{
  "mcpServers": {
    "page-agent": {
      "command": "node",
      "args": ["/path/to/page-agent/dist/index.js"]
    }
  }
}
```
# Page Agent
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://img.alicdn.com/imgextra/i4/O1CN01qKig1P1FnhpFKNdi6_!!6000000000532-2-tps-1280-256.png">
<img alt="Page Agent Banner" src="https://img.alicdn.com/imgextra/i1/O1CN01NCMKXj1Gn4tkFTsxf_!!6000000000666-2-tps-1280-256.png">
</picture>
[MIT License](https://opensource.org/licenses/MIT) [TypeScript](http://www.typescriptlang.org/) [Bundle size](https://bundlephobia.com/package/page-agent) [npm](https://www.npmjs.com/package/page-agent) [GitHub](https://github.com/alibaba/page-agent)
The GUI Agent Living in Your Webpage. Control web interfaces with natural language.
🌐 **English** | [中文](./docs/README-zh.md)
<a href="https://alibaba.github.io/page-agent/" target="_blank"><b>🚀 Demo</b></a> | <a href="https://alibaba.github.io/page-agent/docs/introduction/overview" target="_blank"><b>📖 Docs</b></a> | <a href="https://news.ycombinator.com/item?id=47264138" target="_blank"><b>📢 HN Discussion</b></a> | <a href="https://x.com/simonluvramen" target="_blank"><b>𝕏 Follow on X</b></a>
<!-- demo video -->
https://github.com/user-attachments/assets/a1f2eae2-13fb-4aae-98cf-a3fc1620a6c2
---
## ✨ Features
- **🎯 Easy integration**
    - No need for `browser extension` / `python` / `headless browser`.
    - Just in-page JavaScript. Everything happens in your web page.
- **📖 Text-based DOM manipulation**
    - No screenshots. No multi-modal LLMs or special permissions needed.
- **🧠 Bring your own LLMs**
- **🐙 Optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension) for multi-page tasks.**
    - And an [MCP Server (Beta)](https://alibaba.github.io/page-agent/docs/features/mcp-server) to control it from outside.
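
To make the text-based approach concrete, here is a minimal sketch of how a page could be flattened into an indexed text list that a text-only LLM can read. It is an illustration only and not PageAgent's actual DOM processing: `describeInteractiveElements` and `INTERACTIVE_TAGS` are hypothetical names, and the function assumes only that nodes expose `tagName`, `children`, `textContent`, and `getAttribute`, as DOM elements do.

```javascript
// Hypothetical illustration; not PageAgent's real serializer.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

// Walk an element tree depth-first and emit one text line per interactive
// element, e.g. `[0] <button> Log in`. The index lets a model refer to an
// element by number instead of by pixel position.
function describeInteractiveElements(root) {
  const lines = []
  const visit = (node) => {
    const tag = node.tagName.toLowerCase()
    if (INTERACTIVE_TAGS.has(tag)) {
      // Prefer visible text, then fall back to common labelling attributes.
      const label =
        (node.textContent || '').trim() ||
        node.getAttribute('aria-label') ||
        node.getAttribute('placeholder') ||
        ''
      lines.push(`[${lines.length}] <${tag}> ${label}`.trim())
    }
    for (const child of Array.from(node.children || [])) visit(child)
  }
  visit(root)
  return lines.join('\n')
}
```

In a browser, `describeInteractiveElements(document.body)` would return one line per link, button, and form control on the page.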
## 💡 Use Cases
- **SaaS AI Copilot**: Ship an AI copilot in your product in a few lines of code. No backend rewrite.
- **Smart Form Filling**: Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems.
- **Accessibility**: Make any web app accessible through natural language. Voice commands, screen readers, zero barrier.
- **Multi-page Agent**: Extend your own web agent's reach across browser tabs with the optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension).
- **MCP**: Allow your agent clients to control your browser.
## 🚀 Quick Start
### One-line integration
The fastest way to try PageAgent is with our free demo LLM:
```html
<script src="{URL}" crossorigin="anonymous"></script>
```
> **⚠️ For technical evaluation only.** This demo CDN uses our free [testing LLM API](https://alibaba.github.io/page-agent/docs/features/models#free-testing-api). By using it, you agree to its [terms](https://github.com/alibaba/page-agent/blob/main/docs/terms-and-privacy.md).
| Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.9.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.9.0/files/dist/iife/page-agent.demo.js |
Add `?autoInit=false` to load the script without creating the demo agent automatically. You can then instantiate it with `new window.PageAgent(...)`.
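
The manual-initialization path can be sketched as follows. This is a rough example under stated assumptions: `buildDemoScriptUrl` and `loadPageAgent` are hypothetical helpers (not part of the library), the query parameter is assumed to be appended to the script URL, and the constructor is assumed to accept the same configuration fields as the NPM Installation example.

```javascript
// Mirror URLs copied from the table above (version 1.9.0).
const DEMO_SCRIPT_URLS = {
  global: 'https://cdn.jsdelivr.net/npm/page-agent@1.9.0/dist/iife/page-agent.demo.js',
  china: 'https://registry.npmmirror.com/page-agent/1.9.0/files/dist/iife/page-agent.demo.js',
}

// Build the script URL, appending ?autoInit=false when auto-init should be skipped.
function buildDemoScriptUrl(mirror = 'global', autoInit = true) {
  const base = DEMO_SCRIPT_URLS[mirror]
  if (!base) throw new Error(`Unknown mirror: ${mirror}`)
  return autoInit ? base : `${base}?autoInit=false`
}

// Inject the script without auto-init, then create the agent manually.
// `doc` and `win` default to the browser globals; `config` is assumed to take
// the fields shown in the NPM example (model, baseURL, apiKey, language).
function loadPageAgent(config, { mirror = 'global', doc = document, win = window } = {}) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement('script')
    script.src = buildDemoScriptUrl(mirror, false)
    script.onload = () => resolve(new win.PageAgent(config))
    script.onerror = () => reject(new Error(`Failed to load ${script.src}`))
    doc.head.appendChild(script)
  })
}
```

Passing `doc` and `win` explicitly keeps the helper usable in tests or iframes; in a normal page, `loadPageAgent(config)` is enough.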
### NPM Installation
```bash
npm install page-agent
```
```javascript
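import { PageAgent } from 'page-agent'

// Bring-your-own-LLM configuration; this example targets Qwen through DashScope.
const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY', // replace with your own key
    language: 'en-US',
})

// Describe the task in natural language; the agent acts on the current page.
await agent.execute('Click the login button')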
```
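
For multi-step workflows, one possible pattern is to run instructions in sequence and stop at the first failure. The sketch below is an assumption-laden example: `runSteps` is a hypothetical helper, it assumes `execute()` returns a promise that rejects when a step fails, and it makes no claim about the shape of the resolved value.

```javascript
// Run instructions one at a time, stopping at the first failure.
// `agent` is any object with an async execute(instruction) method,
// such as a PageAgent instance.
async function runSteps(agent, instructions) {
  const results = []
  for (const instruction of instructions) {
    try {
      // The resolved value's shape is not specified here, so it is stored as-is.
      const output = await agent.execute(instruction)
      results.push({ instruction, ok: true, output })
    } catch (error) {
      results.push({ instruction, ok: false, error })
      break // later steps usually depend on earlier ones
    }
  }
  return results
}
```

A call might look like `await runSteps(agent, ['Open the settings page', 'Enable dark mode'])`, where both instructions are made-up examples.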
For more programmatic usage, see the [📖 Documentation](https://alibaba.github.io/page-agent/docs/introduction/overview).
## 🌟 Awesome Page Agent
Built something cool with PageAgent? Add it here! Open a PR to share your project.
> These are community projects — not maintained or endorsed by us. Use at your own discretion.
| Project | Description |
| -------- | ----------------------------------------------------------- |
| _Yours?_ | [Open a PR](https://github.com/alibaba/page-agent/pulls) 🙌 |
## 🤝 Contributing
We welcome contributions from the community! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines and [docs/developer-guide.md](docs/developer-guide.md) for local development workflows.
Please read the [maintainer's note](https://github.com/alibaba/page-agent/issues/349) on principles and current state.
Contributions generated entirely by **bots or AI** without substantial human involvement will **not be accepted**.
## ⚖️ License
[MIT License](LICENSE)
## 👏 Acknowledgments
This project builds upon the excellent work of **[`browser-use`](https://github.com/browser-use/browser-use)**.
`PageAgent` is designed for **client-side web enhancement**, not server-side automation.
```
DOM processing components and prompt are derived from browser-use:
Browser Use <https://github.com/browser-use/browser-use>
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License
We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.
```
---
**⭐ Star this repo if you find PageAgent helpful!**
## What people ask about page-agent

**What is alibaba/page-agent?**

alibaba/page-agent is an MCP server for the Claude AI ecosystem. JavaScript in-page GUI agent. Control web interfaces with natural language. It has 18.5k stars on GitHub and was last updated today.

**How do I install page-agent?**

You can install page-agent by cloning the repository (https://github.com/alibaba/page-agent) or by following the instructions in the README on GitHub. ClaudeWave also offers quick-install blocks on this same page.

**Is alibaba/page-agent safe to use?**

Our security agent has analyzed alibaba/page-agent and assigned it a Trust Score of 100/100 (tier: Verified). Review the full breakdown of passed checks and flags on this page.

**Who maintains alibaba/page-agent?**

alibaba/page-agent is maintained by alibaba. The latest activity recorded on GitHub is from today, with 41 open issues.

**Are there alternatives to page-agent?**

Yes. On ClaudeWave you can explore similar MCP servers in /categories/mcp, sorted by popularity or recent activity.