chrome-devtools
The chrome-devtools skill enables browser automation through executable Puppeteer scripts that perform actions like navigating URLs, capturing screenshots, clicking elements, filling forms, and executing JavaScript, outputting results as JSON. Use this skill to automate web interactions, analyze performance metrics including Core Web Vitals, monitor network traffic and console errors, scrape web content, and debug JavaScript execution.
git clone --depth 1 https://github.com/mrgoonie/claudekit-skills /tmp/chrome-devtools && cp -r /tmp/chrome-devtools/.claude/skills/chrome-devtools ~/.claude/skills/chrome-devtoolsSKILL.md
# Chrome DevTools Agent Skill
Browser automation via executable Puppeteer scripts. All scripts output JSON for easy parsing.
## Quick Start
**CRITICAL**: Always check `pwd` before running scripts.
### Installation
#### Step 1: Install System Dependencies (Linux/WSL only)
On Linux/WSL, Chrome requires system libraries. Install them first:
```bash
pwd # Should show current working directory
cd .claude/skills/chrome-devtools/scripts
./install-deps.sh # Auto-detects OS and installs required libs
```
Supports: Ubuntu, Debian, Fedora, RHEL, CentOS, Arch, Manjaro
**macOS/Windows**: Skip this step (dependencies bundled with Chrome)
#### Step 2: Install Node Dependencies
```bash
npm install # Installs puppeteer, debug, yargs
```
#### Step 3: Install ImageMagick (Optional, Recommended)
ImageMagick enables automatic screenshot compression to keep files under 5MB:
**macOS:**
```bash
brew install imagemagick
```
**Ubuntu/Debian/WSL:**
```bash
sudo apt-get install imagemagick
```
**Verify:**
```bash
magick -version # or: convert -version
```
Without ImageMagick, screenshots >5MB will not be compressed (may fail to load in Gemini/Claude).
### Test
```bash
node navigate.js --url https://example.com
# Output: {"success": true, "url": "https://example.com", "title": "Example Domain"}
```
## Available Scripts
All scripts are in `.claude/skills/chrome-devtools/scripts/`
**CRITICAL**: Always check `pwd` before running scripts.
### Script Usage
- `./scripts/README.md`
### Core Automation
- `navigate.js` - Navigate to URLs
- `screenshot.js` - Capture screenshots (full page or element)
- `click.js` - Click elements
- `fill.js` - Fill form fields
- `evaluate.js` - Execute JavaScript in page context
### Analysis & Monitoring
- `snapshot.js` - Extract interactive elements with metadata
- `console.js` - Monitor console messages/errors
- `network.js` - Track HTTP requests/responses
- `performance.js` - Measure Core Web Vitals + record traces
## Usage Patterns
### Single Command
```bash
pwd # Should show current working directory
cd .claude/skills/chrome-devtools/scripts
node screenshot.js --url https://example.com --output ./docs/screenshots/page.png
```
**Important**: Always save screenshots to `./docs/screenshots` directory.
### Automatic Image Compression
Screenshots are **automatically compressed** if they exceed 5MB to ensure compatibility with Gemini API and Claude Code (which have 5MB limits). This uses ImageMagick internally:
```bash
# Default: auto-compress if >5MB
node screenshot.js --url https://example.com --output page.png
# Custom size threshold (e.g., 3MB)
node screenshot.js --url https://example.com --output page.png --max-size 3
# Disable compression
node screenshot.js --url https://example.com --output page.png --no-compress
```
**Compression behavior:**
- PNG: Resizes to 90% + quality 85 (or 75% + quality 70 if still too large)
- JPEG: Quality 80 + progressive encoding (or quality 60 if still too large)
- Other formats: Converted to JPEG with compression
- Requires ImageMagick installed (see imagemagick skill)
**Output includes compression info:**
```json
{
"success": true,
"output": "/path/to/page.png",
"compressed": true,
"originalSize": 8388608,
"size": 3145728,
"compressionRatio": "62.50%",
"url": "https://example.com"
}
```
### Chain Commands (reuse browser)
```bash
# Keep browser open with --close false
node navigate.js --url https://example.com/login --close false
node fill.js --selector "#email" --value "user@example.com" --close false
node fill.js --selector "#password" --value "secret" --close false
node click.js --selector "button[type=submit]"
```
### Parse JSON Output
```bash
# Extract specific fields with jq
node performance.js --url https://example.com | jq '.vitals.LCP'
# Save to file
node network.js --url https://example.com --output /tmp/requests.json
```
## Execution Protocol
### Working Directory Verification
BEFORE executing any script:
1. Check current working directory with `pwd`
2. Verify in `.claude/skills/chrome-devtools/scripts/` directory
3. If wrong directory, `cd` to correct location
4. Use absolute paths for all output files
Example:
```bash
pwd # Should show: .../chrome-devtools/scripts
# If wrong:
cd .claude/skills/chrome-devtools/scripts
```
### Output Validation
AFTER screenshot/capture operations:
1. Verify file created with `ls -lh <output-path>`
2. Read screenshot using Read tool to confirm content
3. Check JSON output for success:true
4. Report file size and compression status
Example:
```bash
node screenshot.js --url https://example.com --output ./docs/screenshots/page.png
ls -lh ./docs/screenshots/page.png # Verify file exists
# Then use Read tool to visually inspect
```
5. Restart working directory to the project root.
### Error Recovery
If script fails:
1. Check error message for selector issues
2. Use snapshot.js to discover correct selectors
3. Try XPath selector if CSS selector fails
4. Verify element is visible and interactive
Example:
```bash
# CSS selector fails
node click.js --url https://example.com --selector ".btn-submit"
# Error: waiting for selector ".btn-submit" failed
# Discover correct selector
node snapshot.js --url https://example.com | jq '.elements[] | select(.tagName=="BUTTON")'
# Try XPath
node click.js --url https://example.com --selector "//button[contains(text(),'Submit')]"
```
### Common Mistakes
❌ Wrong working directory → output files go to wrong location
❌ Skipping output validation → silent failures
❌ Using complex CSS selectors without testing → selector errors
❌ Not checking element visibility → timeout errors
✅ Always verify `pwd` before running scripts
✅ Always validate output after screenshots
✅ Use snapshot.js to discover selectors
✅ Test selectors with simple commands first
## Common Workflows
### Web Scraping
```bash
node evaluate.js --url https://example.com --script "
Array.from(document.querySelectorAll('.item')).map(el => ({
title: el.Manage MCP (Model Context Protocol) server integrations - discover tools/prompts/resources, analyze relevance for tasks, and execute MCP capabilities. Use when need to work with MCP servers, discover available MCP tools, filter MCP capabilities for specific tasks, execute MCP tools programmatically, or implement MCP client functionality. Keeps main context clean by handling MCP discovery in subagent context.
Stage all files and create a commit.
Stage, commit and push all code in the current branch
Create a pull request
Create a new agent skill
Utilize tools of Model Context Protocol (MCP) servers
Create aesthetically beautiful interfaces following proven design principles. Use when building UI/UX, analyzing designs from inspiration sites, generating design images with ai-multimodal, implementing visual hierarchy and color theory, adding micro-interactions, or creating design documentation. Includes workflows for capturing and analyzing inspiration screenshots with chrome-devtools and ai-multimodal, iterative design image generation until aesthetic standards are met, and comprehensive design system guidance covering BEAUTIFUL (aesthetic principles), RIGHT (functionality/accessibility), SATISFYING (micro-interactions), and PEAK (storytelling) stages. Integrates with chrome-devtools, ai-multimodal, media-processing, ui-styling, and web-frameworks skills.
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.