Skill · 36.7k repo stars · updated today

brightdata-web-mcp

Bright Data Web MCP provides web search, scraping, and data extraction capabilities for Claude agents through Bright Data's infrastructure. Use this skill to fetch live web content, bypass anti-bot measures and CAPTCHAs, extract structured data from URLs, or automate browser interactions when standard requests fail or content is dynamically generated.

Repository: ai-engineering-hub

Install in Claude Code


git clone --depth 1 https://github.com/patchy631/ai-engineering-hub /tmp/brightdata-web-mcp && cp -r /tmp/brightdata-web-mcp/hugging-face-skills/skills/brightdata-web-mcp ~/.claude/skills/brightdata-web-mcp

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Bright Data Web MCP

Use this skill for **reliable web access** in MCP-compatible agents. Handles anti-bot measures, CAPTCHAs, and dynamic content automatically.

## Quick Start

### Search the web

```
Tool: search_engine
Input: { "query": "latest AI news", "engine": "google" }
```

Returns JSON for Google and Markdown for Bing/Yandex. Use the `cursor` parameter for pagination.
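
The response format depends on the engine, so a caller has to branch on it. The sketch below builds the `search_engine` arguments and decodes the result accordingly. The helper names are hypothetical, and the `cursor` value is treated as opaque because this document does not specify its format.

```python
import json
from typing import Any, Optional

ENGINES = {"google", "bing", "yandex"}


def search_input(query: str, engine: str = "google",
                 cursor: Optional[str] = None) -> dict:
    """Build the arguments object for a search_engine call (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    args = {"query": query, "engine": engine}
    if cursor is not None:
        # Assumption: the cursor is passed through unchanged for pagination.
        args["cursor"] = cursor
    return args


def parse_search_result(engine: str, body: str) -> Any:
    """Decode JSON for Google; return Markdown text as-is for Bing/Yandex."""
    return json.loads(body) if engine == "google" else body
```

Keeping the engine check in one place means a typo in the engine name fails before any request is spent.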

### Scrape a page to Markdown

```
Tool: scrape_as_markdown
Input: { "url": "https://example.com/article" }
```

### Extract structured data (Pro/advanced_scraping)

```
Tool: extract
Input: { 
  "url": "https://example.com/product",
  "prompt": "Extract: name, price, description, availability"
}
```

## When to Use

| Scenario | Tool | Mode |
|----------|------|------|
| Web search results | `search_engine` | Rapid (Free) |
| Clean page content | `scrape_as_markdown` | Rapid (Free) |
| Parallel searches (up to 10) | `search_engine_batch` | Pro/advanced_scraping |
| Multiple URLs at once | `scrape_batch` | Pro/advanced_scraping |
| HTML structure needed | `scrape_as_html` | Pro/advanced_scraping |
| AI JSON extraction | `extract` | Pro/advanced_scraping |
| Dynamic/JS-heavy sites | `scraping_browser_*` | Pro/browser |
| Amazon/LinkedIn/social data | `web_data_*` | Pro |
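
The table above can also be read in reverse: given a tool name, which mode does it need? A minimal sketch of that lookup follows. The `ACCESS` list and `required_mode` function are illustrative names, not part of the server.

```python
from fnmatch import fnmatchcase

# (tool name or wildcard pattern, mode) pairs copied from the table above.
ACCESS = [
    ("search_engine", "Rapid (Free)"),
    ("scrape_as_markdown", "Rapid (Free)"),
    ("search_engine_batch", "Pro/advanced_scraping"),
    ("scrape_batch", "Pro/advanced_scraping"),
    ("scrape_as_html", "Pro/advanced_scraping"),
    ("extract", "Pro/advanced_scraping"),
    ("scraping_browser_*", "Pro/browser"),
    ("web_data_*", "Pro"),
]


def required_mode(tool: str) -> str:
    """Return the mode the table lists for a tool, matching wildcards."""
    for pattern, mode in ACCESS:
        if fnmatchcase(tool, pattern):
            return mode
    raise KeyError(f"tool not listed in the table: {tool}")
```

An agent could call `required_mode` before a request to decide whether the current configuration exposes the tool at all.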

## Setup

**Remote (recommended) - No installation required:**

SSE Endpoint:
```
https://mcp.brightdata.com/sse?token=YOUR_API_TOKEN
```

Streamable HTTP Endpoint:
```
https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN
```

**Local:**
```bash
API_TOKEN=<token> npx @brightdata/mcp
```
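
Many MCP clients register local servers through a JSON file with an `mcpServers` map. Assuming such a client, the sketch below generates an entry equivalent to the command above. The server name `brightdata` and the exact file location are assumptions, so check your client's documentation.

```python
import json


def local_server_config(api_token: str, name: str = "brightdata") -> dict:
    """Build an mcpServers-style entry that runs `npx @brightdata/mcp`."""
    return {
        "mcpServers": {
            name: {  # hypothetical server label chosen for this example
                "command": "npx",
                "args": ["@brightdata/mcp"],
                "env": {"API_TOKEN": api_token},
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the client's config file.
    print(json.dumps(local_server_config("<token>"), indent=2))
```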

## Modes & Configuration

### Rapid Mode (Free - Default)
- **5,000 requests/month free**
- Tools: `search_engine`, `scrape_as_markdown`

### Pro Mode
- All Rapid tools + 60+ advanced tools
- Remote: add `&pro=1` to URL
- Local: set `PRO_MODE=true`

### Tool Groups
Select specific tool bundles instead of all Pro tools:
- Remote: `&groups=ecommerce,social`
- Local: `GROUPS=ecommerce,social`

| Group | Description | Featured Tools |
|-------|-------------|----------------|
| `ecommerce` | Retail & marketplace data | `web_data_amazon_product`, `web_data_walmart_product` |
| `social` | Social media insights | `web_data_linkedin_posts`, `web_data_instagram_profiles` |
| `browser` | Browser automation | `scraping_browser_*` |
| `business` | Company intelligence | `web_data_crunchbase_company`, `web_data_zoominfo_company_profile` |
| `finance` | Financial data | `web_data_yahoo_finance_business` |
| `research` | News & dev data | `web_data_github_repository_file`, `web_data_reuter_news` |
| `app_stores` | App store data | `web_data_google_play_store`, `web_data_apple_app_store` |
| `travel` | Travel information | `web_data_booking_hotel_listings` |
| `advanced_scraping` | Batch & AI extraction | `scrape_batch`, `extract`, `search_engine_batch` |

### Custom Tools
Cherry-pick individual tools:
- Remote: `&tools=scrape_as_markdown,web_data_linkedin_person_profile`
- Local: `TOOLS=scrape_as_markdown,web_data_linkedin_person_profile`

> Note: `GROUPS` or `TOOLS` override `PRO_MODE` when specified.
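
The remote options above are plain query parameters, so the endpoint URL can be assembled programmatically. The sketch below is one way to do that. The `remote_url` helper is hypothetical. Dropping `pro=1` when groups or tools are given is a design choice based on the note that they override Pro mode, and whether `groups` and `tools` may be combined is an assumption.

```python
from typing import Sequence
from urllib.parse import urlencode

BASE = "https://mcp.brightdata.com"


def remote_url(token: str, transport: str = "mcp", pro: bool = False,
               groups: Sequence[str] = (), tools: Sequence[str] = ()) -> str:
    """Build the remote endpoint URL with optional mode selectors."""
    if transport not in {"mcp", "sse"}:
        raise ValueError("transport must be 'mcp' or 'sse'")
    params = {"token": token}
    if groups:
        params["groups"] = ",".join(groups)
    if tools:
        params["tools"] = ",".join(tools)
    if pro and not (groups or tools):
        # GROUPS/TOOLS override Pro mode, so pro=1 would be redundant with them.
        params["pro"] = "1"
    # safe="," keeps the comma-separated lists readable instead of %2C.
    return f"{BASE}/{transport}?{urlencode(params, safe=',')}"
```

For example, `remote_url("YOUR_API_TOKEN", groups=["ecommerce", "social"])` yields the Streamable HTTP endpoint with `&groups=ecommerce,social` appended.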

## Core Tools Reference

### Search & Scraping (Rapid Mode)
- `search_engine` - Google/Bing/Yandex SERP results (JSON for Google, Markdown for others)
- `scrape_as_markdown` - Clean Markdown from any URL with anti-bot bypass

### Advanced Scraping (Pro/advanced_scraping)
- `search_engine_batch` - Up to 10 parallel searches
- `scrape_batch` - Up to 10 URLs in one request
- `scrape_as_html` - Full HTML response
- `extract` - AI-powered JSON extraction with custom prompt
- `session_stats` - Monitor tool usage during session
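
Because `search_engine_batch` and `scrape_batch` each accept at most 10 items, longer lists have to be split before calling them. A minimal chunking sketch follows. The `batches` helper is illustrative and only prepares the lists; it does not call any tool.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 10  # per-call limit stated for the batch tools


def batches(items: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if not 1 <= size <= MAX_BATCH:
        raise ValueError(f"size must be between 1 and {MAX_BATCH}")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 23 URLs this produces chunks of 10, 10, and 3, so three batch calls cover the whole list.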

### Browser Automation (Pro/browser)
For JavaScript-rendered content or user interactions:

| Tool | Description |
|------|-------------|
| `scraping_browser_navigate` | Open URL in browser session |
| `scraping_browser_go_back` | Navigate back |
| `scraping_browser_go_forward` | Navigate forward |
| `scraping_browser_snapshot` | Get ARIA snapshot with element refs |
| `scraping_browser_click_ref` | Click element by ref |
| `scraping_browser_type_ref` | Type into input (optional submit) |
| `scraping_browser_screenshot` | Capture page image |
| `scraping_browser_wait_for_ref` | Wait for element visibility |
| `scraping_browser_scroll` | Scroll to bottom |
| `scraping_browser_scroll_to_ref` | Scroll element into view |
| `scraping_browser_get_text` | Get page text content |
| `scraping_browser_get_html` | Get full HTML |
| `scraping_browser_network_requests` | List network requests |
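
These tools are meant to be chained: navigate, take a snapshot to discover element refs, act on a ref, then read the result. The sketch below shows that sequence against a generic `call_tool(name, args)` callable supplied by your MCP client. The argument names (`url`, `ref`, `text`, `submit`) and the `find_ref` callback are assumptions for illustration, so check the tool schemas your server reports.

```python
from typing import Any, Callable

CallTool = Callable[[str, dict], Any]


def search_site(call_tool: CallTool, url: str,
                find_ref: Callable[[Any], str], text: str) -> Any:
    """Open a page, type into an element found in the snapshot, return page text."""
    call_tool("scraping_browser_navigate", {"url": url})
    # Refs are only known after a snapshot, so the caller supplies the lookup.
    snapshot = call_tool("scraping_browser_snapshot", {})
    ref = find_ref(snapshot)
    # Assumed argument names; "submit" mirrors the optional submit in the table.
    call_tool("scraping_browser_type_ref",
              {"ref": ref, "text": text, "submit": True})
    return call_tool("scraping_browser_get_text", {})
```

Passing `call_tool` in as a parameter keeps the sequence testable with a stub and independent of any particular MCP client library.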

### Structured Data (Pro)
Pre-built extractors for popular platforms:

**E-commerce:**
- `web_data_amazon_product`, `web_data_amazon_product_reviews`, `web_data_amazon_product_search`
- `web_data_walmart_product`, `web_data_walmart_seller`
- `web_data_ebay_product`, `web_data_google_shopping`
- `web_data_homedepot_products`, `web_data_bestbuy_products`, `web_data_etsy_products`, `web_data_zara_products`

**Social Media:**
- `web_data_linkedin_person_profile`, `web_data_linkedin_company_profile`, `web_data_linkedin_job_listings`, `web_data_linkedin_posts`, `web_data_linkedin_people_search`
- `web_data_instagram_profiles`, `web_data_instagram_posts`, `web_data_instagram_reels`, `web_data_instagram_comments`
- `web_data_facebook_posts`, `web_data_facebook_marketplace_listings`, `web_data_facebook_company_reviews`, `web_data_facebook_events`
- `web_data_tiktok_profiles`, `web_data_tiktok_posts`, `web_data_tiktok_shop`, `web_data_tiktok_comments`
- `web_data_x_posts`
- `web_data_youtube_videos`, `web_data_youtube_profiles`, `web_data_youtube_comments`
- `web_data_reddit_posts`

**Business & Finance:**
- `web_data_google_maps_reviews`, `web_data_crunchbase_company`, `web_data_zoominfo_company_profile`
- `web_data_zillow_properties_listing`, `web_data_yahoo_finance_business`

**Other:**
- `web_data_github_repository_file`, `web_data_reuter_n

From the same repository

grpo-finetune (Skill)

hugging-face-cli (Skill)

Execute Hugging Face Hub operations using the `hf` CLI. Use when the user needs to download models/datasets/spaces, upload files to Hub repositories, create repos, manage local cache, or run compute jobs on HF infrastructure. Covers authentication, file transfers, repository creation, cache operations, and cloud compute.

hugging-face-datasets (Skill)

Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.

hugging-face-evaluation (Skill)

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.

hugging-face-jobs (Skill)

This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.

hugging-face-model-trainer (Skill)

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

hugging-face-paper-publisher (Skill)

Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.

hugging-face-tool-builder (Skill)

Use this skill when the user wants to build tools or scripts, or to accomplish a task where data from the Hugging Face API would help. It is especially useful when chaining or combining API calls, or when the task will be repeated or automated. The skill creates a reusable script to fetch, enrich, or process data.