ClaudeWave
Skill · 460 repo stars · updated 6d ago

search-tips

This skill documents strategic guidance for conducting web research using Exa search, Firecrawl CLI, and Reddit MCP tools. Use it when planning research workflows that require discovering sources, fetching content from identified URLs, or extracting Reddit discussions, as it provides starting patterns for when to apply each tool and how to configure them for efficiency.

Install in Claude Code
git clone --depth 1 https://github.com/malob/nix-config /tmp/search-tips && cp -r /tmp/search-tips/configs/claude/skills/search-tips ~/.claude/skills/search-tips
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Search Tips

Accumulated guidance for web research using Exa, Firecrawl CLI, and Reddit MCP tools. These
are **starting points, not rigid rules** -- think strategically about each situation and
adapt. If a different approach makes more sense for what you're trying to do, go with it.
Run `npx firecrawl-cli <command> --help` to check available options beyond what's documented
here. Reference files cover tool-specific deep dives -- see bottom of this file.

## Setup

Before starting research, load the required MCP tools using ToolSearch:

1. **Exa tools** -- `web_search_advanced_exa` and `get_code_context_exa`
2. **Reddit tools** -- `get_top_posts`, `get_post_comments`, `get_reddit_post`, `get_subreddit_info`

Firecrawl CLI (`npx firecrawl-cli`) runs via Bash -- no MCP setup needed. Load only what
the task requires.

## The Research Cycle

Prefer Exa and Firecrawl over built-in WebSearch/WebFetch.

Research alternates between **searching** (discovering sources) and **fetching** (extracting
content from them). Find promising leads, read the best ones, refine your understanding,
search again.

### Searching

Finding sources you don't have yet.

- **Exa search** (`web_search_advanced_exa`) -- primary tool for web discovery. Natural
  language queries, add filters as needed (domains, dates, categories).
- **Exa code context** (`get_code_context_exa`) -- programming topics. Worth trying before
  general Exa search for technical/code tasks -- surfaces repos, packages, and docs.
- **Firecrawl CLI search** (`npx firecrawl-cli search`) -- Google-powered keyword search.
  Useful when keyword matching works better than Exa's semantic approach, for site-scoped
  queries (`site:reddit.com {query}`), and for content-type filtering (`--categories research`
  for academic, `--sources news`, `--tbs qdr:w` for time).
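As a rough sketch of how a Firecrawl CLI search invocation could be assembled in Python: the subcommand and the `--tbs qdr:w` flag come from the list above, while passing the query as a single positional argument, the helper name `firecrawl_search_argv`, and the example query are assumptions for illustration. Check `npx firecrawl-cli search --help` for the actual argument layout.

```python
import shlex
from typing import Sequence


def firecrawl_search_argv(query: str, flags: Sequence[str] = ()) -> list[str]:
    """Build the argv for a Firecrawl CLI search (query assumed to be positional)."""
    return ["npx", "firecrawl-cli", "search", query, *flags]


# Site-scoped keyword search with a time filter (qdr:w is Google's "past week" value).
argv = firecrawl_search_argv(
    "site:reddit.com best mechanical keyboard",  # hypothetical query
    ["--tbs", "qdr:w"],
)

# Print a shell-safe version of the command for copy/paste or logging.
print(shlex.join(argv))
```

Building the command as a list rather than a single string keeps the query intact as one argument, which matters for `site:`-scoped queries that contain spaces.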

**Default Exa search pattern:** Set `enableHighlights: true` and `textMaxCharacters: 1`.
This returns quoted passages from actual page text while preventing the MCP server from
flooding context with full text. Use `highlightsPerUrl` and `highlightsNumSentences` to
control volume if needed.
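A minimal sketch of that default, written as a Python helper that assembles the tool arguments. The parameter names `enableHighlights`, `textMaxCharacters`, `highlightsPerUrl`, and `highlightsNumSentences` come from the text above; the `query` key, the helper name `default_exa_args`, and the example values are assumptions, not the MCP server's documented schema.

```python
from typing import Any, Optional


def default_exa_args(
    query: str,
    highlights_per_url: Optional[int] = None,
    highlights_num_sentences: Optional[int] = None,
) -> dict[str, Any]:
    """Build highlights-only arguments for web_search_advanced_exa."""
    args: dict[str, Any] = {
        "query": query,             # assumed key name for the search text
        "enableHighlights": True,   # return quoted passages from the page
        "textMaxCharacters": 1,     # keep full page text out of context
    }
    # Only add the volume controls when the caller asks for them.
    if highlights_per_url is not None:
        args["highlightsPerUrl"] = highlights_per_url
    if highlights_num_sentences is not None:
        args["highlightsNumSentences"] = highlights_num_sentences
    return args


args = default_exa_args(
    "What do professional reviewers say are the most reliable dishwasher brands in 2025?",
    highlights_per_url=2,
)
```

Leaving the volume controls out unless requested keeps the default call small and makes any tuning explicit at the call site.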

### Fetching

Extracting content from a source you've identified.

- **Firecrawl CLI scrape** (`npx firecrawl-cli scrape "<url>" --only-main-content`) -- primary
  tool for reading a known URL. The flag strips nav/sidebars to save tokens.
- **Reddit MCP** (`get_post_comments`, `get_reddit_post`) -- for reading Reddit threads.
  Firecrawl can't scrape reddit.com directly.
- **Firecrawl CLI map** (`npx firecrawl-cli map "<url>" --search "query"`) -- discover URLs
  on a site (useful when you need to find the right page, or when scrape returns empty).

### Adapting the Workflow

The defaults above won't always be right. Some common deviations:

- **Exa full text as a scraping fallback** -- some sites are blocked or inaccessible via
  Firecrawl (LinkedIn, Twitter/X, etc.), but Exa often has the full page text in its index.
  Drop both `enableHighlights` and `textMaxCharacters: 1` to get the complete text. Be aware
  this can produce large responses.
- **`--only-main-content` can strip too much** -- if you get empty or partial results, retry
  without the flag. Known to fail on Future plc sites, Blogspot, and GDPR-heavy sites. See
  Content Extraction below.
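The retry-without-the-flag fallback could be sketched as follows. The `scrape` subcommand and `--only-main-content` flag are from this document; the helper names, the injectable `run` callable, and the `min_chars` threshold used to decide that a result "looks empty" are assumptions chosen for illustration.

```python
import subprocess
from typing import Callable, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, or an empty string on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def scrape(
    url: str,
    run: Callable[[Sequence[str]], str] = run_cli,
    min_chars: int = 200,  # hypothetical cutoff for "empty or partial"
) -> str:
    """Scrape with --only-main-content, retrying without it if too little comes back."""
    base = ["npx", "firecrawl-cli", "scrape", url]
    text = run([*base, "--only-main-content"])
    if len(text.strip()) >= min_chars:
        return text
    # The flag may have stripped too much; retry with the full page.
    return run(base)
```

Passing `run` as a parameter lets the fallback logic be exercised with a stub instead of a live CLI call; in normal use, `scrape("https://example.com/article")` shells out through `run_cli`.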

The reference files cover more edge cases -- scraping issues, category restrictions, and
academic search.

## Search Strategy

### How Exa Works

Exa is a **neural/semantic search engine**. It uses embeddings to understand meaning.

- **Natural questions or statements work best** -- Exa finds pages that answer them
- **Longer, more specific queries work BETTER** -- unlike keyword-based search
- **Keyword lists tend to confuse** the semantic model

Good: "What do professional reviewers say are the most reliable dishwasher brands in 2025?"
Bad: "best dishwasher 2025 reliable"

### Query Reformulation

For broad topics, Exa's `additionalQueries` parameter can automate reformulation -- it bundles
query variations in a single call at no extra cost (see `references/exa-tips.md`). For manual
reformulation, try generating 3-5 query variations:

| Technique             | What It Does                          | Example                                      |
| --------------------- | ------------------------------------- | -------------------------------------------- |
| **Paraphrase**        | Same meaning, different words         | "RAG failures" -> "problems in RAG systems"  |
| **Decompose**         | Break into sub-questions              | "Why fail?" -> "Why return irrelevant docs?" |
| **Scope shift**       | Broader context or narrower specifics | "Challenges in production AI search"         |
| **Perspective shift** | Different viewpoints                  | User vs expert vs critic view                |
| **Temporal framing**  | Target different time periods         | "Recent 2024-2025" vs "foundational"         |
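
One way to feed such variations into a single call is sketched below. It assumes `additionalQueries` accepts a list of strings and that the primary search text lives under a `query` key; both are assumptions, so consult `references/exa-tips.md` for the actual shape. The helper name and the example queries are illustrative.

```python
from typing import Any, Iterable


def with_query_variations(args: dict[str, Any], variations: Iterable[str]) -> dict[str, Any]:
    """Return a copy of args with de-duplicated variations under additionalQueries."""
    seen = {args.get("query", "")}  # never repeat the primary query
    unique: list[str] = []
    for variation in variations:
        if variation not in seen:
            seen.add(variation)
            unique.append(variation)
    merged = dict(args)
    merged["additionalQueries"] = unique  # assumed to be a list of strings
    return merged


search_args = with_query_variations(
    {"query": "Why do RAG systems fail?"},
    [
        "What are common problems in RAG systems?",            # paraphrase
        "Why do RAG systems return irrelevant documents?",     # decompose
        "What are the challenges in production AI search?",    # scope shift
    ],
)
```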

### Domain Filtering

Try `includeDomains` when the authoritative site for a topic is known -- faster and less noisy
than broad search. Try `excludeDomains` to suppress sites that keep appearing but aren't
useful (e.g., exclude `youtube.com` when video pages crowd out the editorial coverage of
YouTube creators or content that you actually need).
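
A small sketch of adding a domain filter to an existing argument set. The parameter names `includeDomains` and `excludeDomains` are from the text above; that each takes a list of domain strings, the `query` key, and the helper name are assumptions for illustration.

```python
from typing import Any, Sequence


def with_domain_filter(
    args: dict[str, Any],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> dict[str, Any]:
    """Return a copy of args with includeDomains/excludeDomains set when given."""
    filtered = dict(args)
    if include:
        filtered["includeDomains"] = list(include)  # restrict to known-good sites
    if exclude:
        filtered["excludeDomains"] = list(exclude)  # suppress recurring noise
    return filtered


# Keep video pages from crowding out editorial coverage.
filtered_args = with_domain_filter(
    {"query": "How do independent YouTube creators earn revenue beyond ads?"},
    exclude=["youtube.com"],
)
```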

### Searching by Content Type

Match your research target to the right approach. Reference files have full strategies.

- **Social sentiment / opinions** -- Exa `tweet` for Twitter; `site:reddit.com` via Firecrawl
  search + Reddit MCP for discussions. See `references/twitter.md`, `references/reddit.md`.
- **People / companies** -- Exa `people` or `company` category for discovery, then broaden.
  See `references/people-companies.md`.
- **Academic papers** -- Exa `research paper` category; academic APIs for structured data.
  See `references/academic-search.md`.
- **Code / GitHub** -- `get_code_context_exa` for code; `gh api` for repo data.
  See `references/code-github.md`.
- **News** -- Exa `news` category with date filters;