
searching-codebases

Install in Claude Code
git clone --depth 1 https://github.com/oaustegard/claude-skills /tmp/searching-codebases && cp -r /tmp/searching-codebases/searching-codebases ~/.claude/skills/searching-codebases
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Searching Codebases

Find code in any codebase by pattern or concept. One entry point, two
search strategies, automatic routing.

## Prerequisites

```bash
uv tool install ripgrep
```

tree-sitting (for structural context expansion) installs automatically when
the `--expand` flag is used.

## Primary Command

```bash
# Adjust to the install location, e.g. ~/.claude/skills/searching-codebases for Claude Code
SKILL_DIR=/mnt/skills/user/searching-codebases

python3 $SKILL_DIR/scripts/search.py SOURCE "query1" ["query2" ...] [OPTIONS]
```

SOURCE is any of:
- Local directory path
- GitHub URL (downloads tarball automatically)
- `uploads` (uses `/mnt/user-data/uploads/`)
- `project` (uses `/mnt/project/`)
- Path to a `.zip` or `.tar.gz` archive
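
The SOURCE forms listed above could be told apart along these lines. This is only a sketch: `classify_source` and its return labels are hypothetical names, and the actual logic in `scripts/resolve.py` may differ.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return a (kind, location) pair for a SOURCE argument.

    Hypothetical helper for illustration; not the skill's real resolver.
    """
    # Keywords from the list above map to fixed directories.
    if source == "uploads":
        return ("directory", "/mnt/user-data/uploads/")
    if source == "project":
        return ("directory", "/mnt/project/")
    # GitHub URLs would be downloaded as a tarball.
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc == "github.com":
        return ("github", source)
    # Archives would be unpacked before searching.
    if source.endswith((".zip", ".tar.gz")):
        return ("archive", source)
    # Anything else is treated as a local directory path.
    return ("directory", source)
```

The keyword check comes first so that a local directory would need to be written as `./uploads` or `./project` to be treated as a path.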

## Search Modes

**Regex mode** (patterns, identifiers, literal text):
```bash
python3 $SKILL_DIR/scripts/search.py ./repo "def handle_error"
python3 $SKILL_DIR/scripts/search.py ./repo "class.*Exception" --regex
python3 $SKILL_DIR/scripts/search.py ./repo "TODO|FIXME|HACK"
```

**Semantic mode** (concepts, natural language):
```bash
python3 $SKILL_DIR/scripts/search.py ./repo "retry logic with backoff" --semantic
python3 $SKILL_DIR/scripts/search.py ./repo "authentication flow"
python3 $SKILL_DIR/scripts/search.py ./repo "error handling strategy"
```

Auto-detection: short queries and code-like tokens → regex. Multi-word
natural language → semantic. Override with `--regex` or `--semantic`.
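
One way such routing could work is sketched below. The character set, the keyword list, and the name `route_query` are assumptions made for illustration; the heuristic actually used by `search.py` is not documented here.

```python
import re
from typing import Optional

# Characters that suggest a regex or an identifier (assumed heuristic).
CODE_HINTS = re.compile(r"[_.*|()\[\]{}\\^$+?:=<>]")
# Leading keywords that suggest a code fragment (assumed list).
CODE_KEYWORDS = {"def", "class", "import", "return", "function", "fn"}


def route_query(query: str, force: Optional[str] = None) -> str:
    """Return "regex" or "semantic" for a query string."""
    if force in ("regex", "semantic"):
        return force              # mirrors the --regex / --semantic override
    words = query.split()
    if len(words) <= 1:
        return "regex"            # single token: treat as an identifier or pattern
    if CODE_HINTS.search(query):
        return "regex"            # regex metacharacters or snake_case
    if words[0] in CODE_KEYWORDS:
        return "regex"            # e.g. "def retry"
    return "semantic"             # multi-word natural language


print(route_query("def handle_error"))          # regex
print(route_query("retry logic with backoff"))  # semantic
```

Under this heuristic the example queries in this section route the same way as described, but borderline queries are exactly where an explicit `--regex` or `--semantic` is worth passing.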

## Options

- `--regex` / `--semantic`: Force search mode
- `--expand`: Return full function bodies via tree-sitting AST context
- `--benchmark`: Compare indexed regex vs brute-force ripgrep
- `--branch NAME`: Git branch for GitHub URLs (default: main)
- `--skip DIRS`: Comma-separated directories to skip
- `--json`: Machine-readable output
- `-v`: Show index stats and query routing decisions

## How It Works

**Regex search** builds a sparse n-gram inverted index over all files.
Each query is decomposed into literal fragments, which are looked up in the
index to identify candidate files (typically eliminating 90-99% of files).
The candidates are then verified with ripgrep. N-grams are frequency-weighted
so that rare character sequences are more selective.
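
The filter-then-verify idea can be illustrated with a much simpler index. The sketch below uses plain trigrams instead of the skill's frequency-weighted sparse n-grams, handles only a small regex subset, and verifies with Python's `re` where the skill uses ripgrep. All function names are hypothetical.

```python
import re
from collections import defaultdict

BREAKS = set("\\[]{}?*+.^$")


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file paths containing it."""
    index = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def required_literals(branch):
    """Literal runs (length >= 3) that every match must contain.

    Expects a pattern without alternation or groups.
    """
    runs, current, i = [], "", 0
    while i < len(branch):
        ch = branch[i]
        if ch in "?*{":
            current = current[:-1]      # the preceding character is optional
        if ch == "\\":
            i += 1                      # skip the escaped character and,
            if i < len(branch) and branch[i].isalnum():
                while i + 1 < len(branch) and branch[i + 1].isalnum():
                    i += 1              # conservatively, alphanumerics after it
        elif ch in "[{":
            close = branch.find("]" if ch == "[" else "}", i + 1)
            i = len(branch) if close == -1 else close
        if ch in BREAKS:
            runs.append(current)
            current = ""
        else:
            current += ch
        i += 1
    runs.append(current)
    return [run for run in runs if len(run) >= 3]


def candidate_files(pattern, index, all_paths):
    """Return a superset of the files that can match pattern."""
    complex_alt = "|" in pattern and any(c in pattern for c in "[]\\")
    if "(" in pattern or ")" in pattern or complex_alt:
        return set(all_paths)           # too complex for this sketch: no narrowing
    result = set()
    for branch in pattern.split("|"):
        grams = set()
        for literal in required_literals(branch):
            grams |= trigrams(literal)
        if not grams:
            return set(all_paths)       # this branch has no usable literal
        matches = set(all_paths)
        for gram in grams:
            matches &= index.get(gram, set())
        result |= matches               # alternation: any branch may match
    return result


def search(pattern, files, index):
    """Narrow with the index, then verify each candidate with a real regex."""
    regex = re.compile(pattern)
    candidates = candidate_files(pattern, index, files)
    return sorted(p for p in candidates if regex.search(files[p]))
```

The index only has to return a superset of the true matches, so whenever a pattern is too complex to decompose safely the sketch falls back to scanning every file rather than risk dropping a match.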

**Semantic search** builds a TF-IDF index over code chunks (functions,
classes, structural entries). Chunks are ranked by cosine similarity to the
query.
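
The ranking step can be sketched with the standard library alone. This is a minimal illustration, not the code in `code_rag.py`: the skill relies on scikit-learn, whereas the tokenizer, the smoothed IDF formula, and the function names below are assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; snake_case and camelCase identifiers are split."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vectors, idf); each vector maps term -> tf * idf weight."""
    tokenized = [tokenize(doc) for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed IDF, so terms that appear in every chunk keep a small weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, chunks):
    """Return (score, name) pairs for the chunks, best match first."""
    names = list(chunks)
    vectors, idf = tfidf_vectors([chunks[name] for name in names])
    # Query terms that never occur in the corpus are ignored.
    counts = Counter(tokenize(query))
    q = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    return sorted(((cosine(q, v), name) for v, name in zip(vectors, names)), reverse=True)


chunks = {
    "retry": "def retry_with_backoff(fn, attempts): sleep(delay * 2 ** attempt)",
    "login": "def authenticate_user(username, password): check password hash",
    "parse": "def parse_config(path): read yaml file",
}
print(rank("retry logic with backoff", chunks)[0][1])  # retry
```

Because TF-IDF matches on shared vocabulary rather than meaning, a query only scores against chunks that contain at least one of its terms.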

**Context expansion** (`--expand`) uses tree-sitting's AST cache to
identify function/class boundaries, returning complete structural units
rather than line fragments. On first use, tree-sitting scans the repo
(~700ms for 250 files); subsequent expansions are sub-millisecond.
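
To show what expanding a matching line to its enclosing structural unit means in practice, the sketch below substitutes Python's built-in `ast` module for tree-sitting, whose API is not shown in this document. It therefore handles Python sources only, and `enclosing_unit` is a hypothetical name.

```python
import ast
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_source(source):
    """Parse once per distinct source text; repeat lookups reuse the tree."""
    return ast.parse(source)


def enclosing_unit(source, line):
    """Return the text of the innermost function or class containing `line`.

    `line` is 1-based. Returns None when the line is at module level.
    """
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    best = None
    for node in ast.walk(parse_source(source)):
        if isinstance(node, kinds) and node.lineno <= line <= node.end_lineno:
            # Containing nodes are nested, so the latest start is the innermost.
            if best is None or node.lineno >= best.lineno:
                best = node
    if best is None:
        return None
    lines = source.splitlines()
    return "\n".join(lines[best.lineno - 1:best.end_lineno])


src = "import os\n\nclass A:\n    def f(self):\n        x = 1\n        return x\n"
print(enclosing_unit(src, 5))  # prints the three lines of method f
```

Caching the parsed tree plays the same role as the AST cache described above: the first lookup pays for parsing, and later lookups in the same file only walk the tree.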

**Small codebases** (< 20 files) skip indexing entirely — direct ripgrep is
faster when there's nothing to narrow.

## Mixed Queries

Multiple queries can use different modes in a single invocation. Each query
is auto-routed independently, and indexes are built once per mode:

```bash
python3 $SKILL_DIR/scripts/search.py ./repo \
  "class.*Error" \
  "error recovery strategy" \
  "def retry"
```

## Dependencies

- **tree-sitting**: Provides AST-based context expansion for `--expand`.
  Not required — search works without it, just with less structural context
  in results.
- **ripgrep**: Required for regex verification. Install via `uv tool install ripgrep`.
- **scikit-learn**: Required for semantic mode. Installs automatically.

## When to Use

- **Known target**: "where is the retry logic?", "find all error handlers"
- **Pattern matching**: regex across large codebases with indexed speedup
- **Concept search**: "authentication flow", "database connection pooling"
- **Cross-reference**: find all callers/users of a specific function

## When NOT to Use

- **First encounter**: "what does this repo do?" → use exploring-codebases
- **Repos under ~10 files**: just read them directly
- **Exact symbol lookup**: `find_symbol('ClassName')` via tree-sitting is simpler
- **Structural overview**: use tree-sitting's `tree_overview()` / `dir_overview()`

## Files

- `scripts/search.py` — Entry point, query routing, output formatting
- `scripts/resolve.py` — Input source resolution (GitHub, uploads, archives)
- `scripts/context.py` — tree-sitting-based AST context expansion
- `scripts/ngram_index.py` — Sparse n-gram inverted index, regex decomposition
- `scripts/sparse_ngrams.py` — Core n-gram algorithms, frequency weights
- `scripts/code_rag.py` — TF-IDF semantic search over code chunks