
algo-nlp-ner

Implement Named Entity Recognition to identify and classify entities in text. Use this skill when the user needs to extract people, organizations, locations, dates, or custom entities from documents — even if they say 'extract names from text', 'find companies mentioned', or 'entity extraction'.

Install in Claude Code
git clone --depth 1 https://github.com/asgard-ai-platform/skills /tmp/algo-nlp-ner && cp -r /tmp/algo-nlp-ner/algo-nlp-ner ~/.claude/skills/algo-nlp-ner
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Named Entity Recognition

## Overview

NER identifies named entities in text and classifies them into predefined categories (Person, Organization, Location, Date, Money, etc.). There are three main families of approaches: rule-based (regex, gazetteers), statistical (CRF), and neural (BiLSTM-CRF, transformer-based). Modern NER typically uses pre-trained spaCy or Hugging Face models, which report F1 scores of roughly 85-95% on standard benchmarks; expect lower scores on out-of-domain text.
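To make the rule-based family concrete, here is a minimal sketch that combines a gazetteer lookup with regex patterns. The gazetteer entries, the patterns, and the function name `rule_based_ner` are illustrative assumptions, not part of this skill; a real system would need much broader coverage.

```python
import re

# Hypothetical gazetteer: surface form -> entity type (illustrative entries only).
GAZETTEER = {
    "Tim Cook": "PER",
    "Apple": "ORG",
    "Taipei": "LOC",
}

# Illustrative patterns for entities that follow a regular shape.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
PATTERNS = {
    "DATE": re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
}


def rule_based_ner(text):
    """Return entities found by gazetteer lookup and regex, sorted by start offset."""
    entities = []
    # Gazetteer: exact, whole-word matches of known surface forms.
    for surface, label in GAZETTEER.items():
        for m in re.finditer(rf"\b{re.escape(surface)}\b", text):
            entities.append({"text": m.group(), "type": label,
                             "start": m.start(), "end": m.end()})
    # Regex: pattern-shaped entities such as dates and amounts.
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            entities.append({"text": m.group(), "type": label,
                             "start": m.start(), "end": m.end()})
    return sorted(entities, key=lambda e: e["start"])


text = "Tim Cook announced that Apple will open a new store in Taipei on March 15."
for ent in rule_based_ner(text):
    print(ent)
```

Rules like these are precise but brittle: they cannot resolve ambiguous mentions and miss anything absent from the gazetteer, which is why statistical and neural models are usually preferred when training data is available.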

## When to Use

**Trigger conditions:**
- Extracting structured entities from unstructured text
- Building knowledge graphs from documents
- Preprocessing for information retrieval or question answering

**When NOT to use:**
- For text classification (categorizing whole documents, not extracting entities)
- For relation extraction between entities (need additional RE model)

## Algorithm

```
IRON LAW: NER Performance Depends on DOMAIN Match
A model trained on general-domain text (e.g. OntoNotes) often performs
poorly on medical records or legal documents. Domain-specific entities
(drug names, legal citations, product SKUs) require domain-specific
training data or fine-tuning. Always evaluate on YOUR domain's data.
```

### Phase 1: Input Validation
Determine the target entity types (standard: PER, ORG, LOC, DATE, MONEY; or custom), the input language, and the domain. Select an appropriate pre-trained model or prepare training data.
**Gate:** Entity types defined, model or training data available.

### Phase 2: Core Algorithm
**Pre-trained model approach:**
1. Load model (spaCy, Hugging Face NER pipeline)
2. Process text through the pipeline
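3. Extract entity spans with type labels, plus confidence scores where the model exposes them (the Hugging Face NER pipeline returns a `score`; spaCy's default `ner` component does not expose per-entity scores)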
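The extraction step for the pre-trained approach can be sketched as follows, assuming a spaCy pipeline. The helper name `doc_to_entities` is hypothetical; it reads only attributes that spaCy provides on entity spans (`text`, `label_`, `start_char`, `end_char`). The model name in the usage comment is an assumption; any installed pipeline with an NER component should work.

```python
def doc_to_entities(doc):
    """Convert a processed spaCy Doc into a list of entity dicts.

    Reads only `doc.ents` and, on each span, `text`, `label_`,
    `start_char`, and `end_char` (character offsets into the input).
    """
    return [
        {
            "text": ent.text,
            "type": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


# Assumed usage (requires spaCy and a downloaded model):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")  # assumption: swap in the model you selected
#   doc = nlp("Tim Cook announced that Apple will open a new store in Taipei on March 15.")
#   print(doc_to_entities(doc))
```

Note that spaCy's English models use OntoNotes-style labels such as `PERSON` and `GPE`, so mapping them to the PER/LOC scheme used in this document would be a separate step.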

**Fine-tuning approach:**
1. Annotate 200+ domain-specific examples in BIO format
2. Fine-tune transformer model (BERT, RoBERTa) on annotated data
3. Evaluate on held-out test set
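As an illustration of the BIO format used in step 1, the sketch below converts token-level span annotations into BIO tags. The function name `spans_to_bio` and the pre-tokenized input are assumptions for the example; annotation tools typically export this format directly.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    tokens: list of token strings.
    spans: non-overlapping (start_token, end_token_exclusive, label) tuples.
    """
    tags = ["O"] * len(tokens)           # O = outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # B = first token of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # I = continuation of the same entity
    return tags


tokens = ["Tim", "Cook", "announced", "that", "Apple", "will", "open",
          "a", "new", "store", "in", "Taipei", "on", "March", "15", "."]
spans = [(0, 2, "PER"), (4, 5, "ORG"), (11, 12, "LOC"), (13, 15, "DATE")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```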

### Phase 3: Verification
Evaluate: precision, recall, F1 per entity type. Check: boundary detection (exact span match) and type classification accuracy.
**Gate:** F1 > 0.80 per entity type on domain-relevant test data.
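A minimal sketch of the per-type evaluation and the gate check, assuming exact span matching on `(start, end, type)` tuples for a single document. The function names and sample offsets are illustrative; for a multi-document test set, a document ID would need to be added to each tuple.

```python
from collections import defaultdict


def per_type_prf(gold, predicted):
    """Exact-match precision, recall, and F1 per entity type.

    gold, predicted: iterables of (start, end, type) tuples.
    A prediction counts as correct only if span and type both match.
    """
    gold, predicted = set(gold), set(predicted)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for ent in predicted:
        counts[ent[2]]["tp" if ent in gold else "fp"] += 1
    for ent in gold - predicted:
        counts[ent[2]]["fn"] += 1

    scores = {}
    for label, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1}
    return scores


def passes_gate(scores, threshold=0.80):
    """True only if every entity type exceeds the F1 threshold."""
    return all(s["f1"] > threshold for s in scores.values())


gold = [(0, 8, "PER"), (24, 29, "ORG"), (55, 61, "LOC")]
predicted = [(0, 8, "PER"), (24, 29, "ORG"), (55, 58, "LOC")]  # LOC span too short
scores = per_type_prf(gold, predicted)
print(scores)
print("gate passed:", passes_gate(scores))
```

In this example the truncated LOC span counts as both a false positive and a false negative under exact matching, so the gate fails even though the type was right.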

### Phase 4: Output
Return extracted entities with types, positions, and confidence.

## Output Format

```json
{
  "entities": [{"text": "Apple Inc.", "type": "ORG", "start": 0, "end": 10, "confidence": 0.95}],
  "metadata": {"model": "en_core_web_trf", "entities_found": 15, "types": {"PER": 5, "ORG": 6, "LOC": 4}}
}
```

## Examples

### Sample I/O
**Input:** "Tim Cook announced that Apple will open a new store in Taipei on March 15."
**Expected:** [Tim Cook/PER, Apple/ORG, Taipei/LOC, March 15/DATE]

### Edge Cases
| Input | Expected | Why |
|-------|----------|-----|
| "Apple" (no context) | Ambiguous (fruit or company) | Context-dependent entity typing |
| Nested entities | Depends on scheme | "Bank of America" = ORG, "America" = LOC within |
| Misspelled entity | May miss | "Appel" not in training data |

## Gotchas

- **Boundary errors**: NER often gets the entity type right but the span wrong ("New" vs "New York City"). Evaluate with both exact and partial match metrics.
- **Ambiguity**: "Jordan" can be a person, country, or brand. Context-dependent disambiguation is hard; some models output the most likely type.
- **Chinese/Japanese NER**: These languages are written without spaces between words, which makes boundary detection harder. Use language-specific tokenizers (e.g. jieba for Chinese, MeCab for Japanese).
- **Annotation consistency**: Training data quality is critical. Inconsistent annotations (sometimes labeling "Dr." as part of name, sometimes not) degrade model performance.
- **Entity linking**: NER identifies mentions; entity linking resolves them to knowledge base entries. "Apple" → Apple Inc. (Q312) or apple (fruit). These are separate tasks.
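The boundary-error gotcha can be made measurable by counting exact and partial matches separately. The sketch below assumes that a partial match means overlapping characters with the same type; other definitions exist, and the function names are illustrative.

```python
def overlaps(a, b):
    """True if two (start, end, type) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def match_counts(gold, predicted):
    """Classify each prediction as exact, partial (same type, overlapping), or spurious."""
    gold_set = set(gold)
    exact = sum(1 for p in predicted if p in gold_set)
    partial = sum(
        1 for p in predicted
        if p not in gold_set
        and any(p[2] == g[2] and overlaps(p, g) for g in gold)
    )
    return {
        "exact": exact,
        "partial": partial,
        "spurious": len(predicted) - exact - partial,
    }


text = "She moved to New York City."
gold = [(13, 26, "LOC")]        # "New York City"
predicted = [(13, 16, "LOC")]   # "New": right type, wrong boundary
print(match_counts(gold, predicted))
```

A large gap between exact and partial counts suggests the model finds the right entities but struggles with span boundaries, which often points to tokenization or annotation-guideline issues rather than type confusion.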

## References

- For BIO annotation format and guidelines, see `references/bio-annotation.md`
- For fine-tuning NER with transformers, see `references/transformer-ner.md`