algo-nlp-summarization
Implement text summarization using extractive and abstractive approaches. Use this skill when the user needs to condense long documents, build an automatic summarization pipeline, or compare summarization strategies — even if they say 'summarize this document', 'TLDR', or 'key points extraction'.
git clone --depth 1 https://github.com/asgard-ai-platform/skills /tmp/algo-nlp-summarization && cp -r /tmp/algo-nlp-summarization/algo-nlp-summarization ~/.claude/skills/algo-nlp-summarization
SKILL.md
# Text Summarization
## Overview
Text summarization condenses documents while preserving key information. Extractive: selects and concatenates important sentences from the original. Abstractive: generates new text that paraphrases the content. Extractive is simpler and more faithful; abstractive is more fluent but may hallucinate.
## When to Use
**Trigger conditions:**
- Condensing long documents, reports, or article collections
- Building automated summary pipelines for content curation
- Comparing extractive vs abstractive approaches for a use case
**When NOT to use:**
- When full document understanding is needed (summarization loses detail)
- For structured data extraction (use NER or information extraction)
## Algorithm
```
IRON LAW: Abstractive Summarization Can HALLUCINATE
Abstractive models may generate fluent text containing facts NOT in
the source. Always verify key claims in abstractive summaries against
the original document. For high-stakes use cases (legal, medical),
prefer extractive or use abstractive with factual consistency checking.
```
### Phase 1: Input Validation
Determine: input length, target summary length (ratio or word count), single-doc vs multi-doc, domain.
**Gate:** Input text available, target length defined.
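A minimal sketch of the Phase 1 gate in Python, assuming whitespace-separated word counts are an acceptable length measure. The `validate_input` function, the `SummaryRequest` type, and the 100-word passthrough cutoff (borrowed from the "Very short input" edge case) are hypothetical names and values for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff, taken from the "< 100 words" edge case in this skill.
SHORT_INPUT_WORDS = 100


@dataclass
class SummaryRequest:
    text: str
    input_words: int
    target_words: int   # absolute word budget for the summary
    passthrough: bool   # True when the input is short enough to return as-is


def validate_input(text: str, ratio: Optional[float] = None,
                   max_words: Optional[int] = None) -> SummaryRequest:
    """Phase 1 gate: require text and exactly one target-length spec."""
    if not text or not text.strip():
        raise ValueError("input text is empty")
    if (ratio is None) == (max_words is None):
        raise ValueError("give exactly one of ratio or max_words")
    n_words = len(text.split())
    if ratio is not None:
        if not 0 < ratio < 1:
            raise ValueError("ratio must be between 0 and 1")
        target = max(1, round(n_words * ratio))
    else:
        if max_words < 1:
            raise ValueError("max_words must be positive")
        target = max_words
    # Skip summarization when the input is tiny or already within budget.
    passthrough = n_words < SHORT_INPUT_WORDS or target >= n_words
    return SummaryRequest(text, n_words, min(target, n_words), passthrough)
```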
### Phase 2: Core Algorithm
**Extractive (TextRank/LexRank):**
1. Split document into sentences
2. Build similarity graph (sentence nodes, cosine similarity edges)
3. Run PageRank on sentence graph
4. Select top-k sentences by rank, reorder by original position
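The extractive steps above can be sketched in dependency-free Python. This is a simplified TextRank-style ranker under stated assumptions: a naive regex sentence splitter, raw term-frequency vectors compared with cosine similarity (closer to LexRank's cosine edges than to TextRank's original word-overlap measure), and weighted PageRank computed by power iteration with the conventional damping factor of 0.85. All function names are illustrative.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(text, k=3, damping=0.85, iters=100, tol=1e-6):
    # Step 1: sentence segmentation.
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    # Step 2: term-frequency vectors and a symmetric weighted similarity graph.
    vecs = [Counter(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    weights = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    # Step 3: weighted PageRank by power iteration. Sentences with no overlap
    # to any other sentence receive only the teleport term (1 - damping) / n.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) / n + damping * incoming)
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    # Step 4: top-k by score, then restore original document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Reordering the selected indices before returning keeps the extract readable, since PageRank order rarely matches narrative order.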
**Abstractive (transformer-based):**
1. Use pre-trained model (BART, T5, Pegasus)
2. Encode input document (handle length limits with chunking if needed)
3. Generate summary with beam search
4. Post-process: check for repetition, factual consistency
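Step 2's chunking can be sketched as a hierarchical chunk-then-summarize loop. The model call is injected as a plain `summarize(text) -> str` function so the control flow stays independent of any library; counting words instead of model tokens, and the 600-word default budget, are assumptions to tune against the chosen model's actual token limit. Function names are hypothetical.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive regex splitter (assumption); swap in a real sentence tokenizer.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_words(sentences: List[str], max_words: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk;
    the model is then expected to truncate it.
    """
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def hierarchical_summarize(text: str, summarize: Callable[[str], str],
                           max_words: int = 600, max_rounds: int = 5) -> str:
    """Chunk-then-summarize, repeated until the text fits one model call."""
    for _ in range(max_rounds):
        if len(text.split()) <= max_words:
            break
        chunks = chunk_by_words(split_sentences(text), max_words)
        # Summarize each chunk independently, then join the partial summaries.
        text = " ".join(summarize(chunk) for chunk in chunks)
    # Final pass; if max_rounds ran out, this relies on the model's own truncation.
    return summarize(text)
```

To plug in a real model, `summarize` could wrap a Hugging Face `transformers` summarization pipeline, for example `pipeline("summarization", model="facebook/bart-large-cnn")`, returning the `"summary_text"` field of the first result; generation options such as `num_beams` (beam search) and `no_repeat_ngram_size` can be passed through the pipeline call.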
### Phase 3: Verification
Evaluate: ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L) against reference summaries. Manual check for factual accuracy and coherence.
**Gate:** ROUGE scores reasonable for domain, no hallucinations in spot-check.
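A minimal ROUGE sketch for the Phase 3 check, assuming lowercase alphanumeric tokenization with no stemming or stopword removal. It reports F1 for ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence), so its numbers will not exactly match published scores computed with other preprocessing. Function names are illustrative.

```python
import re
from collections import Counter


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _f1(overlap, cand_total, ref_total):
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    c, r = _tokens(candidate), _tokens(reference)
    c_ngrams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
    r_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
    # Counter & takes the per-n-gram minimum, i.e. clipped overlap counts.
    overlap = sum((c_ngrams & r_ngrams).values())
    return _f1(overlap, sum(c_ngrams.values()), sum(r_ngrams.values()))


def lcs_len(a, b):
    # Classic dynamic-programming LCS, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    c, r = _tokens(candidate), _tokens(reference)
    return _f1(lcs_len(c, r), len(c), len(r))
```

For reported results, a maintained implementation such as Google's `rouge-score` Python package (which offers a `use_stemmer` option) is a safer choice than a hand-rolled scorer.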
### Phase 4: Output
Return summary with metadata.
## Output Format
```json
{
"summary": "The company reported Q4 revenue of...",
"method": "extractive_textrank",
"metadata": {"input_words": 2000, "summary_words": 200, "compression_ratio": 0.10, "sentences_selected": 5}
}
```
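One way to assemble the output record shown above, sketched under the assumption that word counts are whitespace-split and `compression_ratio` is `summary_words / input_words` rounded to two decimals (which matches the 200 / 2000 = 0.10 example). `build_output` is a hypothetical helper; `sentences_selected` is only emitted when the caller supplies it, since it is meaningful for extractive methods.

```python
from typing import Optional


def build_output(source: str, summary: str, method: str,
                 sentences_selected: Optional[int] = None) -> dict:
    input_words = len(source.split())
    summary_words = len(summary.split())
    metadata = {
        "input_words": input_words,
        "summary_words": summary_words,
        # Guard against division by zero on empty input.
        "compression_ratio": round(summary_words / input_words, 2) if input_words else 0.0,
    }
    if sentences_selected is not None:  # extractive methods only
        metadata["sentences_selected"] = sentences_selected
    return {"summary": summary, "method": method, "metadata": metadata}
```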
## Examples
### Sample I/O
**Input:** 2000-word news article about quarterly earnings
**Expected:** 200-word summary covering: revenue, profit, guidance, key highlights. Extractive: 5-6 selected sentences. Abstractive: coherent paragraph.
### Edge Cases
| Input | Expected | Why |
|-------|----------|-----|
| Very short input (< 100 words) | Return as-is or minimal trimming | Already concise |
| Multiple contradicting sections | Summary may miss nuance | Summarization favors dominant theme |
| Technical jargon | Extractive preserves, abstractive may simplify | Domain expertise affects quality |
## Gotchas
- **ROUGE ≠ quality**: ROUGE measures n-gram overlap with references. A high-ROUGE summary can be incoherent, and a low-ROUGE summary can be excellent with different word choices.
- **Input length limits**: Transformer models have max token limits (512-4096). Long documents need chunking strategies (chunk-then-summarize or hierarchical summarization).
- **Repetition**: Abstractive models sometimes repeat phrases. Block repeated n-grams during generation (`no_repeat_ngram_size` in Hugging Face `generate`) or down-weight already generated tokens (`repetition_penalty`).
- **Position bias**: In news text, important information is front-loaded (inverted pyramid), so simply taking the first N sentences (Lead-3 when N = 3) is a strong extractive baseline.
- **Multi-document summarization**: Summarizing multiple related documents requires handling redundancy and contradiction across sources.
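Two of these gotchas are cheap to act on in code. The sketch below, in plain Python with hypothetical function names and a naive regex sentence splitter, builds the first-N-sentences position baseline and flags word n-grams that occur more than once in a generated summary; the post-hoc check complements, rather than replaces, generation-time blocking such as `no_repeat_ngram_size`.

```python
import re
from collections import Counter


def lead_n(text, n=3):
    """Position baseline: the first n sentences (Lead-3 when n=3)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(sentences[:n])


def repeated_ngrams(summary, n=3):
    """Return word n-grams that occur more than once, with their counts."""
    words = re.findall(r"[a-z0-9']+", summary.lower())
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in counts.items() if c > 1}
```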
## References
- For TextRank/LexRank implementation details, see `references/graph-based-extraction.md`
- For factual consistency checking, see `references/factual-consistency.md`