
academic-literature-search

Academic Literature Search handles PubMed, bioRxiv, arXiv, and academic reference management tasks. Use it when searching for academic papers, retrieving literature, generating citations, formatting references in GB/T 7714-2015 format, or managing citation deduplication and renumbering.

Install in Claude Code
git clone --depth 1 https://github.com/beita6969/ScienceClaw /tmp/academic-literature-search && cp -r /tmp/academic-literature-search/skills/academic-literature-search ~/.claude/skills/academic-literature-search
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Academic Literature Search: Literature Retrieval and Citation Management

Use this skill when the user asks to search for academic papers, retrieve literature, generate citations, format references, or any task involving PubMed, bioRxiv, arXiv, or academic reference management. Trigger keywords: "搜文献", "检索", "找论文", "参考文献", "引用", "citation", "search papers", "PubMed", "bioRxiv", "arXiv", "GB/T 7714", "PMID", "DOI", "批量引用".

## Core Principles

1. **MCP first, Python second**: PubMed operations → MCP tools (zero code). Python only for arXiv, GB/T 7714 formatting, and citation post-processing.
2. **Code-driven citations**: Citation formatting, validation, deduplication, renumbering — ALL via Python code. NEVER fabricate PMIDs, DOIs, author names, or journal names.
3. **GB/T 7714-2015 sequential numbering**: `[1][2][3]` in-text, references numbered by order of first appearance.
4. **Journal name consistency**: Use **full journal names** throughout (NOT ISO abbreviations). E.g., `Nature Medicine` not `Nat Med`. If MCP returns abbreviated names, expand them; if expansion is uncertain, use the name as returned.
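
Principle 3 can be sketched as below. This is a minimal illustration, assuming the draft marks citations with placeholder keys such as `[@a]`; both the `[@key]` syntax and the `renumber_citations` helper are assumptions made for this example, not part of the skill.

```python
import re

def renumber_citations(text):
    """Replace [@key] placeholders with [n], numbered by first appearance.

    Hypothetical helper: the [@key] placeholder syntax is an assumption.
    Returns the rewritten text and the keys in reference-list order.
    """
    order = {}  # key -> sequence number, assigned on first appearance

    def _sub(match):
        key = match.group(1)
        if key not in order:
            order[key] = len(order) + 1
        return f"[{order[key]}]"

    new_text = re.sub(r"\[@([^\]\s]+)\]", _sub, text)
    ref_order = sorted(order, key=order.get)
    return new_text, ref_order


new_text, ref_order = renumber_citations("x [@b] y [@a][@b] z")
print(new_text)   # x [1] y [2][1] z
print(ref_order)  # ['b', 'a']
```

A repeated key reuses its original number, so the reference list can be emitted directly in `ref_order`.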

---

## Tool Routing Decision Table

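| Operation | Tool | Why |
|------|--------|--------|
| PubMed keyword search | **MCP** `pubmed_search_articles` | Native date/type filtering and sorting; zero code for the agent |
| Batch fetch of details by PMID | **MCP** `pubmed_fetch_contents` | 4 detail levels, 200 PMIDs per call, includes MeSH |
| Similar-paper discovery | **MCP** `pubmed_article_connections` (similar) | Direct call, returns structured data |
| Citing-paper discovery | **MCP** `pubmed_article_connections` (citedin) | Direct call, returns structured data |
| A paper's references (what it cites) | **MCP** `pubmed_article_connections` (references) | **MCP only**; ELink does not support it |
| RIS/BibTeX export | **MCP** `pubmed_article_connections` (citation_formats) | Built-in formatting |
| bioRxiv/medRxiv search | **MCP** `pubmed_search_articles` + journal filter | Add `biorxiv[journal]` to `queryTerm` |
| arXiv search | **Python** | MCP does not cover arXiv |
| GB/T 7714-2015 formatting | **Python** | MCP has no GB/T (Chinese national standard) format |
| Citation post-processing / deduplication / numbering | **Python** | Not covered by MCP |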
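
The routing table above could be encoded as a plain lookup, sketched here. The operation keys (`"pubmed_search"`, `"arxiv_search"`, and so on) and the `route` helper are hypothetical names chosen for this example; only the tool names come from the table.

```python
# Hypothetical operation keys; tool names are the ones listed in the table.
ROUTES = {
    "pubmed_search": ("mcp", "pubmed_search_articles"),
    "pmid_details": ("mcp", "pubmed_fetch_contents"),
    "similar": ("mcp", "pubmed_article_connections"),
    "cited_in": ("mcp", "pubmed_article_connections"),
    "references": ("mcp", "pubmed_article_connections"),
    "export": ("mcp", "pubmed_article_connections"),
    "preprint_search": ("mcp", "pubmed_search_articles"),
    "arxiv_search": ("python", None),
    "gbt7714_format": ("python", None),
    "postprocess": ("python", None),
}

def route(operation):
    """Return (backend, tool_name) for an operation; raise on unknown keys."""
    try:
        return ROUTES[operation]
    except KeyError:
        raise ValueError(f"no route for operation: {operation!r}") from None


print(route("pmid_details"))  # ('mcp', 'pubmed_fetch_contents')
print(route("arxiv_search"))  # ('python', None)
```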

---

## MCP Operations (PubMed, the primary tool)

### 1. Keyword search

```
Tool: pubmed_search_articles
Parameters:
  queryTerm: "large language model bioinformatics"
  maxResults: 20
  sortBy: "relevance"                       ← or "pub_date"
  fetchBriefSummaries: 10                   ← return summaries for the first 10 articles
  dateRange:                                ← optional
    minDate: "2022"
    maxDate: "2026"
    dateType: "pdat"
  filterByPublicationTypes: ["Review"]      ← optional
```

**bioRxiv/medRxiv**: add a journal filter to `queryTerm`:
```
queryTerm: "(large language model agent) AND (biorxiv[journal] OR medrxiv[journal])"
```
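
If the query string is assembled in code rather than typed by hand, the journal filter can be appended with a small helper. This is a sketch; `with_preprint_filter` is a hypothetical name, and it only builds the string shown above without calling any tool.

```python
def with_preprint_filter(query, servers=("biorxiv", "medrxiv")):
    """Wrap a PubMed query so it only matches the given preprint servers."""
    if not servers:
        raise ValueError("at least one server is required")
    journal_clause = " OR ".join(f"{s}[journal]" for s in servers)
    return f"({query}) AND ({journal_clause})"


print(with_preprint_filter("large language model agent"))
# (large language model agent) AND (biorxiv[journal] OR medrxiv[journal])
```

The result is passed as `queryTerm` to `pubmed_search_articles`.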

### 2. Batch fetch of details by PMID

```
Tool: pubmed_fetch_contents
Parameters:
  pmids: ["39361263", "38768397", "36869294"]
  detailLevel: "abstract_plus"              ← recommended; parsed, structured data
  includeMeshTerms: true
```

Detail levels: `abstract_plus` (recommended) | `citation_data` (lightweight) | `full_xml` | `medline_text`
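
The routing table gives a limit of 200 PMIDs per call, so longer lists need to be split before fetching. A minimal sketch of that batching step, where `chunk_pmids` is a hypothetical helper and each returned batch would be passed as `pmids` to `pubmed_fetch_contents`:

```python
def chunk_pmids(pmids, batch_size=200):
    """Deduplicate PMIDs (keeping first-seen order) and split into batches."""
    # dict.fromkeys preserves insertion order while dropping repeats.
    unique = list(dict.fromkeys(str(p).strip() for p in pmids))
    unique = [p for p in unique if p]  # drop empty strings
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


batches = chunk_pmids(range(1, 451))
print([len(b) for b in batches])  # [200, 200, 50]
```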

### 3. Related-paper discovery

```
Tool: pubmed_article_connections
Parameters:
  sourcePmid: "39361263"
  relationshipType: "pubmed_similar_articles"   ← or citedin / references / citation_formats
  maxRelatedResults: 15
```

### 4. Citation format export

```
Tool: pubmed_article_connections
Parameters:
  sourcePmid: "39361263"
  relationshipType: "citation_formats"
  citationStyles: ["ris", "bibtex", "apa_string"]
```

---

## Python Operations

### arXiv search

Uses stdlib only (`urllib.request` + `xml.etree.ElementTree`), no external dependencies:

```python
import urllib.request, urllib.parse, xml.etree.ElementTree as ET, re

def _parse_author_name(full_name):
    """Convert 'Shunyu Yao' → 'Yao S' (GB/T 7714: surname first, initials)."""
    parts = full_name.strip().split()
    if not parts:
        return full_name
    if len(parts) == 1:
        return parts[0]
    surname = parts[-1]
    initials = "".join(p[0].upper() for p in parts[:-1])
    return f"{surname} {initials}"


def search_arxiv(query, max_results=10):
    """Search arXiv. Free API, no key needed. Uses only stdlib."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}", "start": 0,
        "max_results": max_results, "sortBy": "relevance",
        "sortOrder": "descending"
    })
    url = f"https://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        xml_text = resp.read().decode("utf-8")

    ns = {"atom": "http://www.w3.org/2005/Atom",
          "arxiv": "http://arxiv.org/schemas/atom"}
    root = ET.fromstring(xml_text)
    articles = []
    for entry in root.findall("atom:entry", ns):
        title = (entry.findtext("atom:title", "", ns) or "").replace("\n", " ").strip()
        abstract = (entry.findtext("atom:summary", "", ns) or "").replace("\n", " ").strip()

        raw_authors = [a.findtext("atom:name", "", ns).strip()
                       for a in entry.findall("atom:author", ns)]
        authors_gbt = [_parse_author_name(a) for a in raw_authors]

        published = entry.findtext("atom:published", "", ns)[:10]
        id_url = entry.findtext("atom:id", "", ns) or ""
        arxiv_id = id_url.split("/abs/")[-1] if "/abs/" in id_url else ""

        doi_e = entry.find("arxiv:doi", ns)
        doi = doi_e.text.strip() if doi_e is not None and doi_e.text else ""

        cat_e = entry.find("arxiv:primary_category", ns)
        cat = cat_e.get("term", "") if cat_e is not None else ""

        articles.append({
            "source": "arxiv", "pmid": "", "doi": doi, "arxiv_id": arxiv_id,
            "title": title, "authors": authors_gbt,
            "journal": f"arXiv:{arxiv_id}",
            "year": published[:4] if published else "",
            "volume": "", "issue": "", "pages": "",
            "abstract": abstract,
            "url": f"https://arxiv.org/abs/{arxiv_id}",
            "category": cat,
            "venue": "",  # populated manually for published conference papers
        })
    return articles
```
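
When arXiv results are merged with PubMed results, the same paper can appear more than once. The following is one possible deduplication sketch, assuming every record uses the dict fields emitted by `search_arxiv` above (`doi`, `pmid`, `arxiv_id`, `title`). The `dedupe_articles` helper and its matching rules (case-insensitive DOI, arXiv ID without version suffix, normalized title) are assumptions for this example, not the skill's own post-processing code.

```python
import re

def _norm_title(title):
    """Lowercase a title and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()

def dedupe_articles(articles):
    """Drop duplicate records, keeping the first occurrence of each paper."""
    seen = set()
    unique = []
    for art in articles:
        keys = []
        if art.get("doi"):
            keys.append(("doi", art["doi"].lower()))
        if art.get("pmid"):
            keys.append(("pmid", str(art["pmid"])))
        if art.get("arxiv_id"):
            # Strip a trailing version suffix such as "v2".
            keys.append(("arxiv", re.sub(r"v\d+$", "", art["arxiv_id"])))
        title = _norm_title(art.get("title"))
        if title:
            keys.append(("title", title))
        if any(k in seen for k in keys):
            continue  # already have this paper under some identifier
        seen.update(keys)
        unique.append(art)
    return unique
```

Title matching is the loosest rule here and may merge distinct papers that share a title; drop that key if false positives matter more than missed duplicates.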

### GB/T 7714-2015 formatting (full version)

Supports [J] journal articles, [Z/OL] preprints, [C] conference papers, [M] monographs, and [D] dissertations.
All fields are handled None-safely.

```python
def format_gbt7714(article, seq_num):
    """Format one article as GB/T 7714-2015 sequential reference.

    Handles: MCP article dicts (authors as list of dicts