
kegg-database

The KEGG Database skill provides programmatic access to the Kyoto Encyclopedia of Genes and Genomes through its REST API. Use it to query metabolic pathways, convert molecular identifiers across databases (NCBI Gene, UniProt, PubChem), retrieve gene-pathway associations, check drug-drug interactions, and build pathway enrichment datasets. The API is restricted to academic use, and most operations accept at most 10 entries per request.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/kegg-database && cp -r /tmp/kegg-database/skills/genomics-bioinformatics/databases/kegg-database ~/.claude/skills/kegg-database

Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# KEGG Database — Biological Pathway & Molecular Network Queries

## Overview

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is through a REST API with no authentication: every operation is a plain HTTP GET, and most return tab-delimited text (`get` returns flat-file entries or the format selected by its option, such as FASTA, MOL, or a PNG image).

## When to Use

- Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
- Retrieving metabolic pathway details, gene lists, or compound structures
- Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
- Checking drug-drug interactions from KEGG's pharmacological database
- Building pathway enrichment context (all genes per pathway for an organism)
- Cross-referencing compounds, reactions, enzymes, and pathways
- For **Python-native multi-database queries** (KEGG + UniProt + Ensembl in one script), prefer `bioservices` instead
- For **pathway visualization**, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly

## Prerequisites

```bash
pip install requests
```

**API constraints**:
- **Academic use only** — commercial use requires a separate KEGG license
- **Max 10 entries** per `get`/`list`/`conv`/`link`/`ddi` call (image/kgml/json: 1 entry only)
- **No explicit rate limit**, but add `time.sleep(0.5)` between batch requests to avoid server-side throttling
- Base URL: `https://rest.kegg.jp/`
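
Because of the 10-entry cap, longer ID lists have to be split across several requests. The sketch below shows one way to do that; `chunked`, `batch_urls`, and `fetch_in_batches` are hypothetical helper names, and the fetch function is passed in as a parameter so the batching logic can be tried without network access.

```python
import time
from typing import Callable, Iterator, List

BASE = "https://rest.kegg.jp"
MAX_ENTRIES = 10  # per-request cap for get/list/conv/link/ddi (see constraints above)


def chunked(ids: List[str], size: int = MAX_ENTRIES) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_urls(operation: str, ids: List[str], option: str = "") -> List[str]:
    """Build one URL per batch, joining the IDs in each batch with '+'."""
    urls = []
    for batch in chunked(ids):
        url = f"{BASE}/{operation}/{'+'.join(batch)}"
        if option:  # e.g. "aaseq" for get
            url += f"/{option}"
        urls.append(url)
    return urls


def fetch_in_batches(
    operation: str,
    ids: List[str],
    fetch: Callable[[str], str],
    option: str = "",
    pause: float = 0.5,
) -> List[str]:
    """Call fetch(url) once per batch, sleeping `pause` seconds between calls."""
    results = []
    for i, url in enumerate(batch_urls(operation, ids, option)):
        if i:  # no need to wait before the first request
            time.sleep(pause)
        results.append(fetch(url))
    return results


# 23 arbitrary gene IDs -> 3 requests carrying 10, 10, and 3 entries
for url in batch_urls("get", [f"hsa:{n}" for n in range(1, 24)]):
    print(url)
```

Against the live API, a fetch function could be `lambda url: requests.get(url, timeout=30).text`; call `raise_for_status()` inside it if a failed batch should stop the loop.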

## Quick Start

```python
import requests
import time

BASE = "https://rest.kegg.jp"

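def kegg_get(operation, *args):
    """Generic KEGG REST API caller: GET {BASE}/{operation}/{arg1}/{arg2}/..."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url, timeout=30)  # fail instead of hanging on a stalled connection
    resp.raise_for_status()  # bad requests and missing entries come back as 4xx errors
    return resp.text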

# Find pathways linked to human gene TP53
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157	path:hsa04010
# hsa:7157	path:hsa04110
# ...

# Get pathway details
detail = kegg_get("get", "hsa04110")
print(detail[:300])
```
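
`link` returns one `source<TAB>target` pair per line, so the raw text converts directly into a dictionary. A minimal sketch, with hypothetical helper names `parse_link` and `invert`; the sample string simply mirrors the two lines shown in the Quick Start comments. Inverting the mapping yields pathway-to-gene sets, the shape usually needed for enrichment analysis.

```python
from collections import defaultdict
from typing import Dict, List


def parse_link(text: str) -> Dict[str, List[str]]:
    """Group `link` output (source<TAB>target per line) by source ID."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.strip().splitlines():
        source, target = line.split("\t", 1)
        mapping[source].append(target)
    return dict(mapping)


def invert(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Swap keys and values, e.g. gene -> pathways into pathway -> genes."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for source, targets in mapping.items():
        for target in targets:
            inverted[target].append(source)
    return dict(inverted)


sample = "hsa:7157\tpath:hsa04010\nhsa:7157\tpath:hsa04110\n"
gene_to_pathways = parse_link(sample)
print(gene_to_pathways)          # {'hsa:7157': ['path:hsa04010', 'path:hsa04110']}
print(invert(gene_to_pathways))  # {'path:hsa04010': ['hsa:7157'], 'path:hsa04110': ['hsa:7157']}
```

With the live API, `parse_link(kegg_get("link", "pathway", "hsa"))` would apply the same parsing to the organism-wide gene-to-pathway table.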

## Core API

### 1. Database Information — `kegg_info`

Retrieve metadata and statistics about KEGG databases.

```python
import requests

BASE = "https://rest.kegg.jp"

# Database-level info
info = requests.get(f"{BASE}/info/pathway").text
print(info[:200])
# pathway          Pathway
#                  Release <number>, <date>   (varies with the current KEGG release)
#                  Kanehisa Laboratories
#                  ...

# Organism-level info
hsa_info = requests.get(f"{BASE}/info/hsa").text
print(hsa_info[:200])
```

**Common databases**: `kegg`, `pathway`, `module`, `brite`, `genes`, `genome`, `compound`, `glycan`, `reaction`, `enzyme`, `disease`, `drug`

### 2. Listing Entries — `kegg_list`

List entry identifiers and names from any KEGG database.

```python
import requests

BASE = "https://rest.kegg.jp"

# All human pathways
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa").text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t")
    print(f"{pathway_id}: {name}")
# path:hsa00010: Glycolysis / Gluconeogenesis - Homo sapiens (human)
# ...

# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459").text
print(genes)
```

**Common organism codes**: `hsa` (human), `mmu` (mouse), `dme` (fruit fly), `sce` (yeast), `eco` (E. coli)
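
The loop above unpacks exactly two columns, which suits pathway lists. Gene lists may carry extra tab-separated columns, so a more tolerant parser splits only on the first tab. This is a sketch: `parse_list` is a hypothetical helper, the sample lines are hand-written, and IDs are kept exactly as the API returns them (with or without a `path:` prefix).

```python
from typing import Dict


def parse_list(text: str, suffix: str = "") -> Dict[str, str]:
    """Map entry ID -> remainder of the line from `list` output.

    Only the first tab is split, so any extra columns stay in the value.
    An optional `suffix` (e.g. the organism name) is trimmed from the value.
    """
    entries: Dict[str, str] = {}
    for line in text.strip().splitlines():
        entry_id, _, rest = line.partition("\t")
        if suffix and rest.endswith(suffix):
            rest = rest[: -len(suffix)]
        entries[entry_id] = rest
    return entries


sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
for pid, name in parse_list(sample, suffix=" - Homo sapiens (human)").items():
    print(pid, name)
# hsa00010 Glycolysis / Gluconeogenesis
# hsa00020 Citrate cycle (TCA cycle)
```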

### 3. Keyword Search — `kegg_find`

Search databases by keywords or molecular properties.

```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53").text
n_hits = len(results.strip().splitlines())  # one line per matching entry; 0 if nothing matched
print(f"Found {n_hits} entries")
time.sleep(0.5)

# Chemical formula search (partial match; order of atoms is ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula").text
print(compounds[:200])
time.sleep(0.5)

# Molecular weight range search
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass").text
print(drugs[:200])
```

**Search options**: for compound/drug queries, append `/formula` (partial match, atom order ignored), `/exact_mass` (range, e.g. `300-310`), or `/mol_weight` (range).
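
Because the option follows the query in the URL path, a small builder can catch typos before any request is sent. A hedged sketch: `find_url` is a hypothetical helper, it only knows the three options named above, and it percent-encodes the query with `urllib.parse.quote` (leaving `+` untouched), assuming callers pass the query as it should appear in the path.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://rest.kegg.jp"
FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}  # options listed above
OPTION_DATABASES = {"compound", "drug"}  # databases those options apply to, per the note above


def find_url(database: str, query: str, option: Optional[str] = None) -> str:
    """Build a /find URL, rejecting option/database combinations not listed above."""
    if option is not None:
        if option not in FIND_OPTIONS:
            raise ValueError(f"unknown find option: {option!r}")
        if database not in OPTION_DATABASES:
            raise ValueError(f"option {option!r} is meant for compound/drug searches")
    # Percent-encode unsafe characters such as spaces; '+' is left as-is
    url = f"{BASE}/find/{database}/{quote(query, safe='+')}"
    if option is not None:
        url += f"/{option}"
    return url


print(find_url("genes", "p53"))                       # https://rest.kegg.jp/find/genes/p53
print(find_url("compound", "C7H10N4O2", "formula"))   # https://rest.kegg.jp/find/compound/C7H10N4O2/formula
print(find_url("drug", "300-310", "exact_mass"))      # https://rest.kegg.jp/find/drug/300-310/exact_mass
```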

### 4. Entry Retrieval — `kegg_get`

Retrieve complete database entries or specific data formats.

```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Full pathway entry (text format)
pathway = requests.get(f"{BASE}/get/hsa00010").text
print(pathway[:500])
time.sleep(0.5)

# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459").text

# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq").text
print(fasta[:200])
time.sleep(0.5)

# Compound structure (MOL format)
mol = requests.get(f"{BASE}/get/cpd:C00002/mol").text  # ATP

# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image", timeout=30)
img_resp.raise_for_status()  # avoid writing an error response into pathway.png
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")
```

**Output formats**: `aaseq` (protein FASTA), `ntseq` (nucleotide FASTA), `mol` (MOL), `kcf` (KCF), `image` (PNG for pathways), `kgml` (KGML XML for pathways), `json` (JSON, used for BRITE hierarchies). `image`, `kgml`, and `json` accept **one entry only**.
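
Without a format option, `get` returns KEGG flat-file text: a field name in the first 12 columns, continuation lines indented to column 12, and `///` closing each entry, so several `+`-joined entries arrive concatenated in one response. The sketch below assumes that layout. `split_entries` and `parse_flat_entry` are hypothetical helpers, indented sub-field labels (such as those under `REFERENCE`) simply become their own keys, and the sample is a hand-abbreviated entry, not real API output.

```python
from typing import Dict, List


def split_entries(text: str) -> List[str]:
    """Split a multi-entry `get` response on the '///' terminators."""
    return [chunk for chunk in text.split("///") if chunk.strip()]


def parse_flat_entry(entry: str) -> Dict[str, List[str]]:
    """Parse one flat-file entry into {FIELD: [value lines]}.

    Assumes field labels occupy the first 12 columns and continuation
    lines are blank there; repeated labels append to the same list.
    """
    fields: Dict[str, List[str]] = {}
    current = None
    for line in entry.splitlines():
        label, value = line[:12].strip(), line[12:].strip()
        if label:
            current = label
            fields.setdefault(current, []).append(value)
        elif current is not None and value:
            fields[current].append(value)  # continuation of the previous field
    return fields


sample = (
    "ENTRY       hsa00010                    Pathway\n"
    "NAME        Glycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "GENE        3101  HK3; hexokinase 3\n"
    "            3098  HK1; hexokinase 1\n"
    "///\n"
)
for entry in split_entries(sample):
    fields = parse_flat_entry(entry)
    print(fields["NAME"][0])
    print([line.split()[0] for line in fields.get("GENE", [])])  # gene IDs
```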

### 5. ID Conversion — `kegg_conv`

Convert identifiers between KEGG and external databases.

```python
import requests
import time

BASE = "https://rest.kegg.jp"

# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458").text
print(ncbi.strip())
# hsa:10458	ncbi-geneid:10458
time.sleep(0.5)

# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458").text
print(uniprot.strip())
time.sleep(0.5)

# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa").text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")

# Reverse: NCBI Gene ID → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157").text
print(reverse.strip())  # TP53
```

**Supported external databases**: `ncbi-geneid`, `ncbi-proteinid`, `uniprot` (for genes); `pubchem`, `chebi` (for compounds and drugs)
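
A KEGG gene can map to more than one external ID (several UniProt accessions, for instance), so collecting `conv` output into lists avoids silently keeping only the last match. A minimal sketch; `parse_conv` is a hypothetical helper, and removing the database prefix assumes the prefix is everything before the first colon, as in `ncbi-geneid:10458`.

```python
from collections import defaultdict
from typing import Dict, List


def parse_conv(text: str, strip_prefix: bool = True) -> Dict[str, List[str]]:
    """Map source ID -> list of target IDs from `conv` output."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.strip().splitlines():
        source, target = line.split("\t", 1)
        if strip_prefix:
            target = target.split(":", 1)[-1]  # "ncbi-geneid:10458" -> "10458"
        mapping[source].append(target)
    return dict(mapping)


print(parse_conv("hsa:10458\tncbi-geneid:10458\n"))  # {'hsa:10458': ['10458']}
```

Applied to the bulk response above, `parse_conv(all_conv)` would give a lookup table for every human gene; for the reverse direction the keys are the `ncbi-geneid:` IDs instead.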

### 6. Cross-Referencing — `kegg_link`

Find related entri