kegg-database
KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python.
`git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/kegg-database && cp -r /tmp/kegg-database/skills/genomics-bioinformatics/databases/kegg-database ~/.claude/skills/kegg-database`
# KEGG Database — Biological Pathway & Molecular Network Queries
## Overview
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is through a REST API that needs no authentication: every operation is a plain HTTP GET request. Responses are plain text (tab-delimited for `list`, `find`, `conv`, `link`, and `ddi`; KEGG flat-file, sequence, or structure formats for `get`), except `image` output, which is binary.
## When to Use
- Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
- Retrieving metabolic pathway details, gene lists, or compound structures
- Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
- Checking drug-drug interactions from KEGG's pharmacological database
- Building pathway enrichment context (all genes per pathway for an organism)
- Cross-referencing compounds, reactions, enzymes, and pathways
- For **Python-native multi-database queries** (KEGG + UniProt + Ensembl in one script), prefer `bioservices`
- For **pathway visualization**, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly
## Prerequisites
```bash
pip install requests
```
**API constraints**:
- **Academic use only** — commercial use requires a separate KEGG license
- **Max 10 entries** per `get`/`list`/`conv`/`link`/`ddi` call (image/kgml/json: 1 entry only)
- **No explicit rate limit**, but add `time.sleep(0.5)` between batch requests to avoid server-side throttling
- Base URL: `https://rest.kegg.jp/`
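The per-call entry limit and the pause between batch requests can be wrapped in one helper. The sketch below is a minimal illustration, not part of KEGG or any library: `kegg_batch`, `batched`, and `http_get` are hypothetical names. It assumes IDs are joined with `+` (as in the examples below), that the operation prefix (for example `conv/uniprot` or `link/pathway`) comes before the IDs, and that a `get` option comes after them. The fetch function is injectable so the batching logic can be checked without network access.

```python
import time
from typing import Callable, List

BASE = "https://rest.kegg.jp"
MAX_ENTRIES = 10                         # per-call limit for get/list/conv/link/ddi
SINGLE_ENTRY_OPTIONS = {"kgml", "json"}  # these get options accept one entry only


def http_get(url: str) -> str:
    """Default fetcher; requests is imported here so the batching logic runs with a stub."""
    import requests
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text


def batched(ids: List[str], size: int = MAX_ENTRIES) -> List[List[str]]:
    """Split IDs into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def kegg_batch(operation: str, ids: List[str], option: str = "",
               fetch: Callable[[str], str] = http_get,
               pause: float = 0.5) -> List[str]:
    """Run a text-returning KEGG operation over any number of IDs.

    `operation` is everything before the IDs, e.g. "get", "list" or "conv/uniprot".
    Returns one response string per request, in order.
    """
    if option == "image":
        raise ValueError("image responses are binary; download them one at a time")
    size = 1 if option in SINGLE_ENTRY_OPTIONS else MAX_ENTRIES
    parts = []
    for n, chunk in enumerate(batched(ids, size)):
        if n:
            time.sleep(pause)  # polite pause between consecutive requests
        url = f"{BASE}/{operation}/{'+'.join(chunk)}"
        if option:
            url += f"/{option}"
        parts.append(fetch(url))
    return parts
```

For example, `"".join(kegg_batch("conv/ncbi-geneid", ids))` converts any number of gene IDs in requests of up to 10. Returning a list rather than one string keeps single-entry formats such as KGML as separate documents.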
## Quick Start
```python
import requests
import time
BASE = "https://rest.kegg.jp"
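def kegg_get(operation, *args):
    """Generic KEGG REST API caller: builds BASE/operation/arg1/arg2/... and returns the text."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # raise on HTTP errors such as 400 (bad request) or 404 (not found)
    return resp.text
# Find pathways linked to human gene TP53 (KEGG ID hsa:7157)
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157 path:hsa04010
# hsa:7157 path:hsa04110
# ...
time.sleep(0.5)  # pause between consecutive requests
# Get pathway details (hsa04110: Cell cycle)
detail = kegg_get("get", "hsa04110")
print(detail[:300])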
```
## Core API
### 1. Database Information — `kegg_info`
Retrieve metadata and statistics about KEGG databases.
```python
import requests
BASE = "https://rest.kegg.jp"
# Database-level info
info = requests.get(f"{BASE}/info/pathway").text
print(info[:200])
# pathway Pathway
# Release 112.0, Dec 2025
# Kanehisa Laboratories
# ...
# Organism-level info
hsa_info = requests.get(f"{BASE}/info/hsa").text
print(hsa_info[:200])
```
**Common databases**: `kegg`, `pathway`, `module`, `brite`, `genes`, `genome`, `compound`, `glycan`, `reaction`, `enzyme`, `disease`, `drug`
### 2. Listing Entries — `kegg_list`
List entry identifiers and names from any KEGG database.
```python
import requests
BASE = "https://rest.kegg.jp"
# All human pathways
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa").text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t", 1)  # ID and description are tab-separated
    print(f"{pathway_id}: {name}")
# path:hsa00010: Glycolysis / Gluconeogenesis - Homo sapiens (human)
# ...
# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459").text
print(genes)
```
**Common organism codes**: `hsa` (human), `mmu` (mouse), `dme` (fruit fly), `sce` (yeast), `eco` (E. coli)
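Every `list` line pairs an ID with a description, so the output drops straight into a dictionary. A minimal sketch; `parse_list` is a hypothetical helper, and the `suffix` argument assumes organism pathway names end with a fixed species string such as ` - Homo sapiens (human)`, as in the sample output above. Any further tab-separated columns after the first tab, if present, stay together in the value.

```python
from typing import Dict


def parse_list(text: str, suffix: str = "") -> Dict[str, str]:
    """Turn `list` output (ID<TAB>description per line) into {id: description}.

    If `suffix` is given (e.g. " - Homo sapiens (human)"), it is removed from the
    end of each description, leaving the bare pathway name.
    """
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        if suffix and description.endswith(suffix):
            description = description[: -len(suffix)]
        entries[entry_id] = description
    return entries
```

Typical use: `parse_list(requests.get(f"{BASE}/list/pathway/hsa").text, suffix=" - Homo sapiens (human)")`. Matching an explicit suffix, rather than cutting at the last ` - `, avoids truncating reference pathway names that contain a hyphen themselves.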
### 3. Keyword Search — `kegg_find`
Search databases by keywords or molecular properties.
```python
import requests
import time
BASE = "https://rest.kegg.jp"
# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53").text
print(f"Found {len(results.strip().splitlines())} entries")
time.sleep(0.5)
# Chemical formula search (partial match; atom order is ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula").text
print(compounds[:200])
time.sleep(0.5)
# Molecular weight range search
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass").text
print(drugs[:200])
```
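**Search options**: append `/formula` (partial match, irrespective of atom order), `/exact_mass`, or `/mol_weight` to compound/drug queries. Mass and weight searches accept a single value, matched at the query's decimal precision, or a range such as `300-310`.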
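Because property searches use a different URL shape from keyword searches, a small builder keeps the forms straight. This is a sketch, not a KEGG client API: `find_url` is a hypothetical helper, and it assumes multi-word keywords are joined with `+` and that the property options apply to `compound` and `drug` only, as stated above.

```python
from typing import Optional, Tuple, Union

BASE = "https://rest.kegg.jp"
# Property-search options; they apply to compound/drug queries.
PROPERTY_OPTIONS = {"formula", "exact_mass", "mol_weight"}

Query = Union[str, int, float, Tuple[float, float]]


def find_url(database: str, query: Query, option: Optional[str] = None) -> str:
    """Build a KEGG `find` URL for a keyword search or a compound/drug property search."""
    if option is None:
        # Keyword search: multiple words joined with '+' (assumed convention).
        return f"{BASE}/find/{database}/{'+'.join(str(query).split())}"
    if option not in PROPERTY_OPTIONS:
        raise ValueError(f"unknown option {option!r}; expected one of {sorted(PROPERTY_OPTIONS)}")
    if database not in {"compound", "drug"}:
        raise ValueError("property options apply to compound/drug queries only")
    if isinstance(query, tuple):
        if option == "formula":
            raise ValueError("ranges only apply to exact_mass/mol_weight")
        low, high = query
        value = f"{low}-{high}"  # range search, e.g. 300-310
    else:
        value = str(query)  # pass the value as written; its precision controls matching
    return f"{BASE}/find/{database}/{value}/{option}"
```

For instance, `find_url("drug", (300, 310), "exact_mass")` yields the same URL as the molecular-weight example above.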
### 4. Entry Retrieval — `kegg_get`
Retrieve complete database entries or specific data formats.
```python
import requests
import time
BASE = "https://rest.kegg.jp"
# Full pathway entry (text format)
pathway = requests.get(f"{BASE}/get/hsa00010").text
print(pathway[:500])
time.sleep(0.5)
# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459").text
# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq").text
print(fasta[:200])
time.sleep(0.5)
# Compound structure (MOL format)
mol = requests.get(f"{BASE}/get/cpd:C00002/mol").text # ATP
# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image", timeout=30)
img_resp.raise_for_status()  # avoid writing an error response to disk as a .png
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")
```
**Output formats**: `aaseq` (protein FASTA), `ntseq` (nucleotide FASTA), `mol` (MOL), `kcf` (KCF), `image` (PNG), `kgml` (XML), `json` (pathway JSON). Image/KGML/JSON accept **one entry only**.
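Text `get` responses use KEGG's flat-file format. The sketch below assumes the usual layout: a field label in the first 12 columns, continuation lines with a blank label, and each entry terminated by a `///` line. `split_entries` and `parse_entry` are hypothetical helper names; repeated labels (for example several `REFERENCE` blocks) are merged into one list, and indented sub-labels become their own keys.

```python
from typing import Dict, List


def split_entries(text: str) -> List[str]:
    """Split a (possibly multi-entry) `get` response on its '///' terminator lines."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "///":
            entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final terminator
        entries.append("\n".join(current))
    return entries


def parse_entry(entry: str) -> Dict[str, List[str]]:
    """Map each field label (first 12 columns) to the list of its value lines."""
    fields: Dict[str, List[str]] = {}
    key = None
    for line in entry.splitlines():
        if not line.strip():
            continue
        label, value = line[:12].strip(), line[12:].strip()
        if label:  # a new field starts; blank labels continue the previous one
            key = label
            fields.setdefault(key, [])
        if key is not None:
            fields[key].append(value)
    return fields
```

Applied to the pathway text fetched above, `parse_entry(split_entries(pathway)[0])["GENE"]` would give one string per gene line.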
### 5. ID Conversion — `kegg_conv`
Convert identifiers between KEGG and external databases.
```python
import requests
import time
BASE = "https://rest.kegg.jp"
# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458").text
print(ncbi.strip())
# hsa:10458 ncbi-geneid:10458
time.sleep(0.5)
# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458").text
print(uniprot.strip())
time.sleep(0.5)
# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa").text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")
# Reverse: NCBI Gene ID → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157").text
print(reverse.strip()) # TP53
```
**Supported external databases**: `ncbi-geneid`, `ncbi-proteinid`, `uniprot` (for gene IDs); `pubchem`, `chebi` (for compound and drug IDs)
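`conv` output is one `source<TAB>target` pair per line, and one KEGG gene can map to several external IDs (UniProt in particular). A minimal sketch that keeps every target; `conv_map` is a hypothetical helper name, and stripping the `db:` prefix from the target is optional.

```python
from collections import defaultdict
from typing import Dict, List


def conv_map(text: str, strip_prefix: bool = True) -> Dict[str, List[str]]:
    """Build {source_id: [target_ids]} from `conv` output, keeping one-to-many mappings."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        if strip_prefix:
            target = target.split(":", 1)[-1]  # e.g. 'ncbi-geneid:7157' -> '7157'
        mapping[source].append(target)
    return dict(mapping)
```

For the bulk `conv/ncbi-geneid/hsa` call above, `conv_map(all_conv)` gives a lookup table keyed by KEGG gene ID.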
### 6. Cross-Referencing — `kegg_link`
Find related entries by using database cross-references.
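`link` output uses the same two-column layout as the Quick Start result. For the pathway-enrichment use case listed under When to Use, a bulk call such as `link/hsa/pathway` returns every human pathway-gene pair at once. The sketch below inverts such output into per-pathway gene sets and counts overlaps with a query gene list. It is an illustration under those assumptions: `genes_by_pathway` and `pathway_overlap` are hypothetical names, the pathway column is identified by its `path:` prefix, and no enrichment statistic is computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def genes_by_pathway(link_text: str) -> Dict[str, Set[str]]:
    """Invert two-column `link` output into {pathway_id: {gene_ids}}."""
    pathways: Dict[str, Set[str]] = defaultdict(set)
    for line in link_text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        left, right = line.split("\t", 1)
        # Either column order works: the pathway is the column prefixed with 'path:'.
        if right.startswith("path:"):
            left, right = right, left
        pathways[left].add(right)
    return dict(pathways)


def pathway_overlap(pathways: Dict[str, Set[str]],
                    query: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (pathway, genes hit, pathway size) for pathways hit by the query, most hits first."""
    q = set(query)
    hits = [(pid, len(genes & q), len(genes)) for pid, genes in pathways.items() if genes & q]
    return sorted(hits, key=lambda t: (-t[1], t[0]))
```

The per-pathway sets and sizes are the inputs most enrichment tests need; the choice of test is left to the caller.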