# BioServices Multi-Database Access
## Overview
BioServices provides a unified Python interface to 40+ bioinformatics web services including UniProt, KEGG, ChEMBL, ChEBI, PubChem, UniChem, PSICQUIC, QuickGO, and BLAST. Each service is accessed through a consistent object-oriented API with built-in caching, rate limiting, and output format handling.
## When to Use
- Querying protein information from UniProt (search, retrieve, ID mapping)
- Discovering KEGG pathways and extracting gene/interaction networks
- Cross-referencing compounds across ChEMBL, ChEBI, PubChem, and KEGG
- Running BLAST sequence similarity searches against UniProtKB
- Mapping identifiers between biological databases (UniProt, Ensembl, KEGG, RefSeq, PDB)
- Retrieving Gene Ontology annotations via QuickGO
- Finding protein-protein interactions via PSICQUIC (IntAct, MINT, BioGRID)
- Batch converting thousands of biological identifiers with error handling
- For single-database deep queries → use gget (Ensembl), pubchempy (PubChem), or chembl-database-bioactivity skill
- For pathway visualization → use pathway analysis tools (Cytoscape, NetworkX) after retrieving data with bioservices
## Prerequisites
```bash
pip install bioservices
# Optional: pandas for tabular output, matplotlib for visualization
pip install pandas matplotlib
```
**API Rate Limits**: Most services have rate limits. bioservices handles basic throttling internally, but for batch operations add explicit delays:
- UniProt mapping: ~1 request/second for batch jobs
- KEGG: 10 requests/second (be conservative with pathway parsing)
- ChEMBL/ChEBI: 5-10 requests/second
- BLAST: 1 job at a time (async polling, ~30-300s per job)
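The delays above can be enforced with a small wrapper around any per-item call. This is a minimal sketch, not part of bioservices. `throttled_map`, its parameters and the retry policy are hypothetical. Set `delay` from the limits listed above.

```python
import time
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    delay: float = 1.0,
    retries: int = 2,
) -> Tuple[Dict[T, R], Dict[T, str]]:
    """Call func on each item with a pause between calls (hypothetical helper).

    Returns (results, errors): successful outputs keyed by item, and the last
    error message for items that still failed after `retries` extra attempts.
    """
    results: Dict[T, R] = {}
    errors: Dict[T, str] = {}
    for item in items:
        for attempt in range(retries + 1):
            try:
                results[item] = func(item)
                errors.pop(item, None)  # forget earlier failures once it succeeds
                break
            except Exception as exc:  # network/service errors vary by backend
                errors[item] = f"{type(exc).__name__}: {exc}"
                time.sleep(delay * (attempt + 1))  # back off a little more each retry
        time.sleep(delay)  # stay under the service's request rate
    return results, errors
```

For example, `throttled_map(lambda acc: u.retrieve(acc, frmt="fasta"), ["P38398", "P04637"], delay=1.0)` would fetch FASTA records one per second and collect failed accessions separately, so one bad ID does not stop the batch.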
## Quick Start
```python
from bioservices import UniProt, KEGG
# Protein lookup
u = UniProt(verbose=False)
result = u.search("ABL1_HUMAN", frmt="tsv", columns="accession,gene_names,organism_name,length")
print(result[:200])
# Pathway discovery
k = KEGG(verbose=False)
# get_pathway_by_gene builds "organism:gene" itself, so pass the bare NCBI Gene ID
pathways = k.get_pathway_by_gene("25", "hsa") or {}  # ABL1; None if no pathway is found
print(f"ABL1 participates in {len(pathways)} pathways")
for pid, name in list(pathways.items())[:3]:
    print(f"  {pid}: {name}")
```
## Core API
### 1. Protein Analysis (UniProt)
```python
from bioservices import UniProt
u = UniProt(verbose=False)
# Search by protein name or gene
result = u.search("BRCA1 AND organism_id:9606", frmt="tsv",
columns="accession,gene_names,protein_name,length,go_p")
print(result[:300])
# Retrieve full entry
entry = u.retrieve("P38398", frmt="txt") # Swiss-Prot flat file
fasta = u.retrieve("P38398", frmt="fasta")
print(fasta[:200])
```
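With `frmt="tsv"`, `search` returns plain text: a header row followed by one tab-separated row per entry. The following minimal sketch turns that text into a list of dicts using the standard `csv` module. `tsv_to_records` is a hypothetical helper. The dict keys are whatever header labels UniProt returns for the requested columns.

```python
import csv
import io
from typing import Dict, List


def tsv_to_records(tsv_text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text (header line first) into one dict per row."""
    if not tsv_text or not tsv_text.strip():
        return []
    # QUOTE_NONE keeps literal quote characters inside protein names intact
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```

For example, `records = tsv_to_records(result)` gives one dict per protein. If pandas is installed, `pd.read_csv(io.StringIO(result), sep="\t")` produces the same table as a DataFrame.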
```python
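# ID mapping: gene names → UniProt accessions
# IDs are sent comma-separated; a gene name can match several organisms,
# so keep only human entries (taxon 9606) from the returned UniProtKB records
result = u.mapping(fr="Gene_Name", to="UniProtKB", query="BRCA1,TP53,ABL1")
human = [r for r in result["results"] if r["to"]["organism"]["taxonId"] == 9606]
print(f"Mapped {len(human)} human entries")
for r in human:
    print(f"  {r['from']} → {r['to']['primaryAccession']}")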
```
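To map thousands of identifiers, one common pattern is to split the list into chunks and submit one mapping job per chunk, pausing between jobs. The sketch below only prepares the comma-joined query strings. `chunk_queries` and the default chunk size of 500 are assumptions, not bioservices or UniProt defaults.

```python
from typing import Iterator, List, Sequence


def chunk_queries(ids: Sequence[str], size: int = 500) -> Iterator[str]:
    """Yield comma-joined, de-duplicated ID batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    # Strip whitespace, drop empty strings, keep first-seen order without duplicates
    unique: List[str] = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    for start in range(0, len(unique), size):
        yield ",".join(unique[start:start + size])
```

Each chunk can then be passed as `query=chunk` to `u.mapping(...)`, followed by a `time.sleep(...)` that respects the rate limits listed under Prerequisites.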
### 2. Pathway Discovery (KEGG)
```python
from bioservices import KEGG
k = KEGG(verbose=False)
# List pathways for an organism
k.organism = "hsa"
pathway_ids = k.pathwayIds  # pathway IDs for the organism set above
human_pathways = k.list("pathway", "hsa")
print(f"Human pathways: {len(human_pathways.strip().splitlines())}")
# Get pathway details
pathway_data = k.get("hsa04110") # Cell cycle
parsed = k.parse(pathway_data)
print(f"Pathway: {parsed.get('NAME', 'Unknown')}")
print(f"Genes: {len(parsed.get('GENE', {}))}")
```
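`k.list(...)` returns plain text with one tab-separated line per pathway: an ID, a tab, then a description. The following minimal sketch turns that text into a dictionary. `parse_kegg_list` is a hypothetical helper. Depending on the KEGG release, IDs may or may not carry a `path:` prefix, so the helper optionally strips it.

```python
from typing import Dict


def parse_kegg_list(text: str, strip_prefix: bool = True) -> Dict[str, str]:
    """Turn KEGG `list` output (ID<TAB>description per line) into a dict."""
    records: Dict[str, str] = {}
    for line in text.strip().splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        entry_id, description = line.split("\t", 1)
        if strip_prefix and ":" in entry_id:
            entry_id = entry_id.split(":", 1)[1]  # "path:hsa04110" -> "hsa04110"
        records[entry_id] = description.strip()
    return records
```

For example, `parse_kegg_list(human_pathways).get("hsa04110")` would return the cell cycle pathway's description.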
```python
# KGML parsing for interaction networks
import xml.etree.ElementTree as ET
from collections import Counter
from bioservices import KEGG
k = KEGG(verbose=False)
kgml = k.get("hsa04110", "kgml")  # XML pathway representation
# Parse KGML for entries and relations
root = ET.fromstring(kgml)
entries = root.findall("entry")
relations = root.findall("relation")
print(f"Entries: {len(entries)}, Relations: {len(relations)}")
# Count interaction subtypes (e.g. activation, inhibition)
rel_types = Counter()
for rel in relations:
    for subtype in rel.findall("subtype"):
        rel_types[subtype.get("name")] += 1
print(f"Interaction types: {dict(rel_types)}")
```
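To pass the network to a graph library such as NetworkX, the relations can be flattened into a source/target/interaction edge list. In KGML, each `<relation>` refers to `<entry>` elements by their `id` attribute. An entry's `name` attribute holds space-separated KEGG identifiers (e.g. `hsa:25`). The following minimal sketch uses only the standard library. `kgml_edges` is a hypothetical helper. It keeps only the first identifier of a multi-gene entry, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def kgml_edges(kgml: str) -> List[Tuple[str, str, str]]:
    """Return (source, target, interaction) triples from a KGML document."""
    root = ET.fromstring(kgml)
    names: Dict[str, str] = {}
    for entry in root.findall("entry"):
        ids = (entry.get("name") or "").split()
        # Multi-gene entries (e.g. "hsa:1029 hsa:1030") are reduced to the first ID
        names[entry.get("id")] = ids[0] if ids else entry.get("id")

    edges: List[Tuple[str, str, str]] = []
    for rel in root.findall("relation"):
        src = names.get(rel.get("entry1"), rel.get("entry1"))
        dst = names.get(rel.get("entry2"), rel.get("entry2"))
        # Fall back to the relation type (e.g. "PPrel") when no subtype is given
        subtypes = [s.get("name") for s in rel.findall("subtype")] or [rel.get("type")]
        for subtype in subtypes:
            edges.append((src, dst, subtype))
    return edges
```

With NetworkX installed, `G = nx.DiGraph()` followed by `G.add_edges_from((s, t, {"type": r}) for s, t, r in kgml_edges(kgml))` builds a directed graph. A `DiGraph` keeps one edge per node pair, so use a `MultiDiGraph` if several subtypes per pair must be preserved.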
### 3. Compound Databases (ChEMBL, ChEBI, UniChem, PubChem)
```python
from bioservices import ChEMBL, ChEBI, UniChem
import time
# ChEMBL compound lookup
chembl = ChEMBL(verbose=False)
result = chembl.get_molecule("CHEMBL25") # Aspirin
print(f"Name: {result['pref_name']}")
print(f"MW: {result['molecule_properties']['full_mwt']}")
print(f"SMILES: {result['molecule_structures']['canonical_smiles']}")
time.sleep(0.2)
# ChEBI entity lookup
chebi = ChEBI(verbose=False)
entity = chebi.getCompleteEntity("CHEBI:15365") # Aspirin
print(f"ChEBI Name: {entity.chebiAsciiName}")
print(f"Formula: {entity.formulae[0].data if entity.formulae else 'N/A'}")
```
```python
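# Cross-database compound mapping via UniChem
from bioservices import UniChem
uc = UniChem()
# Map a single ChEMBL ID to other databases
# Source IDs: 1=ChEMBL, 2=DrugBank, 3=PDB, 4=IUPHAR, 7=ChEBI, 22=PubChem
# get_src_compound_ids(compound_id, source_id) looks up one compound;
# get_mapping(source, target) instead fetches a whole source-to-source table.
# Method names follow the classic bioservices UniChem API and may differ in newer releases.
mappings = uc.get_src_compound_ids("CHEMBL25", 1)  # From ChEMBL
for m in mappings[:5]:
    print(f"  Source {m['src_id']}: {m['src_compound_id']}")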
```
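The mapping records carry numeric source IDs. The following minimal sketch groups them under readable names, using the source-ID legend from the UniChem comment above. `UNICHEM_SOURCES` and `group_by_source` are hypothetical names. The record shape (`src_id` and `src_compound_id` keys, with IDs possibly returned as strings) is assumed from the loop above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

# Subset of UniChem source IDs, as listed in the comment above
UNICHEM_SOURCES = {1: "ChEMBL", 2: "DrugBank", 3: "PDB", 4: "IUPHAR", 7: "ChEBI", 22: "PubChem"}


def group_by_source(mappings: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Group {'src_id', 'src_compound_id'} records into {source_name: [ids]}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for m in mappings:
        src = int(m["src_id"])  # IDs may come back as strings, e.g. "2"
        label = UNICHEM_SOURCES.get(src, f"source {src}")
        grouped[label].append(str(m["src_compound_id"]))
    return dict(grouped)
```

For example, `group_by_source(mappings).get("DrugBank", [])` would list any DrugBank IDs found for the compound. Sources outside the legend are kept under a generic `source N` label instead of being dropped.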
### 4. Sequence Analysis (BLAST)
```python
from bioservices import NCBIblast
import time
blast = NCBIblast(verbose=False)
sequence = ">query\nMKTAYIAKQRQISFVKSHFSRQLE..."  # Truncated for brevity
job_id = blast.run(
    program="blastp",
    database="uniprotkb_swissprot",
    sequence=sequence,
    stype="protein",
    email="user@example.com",  # Required by the EBI service that NCBIblast wraps
)
print(f"Job submitted: {job_id}")
# Poll for results (async); jobs can sit in QUEUED before RUNNING
status = blast.getStatus(job_id)
while status in ("QUEUED", "RUNNING"):
    time.sleep(10)
    print("Waiting...")
    status = blast.getStatus(job_id)
if status != "FINISHED":
    raise RuntimeError(f"BLAST job {job_id} ended with status {status}")
result_types = blast.getResultTypes(job_id)
alignment = blast.getResult(job_id, "out")  # Text alignment
print(alignment[:500])
```
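A bare polling loop waits forever if the service stalls. The following minimal sketch is a reusable poller with a timeout. `wait_for_job` is a hypothetical helper, and `get_status` is any callable that returns a status string (for example `blast.getStatus`). The default pending statuses `QUEUED` and `RUNNING` are an assumption about the job dispatcher's status names.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    interval: float = 10.0,
    timeout: float = 600.0,
    pending: Iterable[str] = ("QUEUED", "RUNNING"),
) -> str:
    """Poll get_status(job_id) until it leaves the pending states or time runs out.

    Returns the final status string; raises TimeoutError after `timeout` seconds.
    """
    pending_states = set(pending)
    deadline = time.monotonic() + timeout
    status = get_status(job_id)
    while status in pending_states:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        time.sleep(interval)
        status = get_status(job_id)
    return status
```

For example, call `status = wait_for_job(blast.getStatus, job_id, interval=10)`, then request `blast.getResult(job_id, "out")` only if `status == "FINISHED"`.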
### 5. Identifier Mapping
```python
from bioservices import UniProt
u = UniProt(verbose=False)
# Batch mapping: several UniProt accessions → other databases (PDB shown here)
accessions = ",".join(["P00520", "P12931", "P04637", "P38398"])
# UniProt → PDB
result = u.mapping(fr="UniProtKB_AC-ID", to="PDB", query=accessions)
for r in result["results"]:
    print(f"  {r['from']} → {r['to']}")
```
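A mapping response can contain several hits per input and may list inputs that could not be mapped. The following minimal sketch collapses the `results` list into `{from: [to, ...]}` and reports unmapped IDs. `collect_mapping` is a hypothetical helper. The `failedIds` key is an assumption about the UniProt ID-mapping response shape, so the helper reads it with `.get()`.

```python
from typing import Any, Dict, List, Tuple


def collect_mapping(result: Dict[str, Any]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Collapse an ID-mapping response into {from_id: [to_id, ...]} plus unmapped IDs."""
    mapped: Dict[str, List[str]] = {}
    for r in result.get("results", []):
        target = r["to"]
        # UniProtKB targets come back as full entries; other targets as plain IDs
        if isinstance(target, dict):
            target = target.get("primaryAccession", str(target))
        hits = mapped.setdefault(r["from"], [])
        if target not in hits:  # keep each target once, in first-seen order
            hits.append(target)
    return mapped, list(result.get("failedIds", []))
```

For example, `pdb_ids, unmapped = collect_mapping(result)` would give each accession's PDB IDs as a list. The unmapped list can then be retried or reported separately.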