bio-clinical-databases-somatic-signatures
This Claude Code skill extracts and decomposes somatic mutational signatures from cancer genomes using SigProfilerExtractor or MutationalPatterns. Use it when characterizing DNA damage mechanisms and mutagenic processes by identifying specific mutational patterns (SBS, DBS, indels) in tumor samples through de novo signature extraction or fitting to COSMIC reference databases.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-clinical-databases-somatic-signatures && cp -r /tmp/bio-clinical-databases-somatic-signatures/skills/bio-clinical-databases-somatic-signatures ~/.claude/skills/bio-clinical-databases-somatic-signatures
## Version Compatibility
Reference examples tested with: MutationalPatterns 3.12+, SigProfilerExtractor 1.1+, numpy 1.26+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Somatic Mutational Signatures
**"Extract mutational signatures from my tumor samples"** → Decompose somatic mutation catalogs into mutational signatures (SBS, DBS, ID) to identify DNA damage mechanisms and mutagenic processes in cancer genomes.
- Python: `SigProfilerExtractor.sigpro.sigProfilerExtractor()` for de novo signature extraction
- R: `MutationalPatterns::fit_to_signatures()` for fitting to COSMIC signatures
## SigProfiler Workflow
**Goal:** Extract de novo mutational signatures and decompose to COSMIC reference signatures from somatic VCFs.
**Approach:** Generate a 96-trinucleotide-context mutation matrix with SigProfilerMatrixGenerator, extract signatures via NMF with SigProfilerExtractor, and fit to COSMIC with SigProfilerAssignment.
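To make the "96-trinucleotide-context" idea concrete, here is a minimal sketch that enumerates the 96 SBS channels: six pyrimidine-centred substitution types, each with four possible 5' bases and four possible 3' bases. The function name `sbs96_channels` is hypothetical, and the ordering is illustrative only; it is not guaranteed to match the row order in SigProfilerMatrixGenerator output files.

```python
from itertools import product

BASES = "ACGT"
# Substitutions are written relative to the pyrimidine (C or T) of the mutated base pair
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_channels():
    """Return the 96 channel labels in the form 5'[REF>ALT]3', e.g. 'A[C>A]A'."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


channels = sbs96_channels()
print(len(channels), channels[0], channels[-1])
```

Each row of an SBS96 matrix corresponds to one of these labels, and each column to one sample.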
### Install and Generate Matrix
```python
from SigProfilerMatrixGenerator import install as genInstall
from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
# Install reference genome (one-time)
genInstall.install('GRCh38')
# Generate mutational matrix from VCF
# Input: Directory containing VCF files
# Output: SBS96 matrix (96 trinucleotide contexts)
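# The first three arguments (project, reference genome, input directory) are
# passed positionally because their keyword names differ between releases
matrices = matGen.SigProfilerMatrixGeneratorFunc(
    'my_project',              # project name, used as the output file prefix
    'GRCh38',                  # reference genome installed above
    '/path/to/vcf_directory',  # matrices are written to <this directory>/output/
    plot=True,
    exome=False  # Set True for WES
)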
```
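Before running extraction, it can help to check how many mutations each sample contributes, since samples with very few mutations tend to give unstable signature assignments. The sketch below assumes the matrix file is tab-delimited with a `MutationType` column followed by one integer column per sample; `sample_totals` is a hypothetical helper, and the in-memory demo stands in for a real file.

```python
import csv
import io


def sample_totals(handle):
    """Sum mutation counts per sample from a tab-delimited SBS96 matrix."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # e.g. ['MutationType', 'S1', 'S2']
    totals = {name: 0 for name in header[1:]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for name, value in zip(header[1:], row[1:]):
            totals[name] += int(value)
    return totals


# Tiny in-memory stand-in for a real matrix file (only 2 of 96 rows shown)
demo = io.StringIO("MutationType\tS1\tS2\nA[C>A]A\t3\t0\nA[C>A]C\t5\t2\n")
totals = sample_totals(demo)
print(totals)
```

With a real file, open it with `open(path, newline='')` and pass the handle to `sample_totals`.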
### Extract Signatures
```python
from SigProfilerExtractor import sigpro as sig
# De novo signature extraction
# Determines optimal number of signatures automatically
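# The matrix path points into the output/ folder that SigProfilerMatrixGenerator
# creates inside the VCF input directory
sig.sigProfilerExtractor(
    input_type='matrix',
    output='extraction_output',
    input_data='/path/to/vcf_directory/output/SBS/my_project.SBS96.all',
    reference_genome='GRCh38',
    minimum_signatures=1,
    maximum_signatures=10,
    nmf_replicates=100,
    cpu=-1  # Use all cores
)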
```
### Decompose to COSMIC Signatures
```python
from SigProfilerAssignment import Analyzer as Analyze
# Fit to known COSMIC signatures
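Analyze.cosmic_fit(
    samples='/path/to/vcf_directory/output/SBS/my_project.SBS96.all',
    output='assignment_output',
    input_type='matrix',
    genome_build='GRCh38'
    # signature_database expects a path to a custom signature file;
    # leave it unset to fit against the bundled COSMIC reference set
)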
```
## MutationalPatterns (R)
**Goal:** Analyze mutational spectra and fit to COSMIC signatures using the MutationalPatterns R package.
**Approach:** Load VCFs as GRanges, generate a 96-context mutation matrix against the reference genome, then fit to known COSMIC signatures or extract de novo via NMF.
### Load and Analyze
```r
library(MutationalPatterns)
library(BSgenome.Hsapiens.UCSC.hg38)
# Load VCF files
vcf_files <- list.files('vcf_dir', pattern = '\\.vcf$', full.names = TRUE)
# Strip only the trailing .vcf extension (escaped dot, anchored at the end)
sample_names <- sub('\\.vcf$', '', basename(vcf_files))
vcfs <- read_vcfs_as_granges(
  vcf_files,
  sample_names,
  genome = 'BSgenome.Hsapiens.UCSC.hg38'
)
# Generate 96-context mutation matrix
mut_mat <- mut_matrix(vcf_list = vcfs, ref_genome = 'BSgenome.Hsapiens.UCSC.hg38')
# Visualize spectrum
plot_96_profile(mut_mat)
```
### Fit to COSMIC Signatures
```r
# Load COSMIC signatures (v3.2, GRCh38 to match the reference genome above)
cosmic_sigs <- get_known_signatures(muttype = 'snv', source = 'COSMIC_v3.2', genome = 'GRCh38')
# Fit samples to signatures
fit_result <- fit_to_signatures(mut_mat, cosmic_sigs)
# Plot contribution (the signatures argument is only needed for NMF results, not refits)
plot_contribution(fit_result$contribution, mode = 'absolute')
# Relative contribution
plot_contribution(fit_result$contribution, mode = 'relative')
```
### De Novo Extraction
```r
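# Extract de novo signatures using NMF
# Determine optimal rank with the NMF package (a small pseudocount avoids all-zero rows)
library(NMF)
mut_mat_nmf <- mut_mat + 0.0001
estimate <- nmf(mut_mat_nmf, rank = 2:8, method = 'brunet', nrun = 50, seed = 123456)
plot(estimate)
# Extract signatures (rank chosen from the estimate plot)
nmf_res <- extract_signatures(mut_mat_nmf, rank = 4, nrun = 100)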
# Compare to COSMIC
cos_sim <- cos_sim_matrix(nmf_res$signatures, cosmic_sigs)
plot_cosine_heatmap(cos_sim)
```
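The comparison step relies on cosine similarity between each de novo profile and each reference profile. As a minimal illustration of that measure, sketched here in plain Python rather than R, the helpers `cosine_similarity` and `best_match` below are hypothetical names, and the 4-channel profiles are toy data (real SBS profiles have 96 channels).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(profile, references):
    """Return (name, similarity) of the reference profile closest to `profile`."""
    return max(
        ((name, cosine_similarity(profile, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )


# Toy 4-channel profiles standing in for 96-channel signatures
references = {"sig_A": [0.7, 0.1, 0.1, 0.1], "sig_B": [0.1, 0.1, 0.1, 0.7]}
de_novo = [0.65, 0.15, 0.1, 0.1]
name, score = best_match(de_novo, references)
print(name, round(score, 3))
```

A similarity close to 1 suggests the de novo signature resembles a known one; lower values may indicate a novel or mixed signature.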
## COSMIC Signature Etiology
**Goal:** Interpret extracted signatures by mapping them to known mutagenic processes (e.g., UV, smoking, MMR deficiency).
**Approach:** Look up each dominant signature in a COSMIC etiology reference table and filter by contribution threshold.
```python
# Common COSMIC signatures and their etiologies
SIGNATURE_ETIOLOGY = {
    'SBS1': 'Spontaneous deamination (age-related)',
    'SBS2': 'APOBEC activity',
    'SBS3': 'Defective HR/BRCA1/2',
    'SBS4': 'Tobacco smoking',
    'SBS5': 'Unknown (age-related)',
    'SBS6': 'MMR deficiency',
    'SBS7a': 'UV exposure',
    'SBS7b': 'UV exposure',
    'SBS10a': 'POLE mutation',
    'SBS10b': 'POLE mutation',
    'SBS13': 'APOBEC activity',
    'SBS15': 'MMR deficiency',
    'SBS17a': 'Unknown',
    'SBS17b': 'Unknown',
    'SBS18': 'ROS damage',
    'SBS22': 'Aristolochic acid',
    'SBS26': 'MMR deficiency',
    'SBS44': 'MMR deficiency',
}


def interpret_signatures(contributions):
    '''Return signatures contributing more than 5%, with etiologies, largest first.

    contributions: mapping of signature name to fractional contribution (0-1).
    '''
    interpretations = []
    for sig, contrib in contributions.items():
        if contrib > 0.05:  # >5% contribution threshold
            etiology = SIGNATURE_ETIOLOGY.get(sig, 'Unknown')
            interpretations.append({
                'signature': sig,
                'contribution': contrib,
                'etiology': etiology
            })
    return sorted(interpretations, key=lambda x: x['contribution'], reverse=True)
```
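The 0.05 threshold in `interpret_signatures` only makes sense for fractional contributions, whereas refitting tools typically report absolute mutation counts per signature. A minimal sketch of the conversion, assuming one sample's contributions are held in a dict; `to_fractions` is a hypothetical helper and the counts are made-up illustration values.

```python
def to_fractions(counts):
    """Convert per-signature mutation counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        # No mutations assigned: return zeros rather than dividing by zero
        return {sig: 0.0 for sig in counts}
    return {sig: value / total for sig, value in counts.items()}


# Hypothetical absolute contributions for one sample
counts = {"SBS1": 120, "SBS4": 840, "SBS5": 30, "SBS13": 10}
fractions = to_fractions(counts)
print(fractions)
```

Passing `fractions` to `interpret_signatures` would keep SBS4 and SBS1 here, since SBS5 and SBS13 fall below the 5% threshold.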
## Signature Categories
| Category | Signatures | Mechanism |
|----------|------------|-----------|
| Age-related | SBS1, SBS5 | Spontaneous deamination, clock-like |
| APOBEC | SBS2, SBS13 | Cytidine deaminase activity |
| MMR deficiency | SBS6, SBS15, SBS26, SBS44 | Mismatch repair defects |