etetoolkit
ETE Toolkit (ETE3): Python phylogenetic tree analysis and visualization. Parse Newick/NHX/PhyloXML, traverse/annotate nodes, render figures with TreeStyle/NodeStyle, integrate NCBI taxonomy, run PhyloTree comparative genomics. Use for species trees, gene family evolution, annotated tree figures.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/etetoolkit && cp -r /tmp/etetoolkit/skills/genomics-bioinformatics/etetoolkit ~/.claude/skills/etetoolkit
# ETE Toolkit: Phylogenetic Tree Analysis and Visualization
## Overview
ETE Toolkit (ETE3) is a Python framework for phylogenetic tree exploration, manipulation, and publication-quality visualization. It supports reading and writing Newick, NHX, PhyloXML, and NeXML formats, rich node annotation, programmatic tree traversal, NCBI taxonomy integration, and a flexible rendering engine for customizable tree figures. ETE3 is widely used in comparative genomics, phylogenomics, and evolutionary biology workflows.
## When to Use
- Parse phylogenetic trees from Newick, NHX, PhyloXML, or NeXML files and programmatically traverse or modify topology
- Annotate tree nodes with metadata (bootstrap values, gene names, taxonomic ranks, expression data) for visualization or downstream analysis
- Render publication-quality tree figures with custom node shapes, colors, branch widths, and face decorations using TreeStyle and NodeStyle
- Map NCBI taxonomy IDs to lineage information, validate species names, or build taxonomy-aware trees
- Compute evolutionary statistics: branch lengths, tree distances (Robinson-Foulds), LCA queries, monophyly tests
- Build PhyloTree objects for comparative genomics — gene duplication/speciation event annotation, orthologs/paralogs inference
- Prune, reroot, or ultrametricize trees programmatically before passing to downstream tools (BEAST, IQ-TREE, etc.)
- For sequence alignment prior to tree building, use `biopython-molecular-biology` instead
## Prerequisites
- **Python packages**: `ete3`, `numpy`, `PyQt5` (for interactive rendering), `lxml` (for PhyloXML)
- **Data requirements**: Newick string or tree file; NCBI taxonomy database (downloaded on first use for NCBI module)
- **Environment**: Python 3.6+; PyQt5 required for `TreeStyle` rendering and interactive GUI; headless rendering requires `xvfb`
> **Check before installing**: ETE3 may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `python -c "import ete3"` first and skip the install commands below if it exits without an error. When running inside a pixi project, invoke Python via `pixi run python` rather than bare `python`.
```bash
pip install ete3 numpy lxml PyQt5
# For headless rendering on Linux servers:
# apt-get install xvfb python3-pyqt5
```
## Quick Start
```python
from ete3 import Tree
# Load a Newick tree and inspect basic properties
# (format=1 reads the internal labels AB, CD, and root as node names;
# the default format=0 expects numeric support values in that position)
t = Tree("((A:0.1,B:0.2)AB:0.3,(C:0.4,D:0.1)CD:0.2)root;", format=1)
print(f"Number of leaves: {len(t.get_leaves())}")
print(f"Leaf names: {t.get_leaf_names()}")
print(f"Tree depth: {t.get_farthest_leaf()[1]:.3f}")
# Number of leaves: 4
# Leaf names: ['A', 'B', 'C', 'D']
# Tree depth: 0.600
t.show() # Opens interactive viewer (requires PyQt5)
```
## Core API
### Module 1: Tree I/O (Tree parsing and serialization)
Load trees from strings or files; write in various formats.
```python
from ete3 import Tree
# Parse a Newick string. format=1 reads internal labels as node names, so "90" and
# "85" become node names here. To read them as support values, use format=0 and
# drop the non-numeric "root" label.
t = Tree("((A:0.1,B:0.2)90:0.3,(C:0.4,D:0.1)85:0.2)root;", format=1)
print(f"Root children: {[n.name for n in t.children]}")
# Write Newick with internal node names
nwk_str = t.write(format=1)
print(f"Newick: {nwk_str}")
# Write to file (format=0 writes support values instead of internal names)
t.write(outfile="output_tree.nwk", format=0)
print("Saved output_tree.nwk")
# Load from file (the path must exist; here it is the file written above)
t_file = Tree("output_tree.nwk")
```
```python
from ete3 import Tree
# PhyloXML files are read through the Phyloxml project class rather than a tree constructor:
# from ete3 import Phyloxml
# project = Phyloxml()
# project.build_from_file("my_phylo.xml")
# pt = project.get_phylogeny()[0]
# Load NHX format (extended Newick with key=value annotations)
nhx = Tree("(A[&&NHX:S=human:D=Y],B[&&NHX:S=mouse:D=N]);")
for leaf in nhx.get_leaves():
    print(f"{leaf.name}: species={leaf.S}, duplication={leaf.D}")
# A: species=human, duplication=Y
# B: species=mouse, duplication=N
```
### Module 2: Tree Traversal and Search
Navigate nodes using pre-order, post-order, or breadth-first traversal; search by name or attribute.
```python
from ete3 import Tree
# format=1 reads the internal labels (Hominidae, Muridae, Euarchontoglires) as node names
t = Tree("((Homo_sapiens:0.1,Pan_troglodytes:0.05)Hominidae:0.2,(Mus_musculus:0.3,Rattus_norvegicus:0.25)Muridae:0.4)Euarchontoglires;", format=1)
# Iterate all nodes in preorder (traverse() defaults to "levelorder" if no strategy is given)
for node in t.traverse("preorder"):
    depth = node.get_distance(t)
    print(f"{'leaf' if node.is_leaf() else 'internal'}: {node.name or 'unnamed'} depth={depth:.3f}")
# Search by name
human = t.search_nodes(name="Homo_sapiens")[0]
print(f"Human branch length: {human.dist:.3f}")
print(f"Ancestors: {[a.name for a in human.get_ancestors()]}")
```
```python
from ete3 import Tree
# format=1 reads the internal labels as node names
t = Tree("((Homo_sapiens:0.1,Pan_troglodytes:0.05)Hominidae:0.2,(Mus_musculus:0.3,Rattus_norvegicus:0.25)Muridae:0.4)Euarchontoglires;", format=1)
# Lowest common ancestor (LCA) query
human = t & "Homo_sapiens"  # shorthand for search_nodes(name=...)[0]
mouse = t & "Mus_musculus"
lca = t.get_common_ancestor(human, mouse)
print(f"LCA of human and mouse: {lca.name}")
# LCA of human and mouse: Euarchontoglires
# Check monophyly
is_mono, mono_type, broken = t.check_monophyly(
    values=["Homo_sapiens", "Pan_troglodytes"], target_attr="name"
)
print(f"Hominids monophyletic: {is_mono}, type: {mono_type}")
# Hominids monophyletic: True, type: monophyletic
```
### Module 3: Node Annotation
Add custom attributes to nodes for metadata-driven visualization and analysis.
```python
from ete3 import Tree
# format=1 reads the internal labels as node names
t = Tree("((Homo_sapiens,Pan_troglodytes)Hominidae,(Mus_musculus,Rattus_norvegicus)Muridae)Euarchontoglires;", format=1)
# Annotate leaves with arbitrary metadata
metadata = {
    "Homo_sapiens": {"genome_size_gb": 3.2, "ploidy": 2, "color": "blue"},
    "Pan_troglodytes": {"genome_size_gb": 3.1, "ploidy": 2, "color": "green"},
    "Mus_musculus": {"genome_size_gb": 2.7, "ploidy": 2, "color": "orange"},
    "Rattus_norvegicus": {"genome_size_gb": 2.9, "ploidy": 2, "color": "red"},
}
for leaf in t.get_leaves():
    for attr, val in metadata[leaf.name].items():
        setattr(leaf, attr, val)
# Access annotations
for leaf in t.get_leaves():
    print(f"{leaf.name}: genome_size_gb={leaf.genome_size_gb}, ploidy={leaf.ploidy}, color={leaf.color}")
```