etetoolkit

ETE Toolkit (ETE3) is a Python framework for phylogenetic tree analysis and visualization that parses Newick, NHX, PhyloXML, and NeXML formats, enables node traversal and annotation, renders publication-quality tree figures with customizable styles, and integrates NCBI taxonomy data. Use this skill to explore and modify tree topology, compute evolutionary statistics like Robinson-Foulds distances and monophyly tests, annotate nodes with bootstrap values or expression data, generate annotated tree visualizations, or prepare trees for downstream comparative genomics workflows.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/etetoolkit && cp -r /tmp/etetoolkit/skills/genomics-bioinformatics/etetoolkit ~/.claude/skills/etetoolkit

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ETE Toolkit: Phylogenetic Tree Analysis and Visualization

## Overview

ETE Toolkit (ETE3) is a Python framework for phylogenetic tree exploration, manipulation, and publication-quality visualization. It supports reading and writing Newick, NHX, PhyloXML, and NeXML formats, rich node annotation, programmatic tree traversal, NCBI taxonomy integration, and a flexible rendering engine for customizable tree figures. ETE3 is widely used in comparative genomics, phylogenomics, and evolutionary biology workflows.

## When to Use

- Parse phylogenetic trees from Newick, NHX, PhyloXML, or NeXML files and programmatically traverse or modify topology
- Annotate tree nodes with metadata (bootstrap values, gene names, taxonomic ranks, expression data) for visualization or downstream analysis
- Render publication-quality tree figures with custom node shapes, colors, branch widths, and face decorations using TreeStyle and NodeStyle
- Map NCBI taxonomy IDs to lineage information, validate species names, or build taxonomy-aware trees
- Compute evolutionary statistics: branch lengths, tree distances (Robinson-Foulds), LCA queries, monophyly tests
- Build PhyloTree objects for comparative genomics — gene duplication/speciation event annotation, orthologs/paralogs inference
- Prune, reroot, or ultrametricize trees programmatically before passing to downstream tools (BEAST, IQ-TREE, etc.)
- For sequence alignment prior to tree building, use `biopython-molecular-biology` instead

## Prerequisites

- **Python packages**: `ete3`, `numpy`, `PyQt5` (for interactive rendering), `lxml` (for PhyloXML)
- **Data requirements**: Newick string or tree file; NCBI taxonomy database (downloaded on first use for NCBI module)
- **Environment**: Python 3.6+; PyQt5 required for `TreeStyle` rendering and interactive GUI; headless rendering requires `xvfb`

> **Check before installing**: ETE3 may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `python -c "import ete3"` first and skip the install commands below if it exits without an error. When running inside a pixi project, invoke Python via `pixi run python` rather than bare `python`.

```bash
pip install ete3 numpy lxml PyQt5
# For headless rendering on Linux servers:
# apt-get install xvfb python3-pyqt5
```

## Quick Start

```python
from ete3 import Tree

# Load a Newick tree and inspect basic properties
# (format=1 is needed because the internal nodes carry names: AB, CD, root)
t = Tree("((A:0.1,B:0.2)AB:0.3,(C:0.4,D:0.1)CD:0.2)root;", format=1)
print(f"Number of leaves: {len(t.get_leaves())}")
print(f"Leaf names: {t.get_leaf_names()}")
print(f"Tree depth: {t.get_farthest_leaf()[1]:.3f}")
# Number of leaves: 4
# Leaf names: ['A', 'B', 'C', 'D']
# Tree depth: 0.600  (root -> CD -> C = 0.2 + 0.4)
t.show()  # Opens interactive viewer (requires PyQt5)
```

## Core API

### Module 1: Tree I/O (Tree parsing and serialization)

Load trees from strings or files; write in various formats.

```python
from ete3 import Tree, PhyloTree

# Parse Newick string (format=1 reads internal labels such as 90, 85, and root
# as node names; the default format=0 expects numeric support values there)
t = Tree("((A:0.1,B:0.2)90:0.3,(C:0.4,D:0.1)85:0.2)root;", format=1)
print(f"Root children: {[n.name for n in t.children]}")

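# Load from file (the path must exist; otherwise the string is parsed as Newick)
t_file = Tree("my_tree.nwk")

# Write Newick with internal node names (format=1)
nwk_str = t.write(format=1)
print(f"Newick: {nwk_str}")

# Write to file (format=0 emits support values instead of internal names)
t.write(outfile="output_tree.nwk", format=0)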
print("Saved output_tree.nwk")
```

```python
from ete3 import Tree

# Load a PhyloXML file (retains sequence annotations)
# from ete3 import Phyloxml
# project = Phyloxml()
# project.build_from_file("my_phylo.xml")
# trees = project.get_phylogeny()

# Load NHX format (extended Newick with key=value annotations)
nhx = Tree("(A[&&NHX:S=human:D=Y],B[&&NHX:S=mouse:D=N]);")
for leaf in nhx.get_leaves():
    print(f"{leaf.name}: species={leaf.S}, duplication={leaf.D}")
# A: species=human, duplication=Y
# B: species=mouse, duplication=N
```

### Module 2: Tree Traversal and Search

Navigate nodes using pre-order, post-order, or breadth-first traversal; search by name or attribute.

```python
from ete3 import Tree

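# format=1 keeps the internal node names (Hominidae, Muridae, Euarchontoglires)
t = Tree("((Homo_sapiens:0.1,Pan_troglodytes:0.05)Hominidae:0.2,(Mus_musculus:0.3,Rattus_norvegicus:0.25)Muridae:0.4)Euarchontoglires;", format=1)

# Iterate all nodes in preorder (the default strategy is "levelorder")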
for node in t.traverse("preorder"):
    depth = node.get_distance(t)
    print(f"{'leaf' if node.is_leaf() else 'internal'}: {node.name or 'unnamed'} depth={depth:.3f}")

# Search by name
human = t.search_nodes(name="Homo_sapiens")[0]
print(f"Human branch length: {human.dist:.3f}")
print(f"Ancestors: {[a.name for a in human.get_ancestors()]}")
```

```python
from ete3 import Tree

# format=1 keeps the internal node names so the LCA can be reported by name
t = Tree("((Homo_sapiens:0.1,Pan_troglodytes:0.05)Hominidae:0.2,(Mus_musculus:0.3,Rattus_norvegicus:0.25)Muridae:0.4)Euarchontoglires;", format=1)

# Lowest common ancestor (LCA) query
human = t & "Homo_sapiens"   # shorthand for search_nodes(name=...)[0]
mouse = t & "Mus_musculus"
lca = t.get_common_ancestor(human, mouse)
print(f"LCA of human and mouse: {lca.name}")
# LCA of human and mouse: Euarchontoglires

# Check monophyly
is_mono, mono_type, broken = t.check_monophyly(
    values=["Homo_sapiens", "Pan_troglodytes"], target_attr="name"
)
print(f"Hominids monophyletic: {is_mono}, type: {mono_type}")
# Hominids monophyletic: True, type: monophyletic
```

### Module 3: Node Annotation

Add custom attributes to nodes for metadata-driven visualization and analysis.

```python
from ete3 import Tree

# format=1 keeps the internal node names (Hominidae, Muridae, Euarchontoglires)
t = Tree("((Homo_sapiens,Pan_troglodytes)Hominidae,(Mus_musculus,Rattus_norvegicus)Muridae)Euarchontoglires;", format=1)

# Annotate leaves with arbitrary metadata
metadata = {
    "Homo_sapiens":      {"genome_size_gb": 3.2, "ploidy": 2, "color": "blue"},
    "Pan_troglodytes":   {"genome_size_gb": 3.1, "ploidy": 2, "color": "green"},
    "Mus_musculus":      {"genome_size_gb": 2.7, "ploidy": 2, "color": "orange"},
    "Rattus_norvegicus": {"genome_size_gb": 2.9, "ploidy": 2, "color": "red"},
}
for leaf in t.get_leaves():
    for attr, val in metadata[leaf.name].items():
        setattr(leaf, attr, val)

# Access annotations
for leaf in t.get_leaves():
    print(f"{