flowio-flow-cytometry
Parse/write FCS (Flow Cytometry) files v2.0-3.1. Events as NumPy, channel metadata, multi-dataset files, CSV/FCS export. Use FlowKit for gating/compensation.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/flowio-flow-cytometry && cp -r /tmp/flowio-flow-cytometry/skills/cell-biology/flowio-flow-cytometry ~/.claude/skills/flowio-flow-cytometry
# FlowIO — Flow Cytometry File Handler
## Overview
FlowIO is a lightweight Python library for reading and writing Flow Cytometry Standard (FCS) files. It parses FCS metadata, extracts event data as NumPy arrays, and creates new FCS files. It supports FCS versions 2.0, 3.0, and 3.1. Its minimal dependencies make it a good fit for data pipelines and for preprocessing ahead of more advanced analysis.
## When to Use
- Parsing FCS files to extract event data as NumPy arrays
- Reading channel metadata (names, ranges, types) from FCS files
- Converting flow cytometry data to pandas DataFrames or CSV
- Creating new FCS files from NumPy arrays or processed data
- Handling multi-dataset FCS files (separating combined datasets)
- Batch processing directories of FCS files
- Preprocessing flow cytometry data before downstream analysis
- For **compensation, gating, and FlowJo workspace support**, use FlowKit instead
- For **advanced cytometry visualization** (density plots, gating plots), use matplotlib or plotly
## Prerequisites
```bash
pip install flowio numpy pandas
```
Requires Python 3.9+. FlowIO itself is pure Python, so it installs on any platform where NumPy does.
## Quick Start
```python
from flowio import FlowData
flow = FlowData("experiment.fcs")
print(f"Events: {flow.event_count}, Channels: {flow.channel_count}")
print(f"Channels: {flow.pnn_labels}")
events = flow.as_array() # Shape: (n_events, n_channels)
print(f"Data shape: {events.shape}")
```
## Core API
### 1. Reading FCS Files
The `FlowData` class is the primary interface for reading FCS files.
```python
from flowio import FlowData
# Standard reading
flow = FlowData("sample.fcs")
print(f"Version: {flow.version}") # '3.0', '3.1', etc.
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
# Event data
events = flow.as_array() # Preprocessed (gain, log scaling)
raw = flow.as_array(preprocess=False) # Raw values
print(f"Shape: {events.shape}") # (n_events, n_channels)
# Memory-efficient: metadata only (skip DATA segment)
flow_meta = FlowData("sample.fcs", only_text=True)
print(f"Instrument: {flow_meta.text.get('cyt', 'Unknown')}")  # TEXT keys are lowercase, no '$' prefix
# Handle problematic files
flow = FlowData("bad.fcs", ignore_offset_discrepancy=True)
flow = FlowData("bad.fcs", use_header_offsets=True)
# Exclude null channels
flow = FlowData("sample.fcs", null_channel_list=["Time", "Null"])
```
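For batch processing a directory, a metadata-only pass is often enough to build a quick inventory. The sketch below is one possible shape for that, not part of FlowIO: the helper name `summarize_fcs_dir` and its `reader` parameter are hypothetical, and it assumes `event_count` and `channel_count` are populated when a file is opened with `only_text=True`. The demo at the bottom uses a stand-in reader and empty files so it runs without real FCS data.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def summarize_fcs_dir(directory, reader=None):
    """Return one summary dict per *.fcs file found in `directory`."""
    if reader is None:
        # Imported lazily so the helper can be tried without FlowIO installed
        from flowio import FlowData

        def reader(path):
            # Metadata only: skips parsing the DATA segment
            return FlowData(path, only_text=True)

    rows = []
    for path in sorted(Path(directory).glob("*.fcs")):
        try:
            flow = reader(str(path))
            rows.append({
                "file": path.name,
                "events": flow.event_count,
                "channels": flow.channel_count,
            })
        except Exception as exc:
            # Record the failure and keep going with the remaining files
            rows.append({"file": path.name, "error": str(exc)})
    return rows


# Demo with a stand-in reader (hypothetical values, no real FCS parsing)
def fake_reader(path):
    return SimpleNamespace(event_count=1000, channel_count=5)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.fcs", "a.fcs", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    rows = summarize_fcs_dir(tmp, reader=fake_reader)

print(rows)
```

With FlowIO installed, calling `summarize_fcs_dir("data/")` without a `reader` would use `FlowData` directly. Unreadable files end up as rows with an `error` key instead of stopping the run.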
### 2. Channel Metadata
Extract channel names, types, and ranges from FCS files.
```python
flow = FlowData("sample.fcs")
# Channel names
pnn = flow.pnn_labels # Short names: ['FSC-A', 'SSC-A', 'FL1-A', ...]
pns = flow.pns_labels # Descriptive: ['Forward Scatter', 'Side Scatter', 'FITC', ...]
pnr = flow.pnr_values # Range/max values per channel
# Channel type indices
scatter_idx = flow.scatter_indices # [0, 1] — FSC, SSC
fluoro_idx = flow.fluoro_indices # [2, 3, 4] — fluorescence channels
time_idx = flow.time_index # Time channel index (or None)
# Access by type
events = flow.as_array()
scatter_data = events[:, scatter_idx]
fluoro_data = events[:, fluoro_idx]
# Full metadata (TEXT segment dictionary; keys are lowercase without the '$' prefix)
text = flow.text
print(f"Date: {text.get('date', 'N/A')}")
print(f"Instrument: {text.get('cyt', 'N/A')}")
```
### 3. Creating FCS Files
Generate new FCS files from NumPy arrays.
```python
import numpy as np
from flowio import create_fcs
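# Basic creation: create_fcs writes to a binary file handle and expects
# the events flattened to 1-D (row-major, one event after another)
events = np.random.rand(10000, 5) * 1000
channels = ["FSC-A", "SSC-A", "FL1-A", "FL2-A", "Time"]
with open("output.fcs", "wb") as fh:
    create_fcs(fh, events.flatten().tolist(), channels)
# With descriptive names and metadata (keys follow the flow.text convention)
with open("output.fcs", "wb") as fh:
    create_fcs(
        fh,
        events.flatten().tolist(),
        channels,
        opt_channel_names=["Forward Scatter", "Side Scatter", "FITC", "PE", "Time"],
        metadata_dict={"src": "Python pipeline", "date": "17-FEB-2026", "cyt": "Synthetic"},
    )
# Output: FCS 3.1, single-precision float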
```
### 4. Multi-Dataset FCS Files
Handle FCS files containing multiple datasets.
```python
from flowio import FlowData, read_multiple_data_sets, MultipleDataSetsError
# Detect multi-dataset files: FlowData raises if the file holds more than one dataset
try:
    datasets = [FlowData("sample.fcs")]
except MultipleDataSetsError:
    datasets = read_multiple_data_sets("sample.fcs")
print(f"Found {len(datasets)} datasets")
for i, ds in enumerate(datasets):
    print(f"Dataset {i}: {ds.event_count} events, {ds.channel_count} channels")
    events = ds.as_array()
# Read specific dataset by offset
first = FlowData("multi.fcs", nextdata_offset=0)
next_offset = int(first.text.get("nextdata", "0"))  # TEXT keys are lowercase, no '$'
if next_offset > 0:
    second = FlowData("multi.fcs", nextdata_offset=next_offset)
```
### 5. Modifying and Re-Exporting
Read, modify, and save FCS data.
```python
from flowio import FlowData, create_fcs
# Read original
flow = FlowData("original.fcs")
events = flow.as_array(preprocess=False) # Use raw for modification
# Filter events (e.g., threshold on FSC)
mask = events[:, 0] > 500
filtered = events[mask]
print(f"Before: {len(events)}, After: {len(filtered)}")
# Save filtered data as new FCS (binary file handle, events flattened to 1-D)
with open("filtered.fcs", "wb") as fh:
    create_fcs(
        fh,
        filtered.flatten().tolist(),
        flow.pnn_labels,
        opt_channel_names=flow.pns_labels,
        metadata_dict={**flow.text, "src": "Filtered"},
    )
# Or write with updated metadata (no event modification)
flow.write_fcs("updated.fcs", metadata={"src": "Updated"})
```
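The same event table can also be exported as CSV. The following is a minimal standard-library sketch, assuming `events` is the 2-D array from `flow.as_array()` and `labels` is `flow.pnn_labels`. The helper name `events_to_csv` is hypothetical, and plain lists stand in for real data so the snippet runs without an FCS file.

```python
import csv
import io


def events_to_csv(events, labels, out):
    """Write a header row of channel labels followed by one row per event."""
    writer = csv.writer(out)
    writer.writerow(labels)
    for row in events:
        if len(row) != len(labels):
            raise ValueError("each event needs exactly one value per channel")
        writer.writerow(row)


# Stand-in data; with FlowIO these would come from flow.as_array() and flow.pnn_labels
labels = ["FSC-A", "SSC-A", "FL1-A"]
events = [[512.0, 300.5, 12.25], [498.0, 310.0, 8.5]]

buffer = io.StringIO()
events_to_csv(events, labels, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk, pass a file opened with `open("events.csv", "w", newline="")` instead of the `StringIO` buffer. With pandas installed, `pd.DataFrame(events, columns=labels)` gives the same table as a DataFrame, and `DataFrame.to_csv` handles the export.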
## Key Concepts
### FCS File Structure
FCS files consist of four segments:
| Segment | Content | FlowData attribute |
|---------|---------|-------------------|
| HEADER | Version, byte offsets | `flow.header` |
| TEXT | Key-value metadata (`$DATE`, `$CYT`, channel names) | `flow.text` |
| DATA | Event data (binary/float) | `flow.events` (flat 1-D sequence), `flow.as_array()` |
| ANALYSIS | Optional processed results | `flow.analysis` |
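To make the HEADER row concrete, here is a rough sketch of how the fixed-width header defined by the FCS standard can be decoded by hand: a 6-byte version string, 4 spaces, then six 8-byte right-justified ASCII offsets. The function name `parse_fcs_header` and the dictionary keys are hypothetical and are not necessarily what `flow.header` contains. The sample bytes are synthetic.

```python
HEADER_LENGTH = 58  # 6 (version) + 4 (spaces) + 6 * 8 (offset fields)
OFFSET_NAMES = [
    "text_start", "text_end",
    "data_start", "data_end",
    "analysis_start", "analysis_end",
]


def parse_fcs_header(header_bytes):
    """Decode the version and segment byte offsets from an FCS HEADER."""
    if len(header_bytes) < HEADER_LENGTH:
        raise ValueError("FCS header must be at least 58 bytes")
    version = header_bytes[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("missing 'FCS' version marker")
    header = {"version": version[3:]}
    for i, name in enumerate(OFFSET_NAMES):
        start = 10 + 8 * i
        field = header_bytes[start:start + 8].decode("ascii").strip()
        # Blank fields are treated as 0, meaning the offset is not given here
        header[name] = int(field) if field else 0
    return header


# Synthetic header: TEXT at bytes 58-1023, DATA at 1024-41023, no ANALYSIS
offsets = (58, 1023, 1024, 41023, 0, 0)
sample = b"FCS3.1" + b"    " + b"".join(
    str(n).rjust(8).encode("ascii") for n in offsets
)
parsed = parse_fcs_header(sample)
print(parsed)
```

In practice `FlowData` does this parsing for you. The sketch is only meant to show why the header is enough to locate the other three segments.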
### Preprocessing (as_array)
When `preprocess=True` (default), FlowIO applies:
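1. **Gain scaling**: Apply the `$PnG` gain to linear channels (the FCS standard converts channel values to scale values by dividing by the gain)
2. **Log transform**: For channels with a nonzero `$PnE` (`f1,f2`), convert to the logarithmic scale: `value = f2 × 10^(f1 × raw / R)`, where `R` is the channel's `$PnR` range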
3. **Time scaling**: Convert time channel to proper units
Use `preprocess=False` to get the raw, untransformed values.
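The scaling rules can be illustrated per value with a small pure-Python sketch. It follows the FCS 3.1 conventions (logarithmic channels use `$PnE = f1,f2` with the `$PnR` range, linear channels are divided by `$PnG`) and is not a copy of FlowIO's internal code. The function name `scale_channel_value` and its parameter names are hypothetical.

```python
def scale_channel_value(raw, pne=(0.0, 0.0), pnr=1024, png=1.0):
    """Convert one raw channel value to a scale value.

    pne: (f1, f2) from $PnE, where f1 is the number of decades and f2 the
         scale value that corresponds to a raw value of 0
    pnr: $PnR range of the channel
    png: $PnG linear gain
    """
    decades, log_zero = pne
    if decades > 0:
        # An f2 of 0 is not valid for a log channel; 1 is a common fallback
        if log_zero == 0:
            log_zero = 1.0
        return log_zero * 10 ** (decades * raw / pnr)
    # Linear channel: divide by the gain, skipping a missing or zero gain
    if png:
        return raw / png
    return raw


# A 4-decade log channel with range 1024: the midpoint sits 2 decades up
log_value = scale_channel_value(512, pne=(4.0, 1.0), pnr=1024)
# A linear channel acquired with a gain of 8
linear_value = scale_channel_value(800, png=8.0)
print(log_value, linear_value)
```

When working with whole arrays, the same arithmetic applies column by column, which is what `as_array()` takes care of when preprocessing is enabled.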