
statistical-significance-annotation

Guide for annotating statistical significance (p-value asterisks) on comparison plots. Covers standard notation (ns, *, **, ***, ****), matplotlib bracket+asterisk implementation, and use with seaborn box/violin/bar plots. Use when preparing publication-ready figures with significance markers.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/statistical-significance-annotation && cp -r /tmp/statistical-significance-annotation/skills/data-visualization/statistical-significance-annotation ~/.claude/skills/statistical-significance-annotation
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Statistical Significance Annotation on Plots

## Overview

Statistical significance annotations (asterisk notation) are visual markers placed on comparison plots to indicate the results of hypothesis tests between groups. They consist of brackets connecting two groups and asterisk symbols denoting the p-value range. Proper annotation ensures that the visual claims in a figure match the quantitative evidence, making plots publication-ready and scientifically rigorous. This guide covers the standard conventions, when and how to annotate, and a reusable matplotlib implementation.

## Key Concepts

### Standard Asterisk Notation

The widely adopted convention maps p-value ranges to asterisk symbols:

| Symbol | P-value Range | Meaning |
|--------|--------------|---------|
| ns | p > 0.05 | Not significant |
| \* | 0.01 < p <= 0.05 | Significant |
| \*\* | 0.001 < p <= 0.01 | Highly significant |
| \*\*\* | 0.0001 < p <= 0.001 | Very highly significant |
| \*\*\*\* | p <= 0.0001 | Extremely significant |

The conversion function:

```python
def pvalue_to_asterisk(p: float) -> str:
    """Convert a p-value to standard asterisk notation."""
    if p <= 0.0001:
        return "****"
    elif p <= 0.001:
        return "***"
    elif p <= 0.01:
        return "**"
    elif p <= 0.05:
        return "*"
    else:
        return "ns"
```

### Adjusted vs Raw P-values

- **Single comparison** (one t-test): Use raw p-value.
- **Multiple comparisons** (pairwise tests across 3+ groups, multiple genes): Use adjusted p-values (FDR/Benjamini-Hochberg or Bonferroni). Annotating with raw p-values overstates significance, because the chance of at least one false positive grows with the number of tests.
- **Pre-computed results** (DESeq2 `padj`, ANOVA post-hoc): Use the adjusted values already provided.
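
As a rough illustration of what the adjustment does, the sketch below implements Bonferroni and Benjamini-Hochberg in plain Python. The function names `bonferroni` and `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration; in practice a library routine such as `statsmodels.stats.multitest.multipletests` would normally be used instead.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise tests
raw_pvals = [0.01, 0.04, 0.03, 0.005]
adj_bh = benjamini_hochberg(raw_pvals)
adj_bonf = bonferroni(raw_pvals)

for raw, bh, bonf in zip(raw_pvals, adj_bh, adj_bonf):
    print(f"raw={raw:.3f}  BH={bh:.3f}  Bonferroni={bonf:.3f}")
```

With these illustrative numbers, the raw value 0.01 would earn `**`, while its Benjamini-Hochberg adjusted value (about 0.02) earns only `*`, which is why the choice between raw and adjusted values changes what ends up on the figure.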

### Comparison Selection

Not every pair of groups needs annotation. Select comparisons that:

- Directly support the claim made in the analysis text
- Are biologically meaningful (e.g., treatment vs control, not control-A vs control-B)
- Are limited in number to keep the figure readable (typically 1-5 per panel)
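
The difference between annotating every pair and annotating only comparisons against a reference can be sketched as follows. The helper names `all_pairs` and `versus_reference` and the group labels are hypothetical, used only to show how quickly the pair count grows.

```python
from itertools import combinations


def all_pairs(groups):
    """Every unordered pair of groups: k * (k - 1) / 2 pairs for k groups."""
    return list(combinations(groups, 2))


def versus_reference(groups, reference):
    """Only the pairs that compare each group against a single reference group."""
    if reference not in groups:
        raise ValueError(f"reference {reference!r} is not one of the groups")
    return [(reference, g) for g in groups if g != reference]


# Hypothetical group labels
groups = ["Control", "DrugA", "DrugB", "DrugC"]
print(len(all_pairs(groups)), "pairwise comparisons in total")
print(versus_reference(groups, "Control"))
```

Restricting the list to reference comparisons keeps the panel within the 1-5 annotations suggested above, and the resulting pair list can be passed on to whatever testing and plotting code follows.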

## Decision Framework

```
Does the plot compare groups?
├── No (scatter, heatmap, PCA, line trend) → Do NOT annotate
└── Yes (box, violin, bar, strip)
    ├── Does the analysis claim significance? → Annotate the claimed comparisons
    ├── Exploratory (no specific claim) → Annotate vs control only, or skip
    └── Too many groups (>6 pairwise) → Annotate key comparisons only
```

| Scenario | Annotate? | Which pairs |
|----------|-----------|-------------|
| DEG box plot: treatment vs control | Yes | Treatment vs Control |
| Multi-group ANOVA with post-hoc | Yes | Significant post-hoc pairs only |
| Gene expression across 10 cell types | Selectively | vs reference cell type only |
| PCA or UMAP | No | N/A |
| Heatmap or volcano plot | No | N/A |
| Correlation scatter | No | Report r and p in text/legend |
| Exploratory bar plot, no hypothesis | Optional | vs control if applicable |
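
One possible reading of the tree and table above, written as a small function. The name `annotation_plan`, its parameters, and the returned strings are hypothetical, and checking the pair count before the significance claim is an assumption about how the branches combine.

```python
# Plot types that compare groups, taken from the decision tree above
GROUP_COMPARISON_PLOTS = {"box", "violin", "bar", "strip"}


def annotation_plan(plot_type, claims_significance, n_groups, has_control=True):
    """Return a short recommendation string following the decision tree."""
    if plot_type not in GROUP_COMPARISON_PLOTS:
        return "do not annotate"
    # Number of possible pairwise comparisons among the groups
    n_pairs = n_groups * (n_groups - 1) // 2
    if n_pairs > 6:
        return "annotate key comparisons only"
    if claims_significance:
        return "annotate the claimed comparisons"
    # Exploratory plot with no specific claim
    return "annotate vs control only" if has_control else "skip"


print(annotation_plan("heatmap", True, 3))
print(annotation_plan("box", True, 2))
print(annotation_plan("bar", False, 3, has_control=False))
```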

## Best Practices

1. **Match annotations to text claims**: Every asterisk on the plot must correspond to a statistical test described in the analysis. Never annotate without having computed the test.
2. **Use adjusted p-values for multiple comparisons**: When testing more than one pair, always use FDR-corrected or Bonferroni-corrected p-values. State the correction method in the figure legend.
3. **Limit annotated pairs**: Annotate only comparisons relevant to the analysis conclusion. Over-annotating clutters the figure and dilutes focus.
4. **Position brackets clearly**: Place brackets above the data range with enough vertical offset to avoid overlapping with data points, error bars, or other brackets. Stack multiple brackets with consistent spacing.
5. **State the statistical test**: Always note the test used (t-test, Mann-Whitney U, Wilcoxon, ANOVA + Tukey HSD, etc.) in the figure title, caption, or legend.
6. **Include sample sizes**: Show n per group in the axis labels (e.g., "Control (n=30)") or figure legend.
7. **Use bold titles**: Set `fontweight='bold'` on figure titles for publication readiness.
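
A minimal sketch of the bracket-plus-asterisk drawing described in practice 4. The helper name `add_significance_bracket` and its parameters are hypothetical; it relies only on the `plot` and `text` methods of a matplotlib `Axes` object passed in as `ax`, and is not necessarily the implementation the guide itself uses.

```python
def add_significance_bracket(ax, x1, x2, y, text, height=0.02, color="black"):
    """Draw a bracket between x1 and x2 starting at height y, with `text` centered above it.

    `y` and `height` are in data coordinates. Returns the y position of the
    bracket's top edge so the caller can stack further brackets above it.
    """
    top = y + height
    # Bracket shape: up from x1, across to x2, back down
    ax.plot([x1, x1, x2, x2], [y, top, top, y], lw=1.2, color=color)
    # Significance label sits just above the horizontal bar
    ax.text((x1 + x2) / 2, top, text, ha="center", va="bottom", color=color)
    return top
```

With a real figure this might be called as `add_significance_bracket(ax, 0, 1, y=data_max * 1.05, text="**", height=data_max * 0.02)`, where `ax` comes from `plt.subplots()` or a seaborn call, `data_max` is the highest plotted value including error bars, and 0 and 1 are the positions of the first two categories on the x-axis.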

## Common Pitfalls

1. **Annotating all pairwise comparisons in a multi-group plot**
   - *How to avoid*: Select only hypothesis-driven pairs. For k groups there are k*(k-1)/2 possible pairs (15 for k = 6), which quickly become unreadable. Show comparisons against control or specific contrasts only.

2. **Using raw p-values when multiple comparisons were performed**
   - *How to avoid*: Apply `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` and annotate with the corrected p-values (the second element of the returned tuple), or use adjusted p-values from upstream tools (DESeq2 `padj`).

3. **Bracket overlap with data or other brackets**
   - *How to avoid*: Use incremental vertical offset for stacked brackets. Start the first bracket above the maximum data value + error bar, then add a fixed offset for each additional bracket.

4. **Asterisks without stating which test was used**
   - *How to avoid*: Always include the test name in the plot title or annotation (e.g., "Mann-Whitney U test" or "Tukey HSD post-hoc").

5. **Inconsistent notation across figures**
   - *How to avoid*: Use the same `pvalue_to_asterisk()` function throughout the analysis. Define it once and reuse.

6. **Annotating "ns" on every non-significant pair**
   - *How to avoid*: Only show "ns" when the non-significance itself is a notable finding (e.g., showing no difference between two treatments). Omit ns annotations for pairs not being compared.

7. **Placing annotations below the data**
   - *How to avoid*: Always place brackets and asterisks above the compared groups, never below.
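
The incremental offset from pitfall 3 can be sketched as a small helper that returns one baseline height per stacked bracket. The name `stacked_bracket_heights` and the default fractions are hypothetical choices for illustration; suitable spacing depends on the figure size and font size.

```python
def stacked_bracket_heights(data_min, data_max, n_brackets, pad_frac=0.05, step_frac=0.10):
    """Return y positions for n_brackets stacked brackets above the data.

    data_max should already include error bars. The first bracket sits
    pad_frac of the data span above data_max; each further bracket is
    step_frac of the span higher than the previous one.
    """
    span = data_max - data_min
    pad = span * pad_frac
    step = span * step_frac
    return [data_max + pad + i * step for i in range(n_brackets)]


# Hypothetical data range from 0 to 10 with three brackets to stack
for level, y in enumerate(stacked_bracket_heights(0.0, 10.0, 3), start=1):
    print(f"bracket {level}: y = {y:.2f}")
```

Scaling the offsets by the data span rather than using fixed numbers keeps the spacing visually consistent across panels with different units. The upper y-axis limit usually needs to be raised afterwards so the topmost label is not clipped.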

## Workflow

### Step 1: Compute Statistical Tests

Run the appropriate test and collect p-values before plotting:

```python
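import numpy as np
from scipy import stats

# Placeholder data for illustration only; replace with your own measurements
rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 2.0, size=30)
group_b = rng.normal(12.0, 2.0, size=30)
group_c = rng.normal(11.0, 2.0, size=30)

# Two-group comparison (non-parametric)
u_stat, pval_mwu = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
# or, for approximately normal data (distinct names so the results are not overwritten):
t_stat, pval_t = stats.ttest_ind(group_a, group_b)

# Multi-group: ANOVA + post-hoc
f_stat, pval_anova = stats.f_oneway(group_a, group_b, group_c)

# Post-hoc pairwise (if ANOVA significant)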
from iterto
```