statistical-significance-annotation

This skill provides a framework and implementation guide for adding statistical significance markers (asterisk notation) to comparison plots in matplotlib and seaborn. Use it when preparing publication-ready figures that display hypothesis test results between groups, ensuring that visual claims match quantitative evidence through standardized p-value-to-asterisk conversion, proper handling of adjusted versus raw p-values, and strategic selection of which group comparisons to annotate.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/statistical-significance-annotation && cp -r /tmp/statistical-significance-annotation/skills/data-visualization/statistical-significance-annotation ~/.claude/skills/statistical-significance-annotation

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Statistical Significance Annotation on Plots

## Overview

Statistical significance annotations (asterisk notation) are visual markers placed on comparison plots to indicate the results of hypothesis tests between groups. They consist of brackets connecting two groups and asterisk symbols denoting the p-value range. Proper annotation ensures that the visual claims in a figure match the quantitative evidence, making plots publication-ready and scientifically rigorous. This guide covers the standard conventions, when and how to annotate, and a reusable matplotlib implementation.

## Key Concepts

### Standard Asterisk Notation

A widely adopted convention maps non-overlapping p-value ranges to asterisk symbols:

| Symbol | P-value Range | Meaning |
|--------|--------------|---------|
| ns | p > 0.05 | Not significant |
| \* | 0.01 < p <= 0.05 | Significant |
| \*\* | 0.001 < p <= 0.01 | Highly significant |
| \*\*\* | 0.0001 < p <= 0.001 | Very highly significant |
| \*\*\*\* | p <= 0.0001 | Extremely significant |

The conversion function:

```python
def pvalue_to_asterisk(p: float) -> str:
    """Convert a p-value to standard asterisk notation."""
    if p <= 0.0001:
        return "****"
    elif p <= 0.001:
        return "***"
    elif p <= 0.01:
        return "**"
    elif p <= 0.05:
        return "*"
    else:
        return "ns"
```

### Adjusted vs Raw P-values

- **Single comparison** (one t-test): Use raw p-value.
- **Multiple comparisons** (pairwise tests across 3+ groups, or the same test repeated across multiple genes): Use adjusted p-values (Benjamini-Hochberg FDR or Bonferroni). Annotating with raw p-values inflates the false-positive rate across the family of tests.
- **Pre-computed results** (DESeq2 `padj`, ANOVA post-hoc): Use the adjusted values already provided.
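
To make the effect of adjustment concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure. The helper name `benjamini_hochberg` and the example p-values are illustrative assumptions, not part of this skill; in a real analysis a library routine such as `statsmodels.stats.multitest.multipletests` is usually the safer choice.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper, written for illustration only.
    """
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values from four tests
adj = benjamini_hochberg(raw)
for p_raw, p_adj in zip(raw, adj):
    print(f"raw={p_raw:.3f}  adjusted={p_adj:.3f}")
```

In this made-up example the raw value 0.01 would be drawn as `**`, while its adjusted value (about 0.02) is drawn as `*`, which is why the choice of raw versus adjusted p-values changes what the figure claims.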

### Comparison Selection

Not every pair of groups needs annotation. Select comparisons that:

- Directly support the claim made in the analysis text
- Are biologically meaningful (e.g., treatment vs control, not control-A vs control-B)
- Are limited in number to keep the figure readable (typically 1-5 per panel)
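
The criteria above can be illustrated by contrasting every possible pair with a control-based subset. This is only a sketch: the group labels and the `reference` variable are hypothetical, and which group counts as the reference depends on the study design.

```python
from itertools import combinations

groups = ["control", "drug_A", "drug_B", "drug_C"]  # hypothetical labels
reference = "control"                               # assumed reference group

# Every unordered pair of groups: k * (k - 1) / 2 comparisons.
all_pairs = list(combinations(groups, 2))

# Hypothesis-driven subset: each treatment against the reference only.
vs_control = [(reference, g) for g in groups if g != reference]

print(len(all_pairs), "possible pairs")
print(len(vs_control), "control-based pairs:", vs_control)
```

With four groups this reduces six candidate brackets to three, which keeps the panel within the 1-5 annotation range suggested above.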

## Decision Framework

```
Does the plot compare groups?
├── No (scatter, heatmap, PCA, line trend) → Do NOT annotate
└── Yes (box, violin, bar, strip)
    ├── Does the analysis claim significance? → Annotate the claimed comparisons
    ├── Exploratory (no specific claim) → Annotate vs control only, or skip
    └── Too many pairwise comparisons (>6) → Annotate key comparisons only
```

| Scenario | Annotate? | Which pairs |
|----------|-----------|-------------|
| DEG box plot: treatment vs control | Yes | Treatment vs Control |
| Multi-group ANOVA with post-hoc | Yes | Significant post-hoc pairs only |
| Gene expression across 10 cell types | Selectively | vs reference cell type only |
| PCA or UMAP | No | N/A |
| Heatmap or volcano plot | No | N/A |
| Correlation scatter | No | Report r and p in text/legend |
| Exploratory bar plot, no hypothesis | Optional | vs control if applicable |
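
The decision tree can be encoded as a small helper so that the same rule is applied to every figure. This is a rough sketch under stated assumptions: the function name `annotation_plan`, its parameters, the returned strings, and the order in which the branches are checked are all illustrative choices, not something this skill prescribes.

```python
# Plot types that compare groups, taken from the decision tree.
COMPARISON_PLOTS = {"box", "violin", "bar", "strip"}


def annotation_plan(plot_type, claims_significance, n_pairs, max_pairs=6):
    """Suggest an annotation strategy for one plot (hypothetical helper)."""
    if plot_type not in COMPARISON_PLOTS:
        return "do not annotate"
    if n_pairs > max_pairs:
        # Assumed precedence: readability is checked before the claim.
        return "annotate key comparisons only"
    if claims_significance:
        return "annotate the claimed comparisons"
    return "annotate vs control only, or skip"


print(annotation_plan("heatmap", True, 1))
print(annotation_plan("box", True, 1))
print(annotation_plan("violin", False, 3))
print(annotation_plan("bar", True, 45))
```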

## Best Practices

1. **Match annotations to text claims**: Every asterisk on the plot must correspond to a statistical test described in the analysis. Never annotate without having computed the test.
2. **Use adjusted p-values for multiple comparisons**: When testing more than one pair, always use FDR-corrected or Bonferroni-corrected p-values. State the correction method in the figure legend.
3. **Limit annotated pairs**: Annotate only comparisons relevant to the analysis conclusion. Over-annotating clutters the figure and dilutes focus.
4. **Position brackets clearly**: Place brackets above the data range with enough vertical offset to avoid overlapping with data points, error bars, or other brackets. Stack multiple brackets with consistent spacing.
5. **State the statistical test**: Always note the test used (t-test, Mann-Whitney U, Wilcoxon, ANOVA + Tukey HSD, etc.) in the figure title, caption, or legend.
6. **Include sample sizes**: Show n per group in the axis labels (e.g., "Control (n=30)") or figure legend.
7. **Use bold titles**: Set `fontweight='bold'` on figure titles for publication readiness.
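
As a small illustration of practice 6, axis labels that carry the sample size can be built directly from the data. The `data` dictionary and its values are made up for this sketch.

```python
# Hypothetical measurements per group; the values are invented.
data = {
    "Control": [4.1, 3.9, 4.4, 4.0, 4.2],
    "Treated": [5.3, 5.8, 5.1, 5.6],
}

# One label per group, in insertion order, e.g. "Control (n=5)".
labels = [f"{name} (n={len(values)})" for name, values in data.items()]
print(labels)
```

The resulting list can then be used as tick labels, for example with matplotlib's `ax.set_xticklabels(labels)` after the tick positions have been set.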

## Common Pitfalls

1. **Annotating all pairwise comparisons in a multi-group plot**
   - *How to avoid*: Select only hypothesis-driven pairs. For k groups, the k*(k-1)/2 possible pairs quickly become unreadable (10 groups already give 45 pairs). Show vs control or specific contrasts only.

2. **Using raw p-values when multiple comparisons were performed**
   - *How to avoid*: Apply `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` or use adjusted p-values from upstream tools (DESeq2 `padj`).

3. **Bracket overlap with data or other brackets**
   - *How to avoid*: Use incremental vertical offset for stacked brackets. Start the first bracket above the maximum data value + error bar, then add a fixed offset for each additional bracket.

4. **Asterisks without stating which test was used**
   - *How to avoid*: Always include the test name in the plot title or annotation (e.g., "Mann-Whitney U test" or "Tukey HSD post-hoc").

5. **Inconsistent notation across figures**
   - *How to avoid*: Use the same `pvalue_to_asterisk()` function throughout the analysis. Define it once and reuse.

6. **Annotating "ns" on every non-significant pair**
   - *How to avoid*: Only show "ns" when the non-significance itself is a notable finding (e.g., showing no difference between two treatments). Omit ns annotations for pairs not being compared.

7. **Placing annotations below the data**
   - *How to avoid*: Always place brackets and asterisks above the compared groups, never below.
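
The incremental offset described under pitfall 3 can be sketched as a plain calculation of bracket heights, independent of any plotting call. The function name `stacked_bracket_heights` and the default fractions (`pad_frac`, `step_frac`) are assumptions chosen for illustration; suitable values depend on the figure size and font size.

```python
def stacked_bracket_heights(data_max, n_brackets, y_range,
                            pad_frac=0.05, step_frac=0.10):
    """Return one y position per bracket, stacked above the data.

    Hypothetical helper. `data_max` should already include error bars;
    `y_range` is the span of the plotted data, used to scale the
    spacing so it looks the same regardless of units.
    """
    first = data_max + pad_frac * y_range  # clear the tallest data point
    step = step_frac * y_range             # fixed gap between brackets
    return [first + i * step for i in range(n_brackets)]


heights = stacked_bracket_heights(data_max=10.0, n_brackets=3, y_range=10.0)
print(heights)
```

Each returned height would then serve as the baseline of one bracket, with the asterisk text placed slightly above it, and the y-axis limit raised so the topmost bracket stays inside the axes.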

## Workflow

### Step 1: Compute Statistical Tests

Run the appropriate test and collect p-values before plotting:

```python
from scipy import stats

# Two-group comparison
stat, pval = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
# or for normal data:
stat, pval = stats.ttest_ind(group_a, group_b)

# Multi-group: ANOVA + post-hoc
from scipy.stats import f_oneway
stat, pval_anova = f_oneway(group_a, group_b, group_c)

# Post-hoc pairwise (if ANOVA significant)
from iterto