data-stats-analysis
The data-stats-analysis skill performs statistical hypothesis testing, including t-tests, ANOVA, correlation analysis, and multiple testing corrections, using the scipy and statsmodels libraries executed locally. Use this skill to compare group means, test correlations between variables, run hypothesis tests with p-value calculations, apply corrections such as FDR or Bonferroni, or conduct non-parametric tests. Because everything runs locally, it works with any LLM provider.
git clone --depth 1 https://github.com/beita6969/ScienceClaw /tmp/data-stats-analysis && cp -r /tmp/data-stats-analysis/skills/data-stats-analysis ~/.claude/skills/data-stats-analysis
SKILL.md
# Statistical Analysis (Universal)
## Overview
This skill lets you perform rigorous statistical analyses, including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Rather than calling a cloud-hosted service, it uses standard Python statistical libraries (**scipy**, **statsmodels**, **numpy**) and executes **locally** in your environment, so it works with **all LLM providers** that can run Python, including GPT, Gemini, Claude, DeepSeek, and Qwen.
## When to Use This Skill
- Compare means between groups (t-tests, ANOVA)
- Test for correlations between variables
- Perform hypothesis testing with p-value calculation
- Apply multiple testing corrections (FDR, Bonferroni)
- Calculate statistical summaries and confidence intervals
- Test for normality and distribution fitting
- Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)
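Choosing among the two-group tests listed above usually depends on two assumption checks: normality (Shapiro-Wilk) and equal variances (Levene). The sketch below encodes that decision as a small helper. The function name `choose_two_group_test` and the 0.05 threshold are illustrative assumptions, not part of the skill itself.

```python
import numpy as np
from scipy import stats


def choose_two_group_test(group1, group2, alpha=0.05):
    """Hypothetical helper: pick a two-sample test from assumption checks."""
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)

    # Shapiro-Wilk on each group: a small p-value suggests non-normal data
    normal = True
    for g in (group1, group2):
        _, p_normal = stats.shapiro(g)
        if p_normal < alpha:
            normal = False
    if not normal:
        return "Mann-Whitney U"

    # Levene's test: a small p-value suggests unequal variances
    _, p_levene = stats.levene(group1, group2)
    if p_levene < alpha:
        return "Welch's t-test"
    return "Student's t-test"


# Example: evenly spaced normal quantiles, shifted versus rescaled
q = stats.norm.ppf(np.linspace(0.01, 0.99, 50))
print(choose_two_group_test(q, q + 1))
print(choose_two_group_test(q, 3 * q))
print(choose_two_group_test(np.exp(q), np.exp(q) + 1))
```

Selecting a test from pre-tests on the same data is a heuristic; with very large samples, Shapiro-Wilk flags even trivial departures from normality.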
## How to Use
### Step 1: Import Required Libraries
```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr
from scipy.stats import f_oneway, kruskal, chi2_contingency
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest
import warnings
warnings.filterwarnings('ignore')
```
### Step 2: Two-Sample t-Test
```python
# Compare means between two groups
# group1, group2: arrays of numeric values

# Check the equal-variance assumption first with Levene's test
_, levene_p = stats.levene(group1, group2)
equal_var = levene_p >= 0.05

# Student's t-test if variances look equal, otherwise Welch's t-test
t_statistic, p_value = ttest_ind(group1, group2, equal_var=equal_var)
test_name = "Student's t-test" if equal_var else "Welch's t-test (unequal variances)"
print(test_name)
print(f"t-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
else:
    print("❌ No significant difference (p >= 0.05)")
```
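To show what `ttest_ind(..., equal_var=False)` computes, here is a minimal sketch of Welch's t-test written out by hand: the t statistic, the Welch-Satterthwaite degrees of freedom, and a two-sided p-value from the t distribution. The function name `welch_t_test` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats


def welch_t_test(group1, group2):
    """Hypothetical helper: two-sided Welch's t-test computed by hand."""
    a = np.asarray(group1, dtype=float)
    b = np.asarray(group2, dtype=float)
    # Squared standard error of each mean (sample variance with ddof=1, over n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided p-value from the t distribution's survival function
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Hypothetical example values
group1 = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
group2 = [6.8, 7.1, 6.5, 7.9, 8.3, 7.2, 6.9]
t, df, p = welch_t_test(group1, group2)
t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=False)
print(f"manual: t={t:.4f}, df={df:.2f}, p={p:.4e}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4e}")
```

The manual and scipy results should agree to floating-point precision. Welch's degrees of freedom never exceed the pooled value n1 + n2 - 2, which is why Welch's test is slightly more conservative when variances are in fact equal.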
### Step 3: One-Way ANOVA
```python
# Compare means across multiple groups
# groups: list of arrays, e.g., [group1, group2, group3]
# Perform one-way ANOVA
f_statistic, p_value = f_oneway(*groups)
print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4e}")
if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
    print("Note: Use post-hoc tests to identify which groups differ")
else:
    print("❌ No significant difference between groups")

# Post-hoc pairwise t-tests with Bonferroni correction
from itertools import combinations
group_names = ['Group A', 'Group B', 'Group C']  # one label per array in groups
pairwise_results = []
for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2):
    _, p = ttest_ind(data1, data2)
    pairwise_results.append({
        'comparison': f'{name1} vs {name2}',
        'p_value': p
    })

# Apply Bonferroni correction: multiply each p-value by the number of tests, capped at 1
pairwise_df = pd.DataFrame(pairwise_results)
n_tests = len(pairwise_df)
pairwise_df['p_adjusted'] = (pairwise_df['p_value'] * n_tests).clip(upper=1.0)
print("\nPairwise Comparisons (Bonferroni-corrected):")
print(pairwise_df)
```
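The F statistic from `f_oneway` is the ratio of between-group to within-group mean squares. The sketch below computes it by hand and checks it against scipy; the helper name `one_way_anova` and the three sample groups are hypothetical.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value by hand."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(arrays)
    grand_mean = all_values.mean()
    k, n = len(arrays), all_values.size
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in arrays)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in arrays)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F distribution
    p = stats.f.sf(f, df_between, df_within)
    return f, p


# Hypothetical example groups
a = [4.2, 5.1, 4.8, 5.5, 4.9]
b = [5.8, 6.1, 5.5, 6.4, 6.0]
c = [4.9, 5.3, 5.0, 5.6, 5.2]
f, p = one_way_anova(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f"manual: F={f:.4f}, p={p:.4e}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.4e}")
```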
### Step 4: Correlation Analysis
```python
# variable1, variable2: paired numeric arrays of equal length
# Pearson correlation (linear relationships)
r_pearson, p_pearson = pearsonr(variable1, variable2)
print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")
# Spearman correlation (monotonic relationships, robust to outliers)
r_spearman, p_spearman = spearmanr(variable1, variable2)
print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")

# Interpretation (common rule-of-thumb thresholds on |r|)
if abs(r_pearson) < 0.3:
    strength = "weak"
elif abs(r_pearson) < 0.7:
    strength = "moderate"
else:
    strength = "strong"
direction = "positive" if r_pearson > 0 else "negative"
print(f"Interpretation: {strength} {direction} correlation")
if p_pearson < 0.05:
    print("✅ Statistically significant (p < 0.05)")
else:
    print("❌ Not statistically significant")
```
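The skill lists confidence intervals among its uses, but the correlation step reports only r and p. One standard way to attach an interval to Pearson's r is the Fisher z-transform, sketched below. The helper name `pearson_ci` and the synthetic data are assumptions for illustration; recent scipy versions also expose an interval on the `pearsonr` result, which this sketch does not rely on.

```python
import numpy as np
from scipy import stats


def pearson_ci(x, y, confidence=0.95):
    """Hypothetical helper: Pearson r with a Fisher z confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r, p = stats.pearsonr(x, y)
    # Fisher z-transform: arctanh(r) is approximately normal with SE 1/sqrt(n - 3)
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    z_crit = stats.norm.ppf(0.5 + confidence / 2)  # 1.96 for 95%
    # Build the interval on the z scale, then map back with tanh
    lower, upper = np.tanh([z - z_crit * se, z + z_crit * se])
    return r, p, lower, upper


# Synthetic example data (hypothetical)
rng = np.random.default_rng(0)
x = np.arange(1, 21, dtype=float)
y = 0.5 * x + rng.normal(0, 3, size=x.size)
r, p, lower, upper = pearson_ci(x, y)
print(f"r = {r:.3f}, p = {p:.4e}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The interval is asymmetric around r because tanh compresses values near ±1.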
### Step 5: Multiple Testing Correction
```python
# Scenario: Testing 1000 genes for differential expression
# p_values: p-values from individual tests; gene_names: matching gene labels
p_values = np.asarray(p_values)  # ensure an array so (p_values < 0.05).sum() works
# Method 1: Benjamini-Hochberg FDR correction (recommended; controls the false discovery rate)
reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
# Method 2: Bonferroni correction (controls the family-wise error rate; more conservative)
reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Create results DataFrame
results_df = pd.DataFrame({
    'gene': gene_names,
    'p_value': p_values,
    'q_value_fdr': p_adjusted_fdr,
    'p_adjusted_bonferroni': p_adjusted_bonf,
    'significant_fdr': reject_fdr,
    'significant_bonf': reject_bonf
})

# Summary
print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}")
print(f"Significant after FDR correction: {reject_fdr.sum()}")
print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")

# Save results
results_df.to_csv('statistical_results.csv', index=False)
print("✅ Results saved to: statistical_results.csv")
```
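To make the Benjamini-Hochberg adjustment less of a black box, the sketch below reimplements it in numpy: sort the p-values, scale each by m/i, enforce monotonicity with a running minimum from the largest p-value down, cap at 1, and restore the input order. The helper name `bh_adjust` and the five p-values are hypothetical; the result should match the adjusted p-values returned by `multipletests(..., method='fdr_bh')`.

```python
import numpy as np


def bh_adjust(p_values):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                # indices that sort p ascending
    ranks = np.arange(1, m + 1)          # ranks 1..m of the sorted p-values
    scaled = p[order] * m / ranks        # p_(i) * m / i
    # Running minimum from the largest p-value downward keeps the sequence monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)  # cap at 1 and restore input order
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005, 0.2]
print(bh_adjust(p_values))
```

For these inputs the adjusted values are 0.025, 0.05, 0.05, 0.025 and 0.2: the monotonicity step pulls 0.01 and 0.04 down to share values with their neighbours.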
### Step 6: Non-Parametric Tests
```python
# Use when data is not normally distributed
# Mann-Whitney U test (alternative to t-test)
u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"Mann-Whitney U test:")
print(f"U-statistic: {u_statistic:.4f}")
print(f"p-value: {p_value_mw:.4e}")
# Kruskal-Wallis H test (alternative to ANOVA)
h_statistic, p_value_kw = kruskal(*groups)
print(f"\nKruskal-Wallis H test:")
print(f"H-statistic: {h_statistic:.4f}")
print(f"p-value: {p_value_kw:.4e}")
```
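The Mann-Whitney U statistic is built from ranks of the pooled sample rather than the raw values, which is what makes it robust to non-normal data. The sketch below computes U for each sample by hand; the helper name `mann_whitney_u` and the sample values are hypothetical. In recent SciPy releases, `mannwhitneyu(x, y).statistic` reports the U for the first sample, so it should equal `u_x`.

```python
import numpy as np
from scipy import stats


def mann_whitney_u(x, y):
    """Hypothetical helper: Mann-Whitney U statistics for both samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rank the pooled sample; tied values receive the average of their ranks
    ranks = stats.rankdata(np.concatenate([x, y]))
    rank_sum_x = ranks[: x.size].sum()
    # U for x: its rank sum minus the smallest possible rank sum n1(n1 + 1)/2
    u_x = rank_sum_x - x.size * (x.size + 1) / 2
    # The two U statistics always sum to n1 * n2
    u_y = x.size * y.size - u_x
    return u_x, u_y


x = [1.2, 3.4, 5.6]
y = [2.3, 4.5, 6.7, 7.8]
u_x, u_y = mann_whitney_u(x, y)
print(u_x, u_y)  # 3.0 9.0
```

Here x occupies ranks 1, 3 and 5 of the pooled sample, so its rank sum is 9 and U = 9 - 6 = 3.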
## Advanced Features
### Normality Testing
```python
from scipy.stats import shapiro, normaltest, kstest
# Test if data follows normal distribution
# Shapiro-Wilk test (best for n < 5000)
stat_sw, p_sw = shapiro(data)
print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")
# D'Agostino-Pearson test
stat_dp, p_dp = normaltest(data)
print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")
```