correlation-analysis
The correlation-analysis skill provides a framework for identifying co-moving asset pairs and testing their long-term equilibrium relationships. It implements four analytical modes (co-movement discovery through correlation scanning, deep return-correlation analysis, sector clustering, and realized correlation measurement) and pairs them with cointegration testing via the Engle-Granger and Johansen methods. Use this skill when constructing pairs-trading strategies, building hedged portfolios, performing risk management analysis, or looking for assets with similar factor exposures and statistically significant long-run relationships that can support mean-reversion trades.
git clone --depth 1 https://github.com/HKUDS/Vibe-Trading /tmp/correlation-analysis && cp -r /tmp/correlation-analysis/agent/src/skills/correlation-analysis ~/.claude/skills/correlation-analysis
# Correlation and Cointegration Analysis
## Overview
Correlation analysis is a foundational tool for pairs trading, portfolio construction, and risk management. This skill covers four analysis modes (co-movement discovery / return-correlation deep dive / sector clustering / realized correlation), a full cointegration-testing framework, cross-market linkage analysis, and the complete workflow from analytics to pair-trading signals.
---
## Mode 1: Co-Movement Discovery
**Use case**: Given a target asset, scan a universe for highly correlated assets and build a candidate pool with similar industry or factor exposure, for use in pairs trading or substitute identification.
### Workflow
```
1. Pull daily return series for the target asset and N candidates
2. Compute Pearson / Spearman correlations between the target and each candidate
3. Rank by correlation in descending order and keep Top-K (usually K=10-20)
4. Run cointegration tests on the Top-K set to retain pairs with real long-run equilibrium
5. Output the candidate pool and a correlation summary
```
```python
import pandas as pd
import numpy as np
from scipy.stats import pearsonr, spearmanr


def scan_correlated_assets(
    target_returns: pd.Series,
    universe_returns: pd.DataFrame,
    top_k: int = 20,
    min_corr: float = 0.5,
    method: str = "pearson",
) -> pd.DataFrame:
    """Scan for assets that are highly correlated with the target asset.

    Args:
        target_returns: Daily return series for the target asset
        universe_returns: Candidate-universe return matrix, columns are symbols
        top_k: Number of top candidates to return
        min_corr: Minimum absolute-correlation threshold
        method: "pearson" or "spearman"

    Returns:
        A DataFrame containing symbol / corr / p_value / rank
    """
    # Drop candidates with any missing observation, then keep only the dates
    # shared with the target (missing target observations are dropped too).
    aligned = universe_returns.dropna(axis=1, how="any")
    aligned, target_aligned = aligned.align(
        target_returns.dropna(), join="inner", axis=0
    )

    results = []
    for col in aligned.columns:
        if method == "spearman":
            corr, p = spearmanr(target_aligned, aligned[col])
        else:
            corr, p = pearsonr(target_aligned, aligned[col])
        results.append({"symbol": col, "corr": corr, "p_value": p})

    # Explicit columns keep the filter below valid when no candidate survives.
    df = pd.DataFrame(results, columns=["symbol", "corr", "p_value"])
    df = df[df["corr"].abs() >= min_corr].sort_values("corr", ascending=False)
    df["rank"] = range(1, len(df) + 1)
    return df.head(top_k).reset_index(drop=True)
```
**Screening guidance**:
| Correlation | Conclusion | Follow-up Action |
|---------|------|---------|
| > 0.8 | Strong same-direction co-movement | Send to the cointegration test queue |
| 0.6 - 0.8 | Moderate co-movement | Check industry / factor alignment before cointegration |
| -0.6 to 0.6 | Weak correlation | Usually unsuitable for pairs trading |
| < -0.6 | Strong inverse co-movement | Can be used in hedged portfolios, but be careful with spread direction |
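The screening table can be applied programmatically. The sketch below maps a correlation value to the table's conclusion and follow-up action; the function name `screen_correlation` is hypothetical, and the treatment of the exact boundary values (0.8 and ±0.6) is an assumption, since the table does not specify it.

```python
def screen_correlation(corr: float) -> tuple[str, str]:
    """Map a correlation to (conclusion, follow-up action) per the screening table.

    Boundary handling (values exactly at 0.8 or +/-0.6) is an assumption.
    """
    if corr > 0.8:
        return ("Strong same-direction co-movement",
                "Send to the cointegration test queue")
    if corr >= 0.6:
        return ("Moderate co-movement",
                "Check industry / factor alignment before cointegration")
    if corr < -0.6:
        return ("Strong inverse co-movement",
                "Can be used in hedged portfolios, but be careful with spread direction")
    return ("Weak correlation", "Usually unsuitable for pairs trading")


for value in (0.92, 0.7, 0.3, -0.75):
    conclusion, action = screen_correlation(value)
    print(f"{value:+.2f}: {conclusion} -> {action}")
```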
---
## Mode 2: Deep Return-Correlation Analysis
**Use case**: Run a full bivariate correlation study on two assets, including multiple correlation coefficients, Beta / R², rolling correlation, and spread Z-Score.
### Core Metrics
```python
import pandas as pd
import statsmodels.api as sm
from scipy.stats import pearsonr, spearmanr, kendalltau


def bivariate_correlation_analysis(
    y: pd.Series,
    x: pd.Series,
    rolling_window: int = 60,
) -> dict:
    """Run deep correlation analysis for two assets.

    Args:
        y: Daily return series of asset A
        x: Daily return series of asset B
        rolling_window: Rolling-window length in trading days

    Returns:
        Dict of correlation statistics
    """
    # Align the two series on shared dates and drop missing observations.
    df = pd.concat([y.rename("y"), x.rename("x")], axis=1).dropna()
    y_clean, x_clean = df["y"], df["x"]

    # Static correlations over the full sample.
    pearson_r, pearson_p = pearsonr(y_clean, x_clean)
    spearman_r, spearman_p = spearmanr(y_clean, x_clean)
    kendall_r, kendall_p = kendalltau(y_clean, x_clean)

    # OLS: y = α + β·x (β is the full-sample hedge ratio).
    x_const = sm.add_constant(x_clean)
    ols = sm.OLS(y_clean, x_const).fit()
    beta = ols.params["x"]
    alpha = ols.params["const"]
    r_squared = ols.rsquared

    # Rolling Pearson correlation.
    rolling_corr = y_clean.rolling(rolling_window).corr(x_clean)

    # Spread and rolling Z-Score using the hedge ratio.
    # The first rolling_window - 1 values are NaN by construction.
    spread = y_clean - beta * x_clean
    spread_mean = spread.rolling(rolling_window).mean()
    spread_std = spread.rolling(rolling_window).std()
    z_score = (spread - spread_mean) / spread_std

    return {
        "pearson": {"r": round(pearson_r, 4), "p": round(pearson_p, 6)},
        "spearman": {"r": round(spearman_r, 4), "p": round(spearman_p, 6)},
        "kendall": {"r": round(kendall_r, 4), "p": round(kendall_p, 6)},
        "beta": round(beta, 4),
        "alpha": round(alpha, 6),
        "r_squared": round(r_squared, 4),
        "rolling_corr": rolling_corr,
        "spread": spread,
        "z_score": z_score,
        "spread_mean": spread_mean,
        "spread_std": spread_std,
    }
```
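The spread Z-Score returned above is the usual input to a pair-trading signal. The sketch below shows one possible threshold rule; the function name `zscore_positions` and the entry/exit levels (2.0 and 0.5) are illustrative assumptions, not values prescribed by this skill.

```python
import numpy as np
import pandas as pd


def zscore_positions(z_score: pd.Series, entry: float = 2.0, exit_level: float = 0.5) -> pd.Series:
    """Turn a spread Z-Score into spread positions: +1 long, -1 short, 0 flat.

    Thresholds are hypothetical defaults. NaN values (for example the rolling
    warm-up period) leave the current position unchanged.
    """
    position = 0
    positions = []
    for value in z_score:
        if not np.isnan(value):
            if position == 0:
                if value > entry:
                    position = -1  # spread rich: short y, long beta * x
                elif value < -entry:
                    position = 1   # spread cheap: long y, short beta * x
            elif abs(value) < exit_level:
                position = 0       # spread has reverted: close the trade
        positions.append(position)
    return pd.Series(positions, index=z_score.index, name="position")


z = pd.Series([np.nan, 0.1, 2.5, 1.0, 0.3, -2.2, -1.0, 0.0])
print(zscore_positions(z).tolist())
```

Because the position at time t is decided from the Z-Score at time t, a backtest should apply it to the following period's return to avoid look-ahead bias.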
### Correlation-Coefficient Selection Guide
| Coefficient | Assumption | Best Use Case | Not Suitable When |
|------|------|---------|--------|
| Pearson | Linear, approximately normal | Return series | Heavy tails / many outliers |
| Spearman | Monotonic relationship | Ranking / quantile analysis, many outliers | When magnitude information matters |
| Kendall | Order consistency | Small samples, unknown distribution | Large samples due to slower computation |
**Practical rule in finance**: Usually report all three coefficients. If Pearson and Spearman differ by more than 0.1, the relationship is likely nonlinear or heavy-tailed, and Spearman should carry more weight.
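The 0.1 divergence rule can be checked directly. The sketch below is a minimal illustration, assuming a hypothetical helper `pearson_spearman_gap`; Spearman is computed here as the Pearson correlation of the ranks, which is its definition, so only pandas and NumPy are needed.

```python
import numpy as np
import pandas as pd


def pearson_spearman_gap(y: pd.Series, x: pd.Series, threshold: float = 0.1) -> dict:
    """Compare Pearson and Spearman and flag when they diverge by more than threshold."""
    df = pd.concat([y.rename("y"), x.rename("x")], axis=1).dropna()
    pearson = float(df["y"].corr(df["x"]))
    # Spearman's rho equals Pearson's r computed on the ranks.
    spearman = float(df["y"].rank().corr(df["x"].rank()))
    gap = abs(pearson - spearman)
    return {
        "pearson": pearson,
        "spearman": spearman,
        "gap": gap,
        "prefer_spearman": bool(gap > threshold),
    }


# A monotonic but strongly nonlinear relationship: the rank correlation is
# perfect, while the linear correlation is noticeably lower.
x = pd.Series(np.linspace(0.0, 5.0, 200))
y = np.exp(x)
report = pearson_spearman_gap(y, x)
print(report)
```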
---
## Mode 3: Sector Clustering
**Use case**: Run hierarchical clustering on the correlation matrix of N assets to discover sector structure, check portfolio diversification, and identify similar assets.
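As a rough illustration of this mode, the sketch below clusters assets with SciPy's `linkage` and `fcluster`. It is not the skill's own implementation: the function name `cluster_by_correlation`, the correlation distance d = sqrt(2(1 - ρ)), average linkage, and the synthetic two-factor data are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_by_correlation(returns: pd.DataFrame, n_clusters: int) -> pd.Series:
    """Assign each column of `returns` to one of n_clusters groups."""
    corr = returns.corr()
    # Convert correlation to a distance; clip guards against tiny negative
    # values from floating-point error before taking the square root.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr.to_numpy()), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # condensed form expected by linkage
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=corr.columns, name="cluster")


# Synthetic returns: A1/A2 load on one factor, B1/B2 on another.
rng = np.random.default_rng(0)
factor_a = rng.normal(size=500)
factor_b = rng.normal(size=500)
returns = pd.DataFrame({
    "A1": factor_a + 0.3 * rng.normal(size=500),
    "A2": factor_a + 0.3 * rng.normal(size=500),
    "B1": factor_b + 0.3 * rng.normal(size=500),
    "B2": factor_b + 0.3 * rng.normal(size=500),
})
clusters = cluster_by_correlation(returns, n_clusters=2)
print(clusters)
```

A portfolio whose holdings all fall in one cluster is less diversified than its asset count suggests, which is the diversification check this mode is meant to support.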
```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy impo
```