Skill28.1k repo starsupdated today

correlation-analysis

The correlation-analysis skill provides a comprehensive framework for identifying co-moving asset pairs and testing their long-term equilibrium relationships. It implements four analytical modes including co-movement discovery through correlation scanning, deep return-correlation analysis, sector clustering, and realized correlation measurement, combined with cointegration testing via Engle-Granger and Johansen methods. Use this skill when constructing pairs-trading strategies, building hedged portfolios, performing risk management analysis, or discovering assets with similar factor exposures and statistically significant long-run relationships for mean-reversion trading opportunities.

View source Repository: Vibe-Trading

Install in Claude Code

Copy

git clone --depth 1 https://github.com/HKUDS/Vibe-Trading /tmp/correlation-analysis && cp -r /tmp/correlation-analysis/agent/src/skills/correlation-analysis ~/.claude/skills/correlation-analysis

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Correlation and Cointegration Analysis

## Overview

Correlation analysis is a foundational tool for pairs trading, portfolio construction, and risk management. This skill covers four analysis modes (co-movement discovery / return-correlation deep dive / sector clustering / realized correlation), a full cointegration-testing framework, cross-market linkage analysis, and the complete workflow from analytics to pair-trading signals.

---

## Mode 1: Co-Movement Discovery

**Use case**: Given a target asset, scan a universe for highly correlated assets and build a candidate pool with similar industry or factor exposure, for use in pairs trading or substitute identification.

### Workflow

```
1. Pull daily return series for the target asset and N candidates
2. Compute Pearson / Spearman correlations between the target and each candidate
3. Rank by correlation in descending order and keep Top-K (usually K=10-20)
4. Run cointegration tests on the Top-K set to retain pairs with real long-run equilibrium
5. Output the candidate pool and a correlation summary
```

```python
import pandas as pd
import numpy as np
from scipy.stats import pearsonr, spearmanr

def scan_correlated_assets(
    target_returns: pd.Series,
    universe_returns: pd.DataFrame,
    top_k: int = 20,
    min_corr: float = 0.5,
    method: str = "pearson",
) -> pd.DataFrame:
    """Scan for assets that are highly correlated with the target asset.

    Args:
        target_returns: Daily return series for the target asset
        universe_returns: Candidate-universe return matrix, columns are symbols
        top_k: Number of top candidates to return
        min_corr: Minimum absolute-correlation threshold
        method: "pearson" or "spearman"

    Returns:
        A DataFrame containing symbol / corr / p_value / rank
    """
    aligned = universe_returns.dropna(axis=1, how="any")
    aligned, target_aligned = aligned.align(target_returns, join="inner", axis=0)

    results = []
    for col in aligned.columns:
        if method == "spearman":
            corr, p = spearmanr(target_aligned, aligned[col])
        else:
            corr, p = pearsonr(target_aligned, aligned[col])
        results.append({"symbol": col, "corr": corr, "p_value": p})

    df = pd.DataFrame(results)
    df = df[df["corr"].abs() >= min_corr].sort_values("corr", ascending=False)
    df["rank"] = range(1, len(df) + 1)
    return df.head(top_k).reset_index(drop=True)
```

**Screening guidance**:

| Correlation | Conclusion | Follow-up Action |
|---------|------|---------|
| > 0.8 | Strong same-direction co-movement | Send to the cointegration test queue |
| 0.6 - 0.8 | Moderate co-movement | Check industry / factor alignment before cointegration |
| < 0.6 | Weak correlation | Usually unsuitable for pairs trading |
| Negative and < -0.6 | Strong inverse co-movement | Can be used in hedged portfolios, but be careful with spread direction |

---

## Mode 2: Deep Return-Correlation Analysis

**Use case**: Run a full bivariate correlation study on two assets, including multiple correlation coefficients, Beta / R², rolling correlation, and spread Z-Score.

### Core Metrics

```python
import statsmodels.api as sm
from scipy.stats import pearsonr, spearmanr, kendalltau

def bivariate_correlation_analysis(
    y: pd.Series,
    x: pd.Series,
    rolling_window: int = 60,
) -> dict:
    """Run deep correlation analysis for two assets.

    Args:
        y: Daily return series of asset A
        x: Daily return series of asset B
        rolling_window: Rolling-window length in trading days

    Returns:
        Dict of correlation statistics
    """
    # Align the two series.
    df = pd.concat([y.rename("y"), x.rename("x")], axis=1).dropna()
    y_clean, x_clean = df["y"], df["x"]

    # Static correlations.
    pearson_r, pearson_p = pearsonr(y_clean, x_clean)
    spearman_r, spearman_p = spearmanr(y_clean, x_clean)
    kendall_r, kendall_p = kendalltau(y_clean, x_clean)

    # OLS: y = α + β·x
    x_const = sm.add_constant(x_clean)
    ols = sm.OLS(y_clean, x_const).fit()
    beta = ols.params["x"]
    alpha = ols.params["const"]
    r_squared = ols.rsquared

    # Rolling Pearson correlation.
    rolling_corr = y_clean.rolling(rolling_window).corr(x_clean)

    # Spread and Z-Score using the hedge ratio.
    spread = y_clean - beta * x_clean
    spread_mean = spread.rolling(rolling_window).mean()
    spread_std = spread.rolling(rolling_window).std()
    z_score = (spread - spread_mean) / spread_std

    return {
        "pearson": {"r": round(pearson_r, 4), "p": round(pearson_p, 6)},
        "spearman": {"r": round(spearman_r, 4), "p": round(spearman_p, 6)},
        "kendall": {"r": round(kendall_r, 4), "p": round(kendall_p, 6)},
        "beta": round(beta, 4),
        "alpha": round(alpha, 6),
        "r_squared": round(r_squared, 4),
        "rolling_corr": rolling_corr,
        "spread": spread,
        "z_score": z_score,
        "spread_mean": spread_mean,
        "spread_std": spread_std,
    }
```

### Correlation-Coefficient Selection Guide

| Coefficient | Assumption | Best Use Case | Not Suitable When |
|------|------|---------|--------|
| Pearson | Linear, approximately normal | Return series | Heavy tails / many outliers |
| Spearman | Monotonic relationship | Ranking / quantile analysis, many outliers | When magnitude information matters |
| Kendall | Order consistency | Small samples, unknown distribution | Large samples due to slower computation |

**Practical rule in finance**: Usually report all three coefficients. If Pearson and Spearman differ by more than 0.1, the relationship is likely nonlinear or heavy-tailed, and Spearman should carry more weight.

---

## Mode 3: Sector Clustering

**Use case**: Run hierarchical clustering on the correlation matrix of N assets to discover sector structure, check portfolio diversification, and identify similar assets.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy impo

More from this repository

vibe-tradingSkill

Professional finance research toolkit — backtesting (7 engines + benchmark comparison panel), factor analysis, Alpha Zoo (452 pre-built alphas across qlib158/alpha101/gtja191/academic), options pricing, 79 finance skills, 29 multi-agent swarm teams, Trade Journal analyzer, and Shadow Account (extract → backtest → render) across 18 market-data sources (tushare, yfinance, okx, akshare, baostock, tencent, mootdx, ccxt, futu, local, eastmoney, sina, stooq, yahoo, plus optional-key finnhub/alphavantage/tiingo/fmp).

adr-hshareSkill

ADR/H-share/A-share cross-listing premium analysis — track pricing gaps between US-listed ADRs, HK-listed H-shares, and A-shares for arbitrage signals, dual-listing valuation, and delisting risk assessment.

akshareSkill

AKShare financial data aggregator (18k+ stars). Free, no API key. Covers A-shares, US, HK, futures, macro, forex. Primary fallback for tushare and yfinance.

alpha-zooSkill

Browse and bench the bundled alpha zoos — prebuilt cross-sectional factor libraries (Kakushadze 101, GTJA 191, Qlib 158, Fama-French / Carhart). Use when the user asks "which alphas exist", wants metadata on a named alpha, or wants to run IC/IR on a whole zoo over a universe.

ashare-pre-st-filterSkill

A 股 ST/*ST 风险预测框架 — 基于最新中报/三季报或业绩预告/快报，预测下一财年是否会因营收、利润、净资产、分红不达标而被风险警示，并将新浪监管处罚记录作为独立证据面纳入风险等级。仅适用于 A 股，不预测财务造假。

asset-allocationSkill

Asset allocation theory and optimizer usage — MPT / Black-Litterman / risk budgeting / all-weather strategy, including guides for 4 optimizers and rebalancing rules.

backtest-diagnoseSkill

Diagnose failed or underperforming backtests, locate the root cause, and fix the issue

behavioral-financeSkill

Behavioral finance applications: theories of overreaction and underreaction, behavioral explanations for momentum and reversal, investor sentiment cycles, cognitive-bias checklists, and debiasing quantitative strategies.