factor-research
The factor-research skill provides a systematic framework for evaluating the predictive power of trading factors through IC/IR statistical analysis and quantile-based backtesting. It accepts factor values and forward returns across instruments and dates, then outputs information coefficient series, summary statistics, and quantile portfolio performance to determine factor validity and guide multi-factor combination strategies.
git clone --depth 1 https://github.com/HKUDS/Vibe-Trading /tmp/factor-research && cp -r /tmp/factor-research/agent/src/skills/factor-research ~/.claude/skills/factor-research
SKILL.md
# Factor Research Framework

## Purpose

Systematically evaluates the predictive power of single or multiple factors. Uses IC/IR statistical tests and quantile backtests to determine whether a factor has stock-selection power, and to guide factor screening and combination.

Applicable scenarios:

- Single-factor validity testing (momentum, value, quality, volatility, and more)
- Determining weights for multi-factor combination
- Factor decay analysis (IC changes across different holding periods)
- Comparing factor differences across industries and markets

## Workflow

1. **Calculate factor values**: compute factor exposures for each instrument on the cross-section, and output a factor CSV (`index=date`, `columns=codes`)
2. **Calculate returns**: compute each instrument's forward N-day return, and output a return CSV (same structure)
3. **Call the `factor_analysis` tool**: pass in the factor CSV, return CSV, and output directory
4. **Interpret the results**: judge factor validity based on IC/IR criteria and quantile backtest results
5. **Factor screening / combination**: keep effective factors and combine them with equal weights or IC-based weights

**Key point**: the rows (dates) and columns (instrument codes) of the factor CSV and return CSV must align exactly. Returns must be forward returns after the factor-observation date (to avoid look-ahead bias).
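Steps 1 and 2 of the workflow, together with the alignment requirement, can be sketched in pandas. This is only a minimal illustration and not part of the skill itself: the helper names `forward_returns` and `align`, the toy prices, and the trailing one-day-return factor are all assumptions made for the example.

```python
import pandas as pd


def forward_returns(close: pd.DataFrame, horizon: int = 1) -> pd.DataFrame:
    """Return from the day-T close to the day-(T+horizon) close, stored on row T."""
    return close.shift(-horizon) / close - 1.0


def align(factor: pd.DataFrame, returns: pd.DataFrame):
    """Keep only the dates and instrument codes present in both frames."""
    dates = factor.index.intersection(returns.index)
    codes = factor.columns.intersection(returns.columns)
    return factor.loc[dates, codes], returns.loc[dates, codes]


# Hypothetical toy closes laid out as index=date, columns=codes
close = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1, 12.1], "BBB": [20.0, 19.0, 19.0, 20.9]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Toy factor: trailing 1-day return, which uses only data up to day T
factor = close / close.shift(1) - 1.0

# Forward 1-day return; the last date has no future close, so that row is dropped
fwd = forward_returns(close, horizon=1).dropna(how="all")

# Both frames now share exactly the same dates and codes
factor_aligned, fwd_aligned = align(factor, fwd)
```

Writing the two frames out with `factor_aligned.to_csv(...)` and `fwd_aligned.to_csv(...)` would give files in the `index=date`, `columns=codes` layout described above. The first factor row is all NaN here because the trailing return has no prior close on the first date.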
## `factor_analysis` Tool Parameters

| Parameter | Type | Required | Default | Description |
|------|------|------|------|------|
| factor_csv | string | Yes | - | Path to the factor-value CSV |
| return_csv | string | Yes | - | Path to the return CSV |
| output_dir | string | Yes | - | Output directory for results |
| n_groups | integer | No | 5 | Number of quantile groups |

## Output Files

| File | Contents |
|------|------|
| ic_series.csv | Daily IC series |
| ic_summary.json | IC mean, IC standard deviation, IR, proportion of IC > 0 |
| group_equity.csv | Cumulative equity curves for each quantile group |

## IC/IR Interpretation Standards

| Metric | Threshold | Interpretation |
|------|------|------|
| IC mean | > 0.03 | Factor has basic predictive power |
| IC mean | > 0.05 | Factor has strong predictive power |
| IC mean | > 0.10 | Unusually high; check for look-ahead bias |
| IR (IC mean / IC std) | > 0.5 | Factor is stably effective |
| IR | > 1.0 | Extremely strong, very rare |
| Proportion of IC > 0 | > 55% | Factor direction is stable |
| Proportion of IC > 0 | < 50% | Factor direction is unstable and unusable |

Note: negative IC can also be useful (reverse factors). Judge by absolute value, and reverse the signal direction in actual use.

## Quantile Backtest Interpretation

Quantile backtesting sorts instruments into N groups by factor value from low to high (default 5 groups), with equal-weight holding inside each group.

**Criteria**:

- **Monotonicity**: the final net values from `Group_1` to `Group_N` should show a monotonic rising (or falling) pattern. Better monotonicity means stronger factor discrimination
- **Long-short spread**: the net-value difference between the highest and lowest group (`long_short_spread`).
A larger spread means stronger selection power
- **Nonlinearity**: if only the top and bottom groups differ materially while the middle groups are similar, the factor may only be effective in the tails
- **Stability**: group equity curves should be smooth; sharp swings indicate an unstable factor

**Warning signs**:

- No meaningful difference across group equity curves → the factor is ineffective
- Non-monotonic pattern (such as V-shape or inverted V-shape) → the factor may have a nonlinear relationship and requires further analysis
- One group's net value falls persistently → the factor may be usable in reverse

## Factor Combination Methods

When multiple single factors pass validity tests, they should be combined into a composite factor:

### Equal-Weight Combination

The simplest method: standardize each factor and sum them with equal weights. Suitable when the factor count is small and IC differences are minor.

```
Composite factor = Z(factor1) + Z(factor2) + ... + Z(factorN)
where Z() is cross-sectional Z-score standardization
```

### IC-Weighted Combination

Assign weights according to historical IC mean. Factors with higher absolute IC receive larger weights. Because the weights use absolute values, reverse any negative-IC factor first so that all factors point in the same direction.

```
weight_i = |IC_mean_i| / sum(|IC_mean_j|)
Composite factor = sum(weight_i * Z(factor_i))
```

### Orthogonalized Combination

First orthogonalize the factors with the Gram-Schmidt process to remove collinearity, then combine them with equal weights. Suitable when factors are highly correlated with one another.

```
1. Sort factors by IC from high to low
2. Keep the first factor unchanged
3. Regress each later factor on all previous factors and use the residual as the orthogonalized factor
4. Combine the orthogonalized factors with equal weights
```

## Common Pitfalls

### Look-Ahead Bias

- Factor values must be computed using data from day T and earlier, while returns must use data from T+1 to T+N
- Wrong example: calculate the factor with the day T closing price and correlate it with the day T return → artificially inflated IC
- Correct approach: take the factor value at day T and measure the return from the day T close to the day T+N close

### Skewed Factor Distributions

- Some factors (such as market cap and turnover) have heavily right-skewed distributions
- Computing IC directly from raw values lets outliers dominate the result
- Solution: apply cross-sectional rank or Z-score standardization before computing IC

### Industry Neutralization

- Factor values can be highly similar within the same industry, causing stock selection to cluster in a few sectors
- Solution: perform Z-score standardization within each industry (industry neutralization) to remove industry effects
- For China A-shares, Shenwan Level-1 industries can be used

### Insufficient Sample Size

- Each cross-section should contain at least 5 valid instruments to compute meaningful
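
Two of the pitfalls in this section suggest guards that can be applied when checking IC by hand: rank each cross-section so outliers do not dominate, and skip dates with too few valid instruments. The following is a minimal sketch under those assumptions. It is not the `factor_analysis` tool's implementation, and the function names, the `min_valid` parameter, and the summary keys are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_ic_series(factor: pd.DataFrame, fwd_ret: pd.DataFrame, min_valid: int = 5) -> pd.Series:
    """Per-date rank IC; dates with fewer than `min_valid` valid pairs are skipped."""
    ics = {}
    for date in factor.index.intersection(fwd_ret.index):
        # Pair factor and forward return per instrument, dropping missing values
        pair = pd.concat([factor.loc[date], fwd_ret.loc[date]], axis=1, keys=["f", "r"]).dropna()
        if len(pair) < min_valid:
            continue
        ranks = pair.rank()
        # Pearson correlation of ranks equals the Spearman rank correlation
        ics[date] = ranks["f"].corr(ranks["r"])
    return pd.Series(ics, name="ic", dtype=float)


def ic_summary(ic: pd.Series) -> dict:
    """IC mean, IC standard deviation, IR, and proportion of IC > 0 (key names are assumed)."""
    return {
        "ic_mean": ic.mean(),
        "ic_std": ic.std(),
        "ir": ic.mean() / ic.std(),
        "ic_positive_ratio": (ic > 0).mean(),
    }


# Hypothetical toy data: six instruments, three dates
codes = list("ABCDEF")
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
factor = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, np.nan, np.nan, 5, 6]],
    index=dates, columns=codes, dtype=float,
)
fwd_ret = pd.DataFrame(
    [[0.01, 0.02, 0.03, 0.04, 0.05, 0.06],
     [0.06, 0.05, 0.04, 0.03, 0.02, 0.01],
     [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]],
    index=dates, columns=codes,
)

ic = rank_ic_series(factor, fwd_ret)  # the third date has only 4 valid pairs and is skipped
summary = ic_summary(ic)
```

In this toy data the first date is perfectly rank-aligned and the second is perfectly reversed, so the two surviving IC values have opposite signs and the mean IC is zero.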
Professional finance research toolkit — backtesting (7 engines + benchmark comparison panel), factor analysis, Alpha Zoo (452 pre-built alphas across qlib158/alpha101/gtja191/academic), options pricing, 77 finance skills, 29 multi-agent swarm teams, Trade Journal analyzer, and Shadow Account (extract → backtest → render) across 7 data sources (tushare, yfinance, okx, akshare, mootdx, ccxt, futu).
ADR/H-share/A-share cross-listing premium analysis — track pricing gaps between US-listed ADRs, HK-listed H-shares, and A-shares for arbitrage signals, dual-listing valuation, and delisting risk assessment.
AKShare financial data aggregator (18k+ stars). Free, no API key. Covers A-shares, US, HK, futures, macro, forex. Primary fallback for tushare and yfinance.
Browse and bench the bundled alpha zoos — prebuilt cross-sectional factor libraries (Kakushadze 101, GTJA 191, Qlib 158, Fama-French / Carhart). Use when the user asks "which alphas exist", wants metadata on a named alpha, or wants to run IC/IR on a whole zoo over a universe.
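A-share ST/*ST risk prediction framework: based on the latest interim or third-quarter report, or on an earnings preannouncement or preliminary earnings release, predicts whether a company will receive a risk warning in the next fiscal year for failing revenue, profit, net-asset, or dividend requirements, and incorporates Sina regulatory-penalty records into the risk rating as an independent line of evidence. Applies to A-shares only; does not predict financial fraud.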
Asset allocation theory and optimizer usage — MPT / Black-Litterman / risk budgeting / all-weather strategy, including guides for 4 optimizers and rebalancing rules.
Diagnose failed or underperforming backtests, locate the root cause, and fix the issue
Behavioral finance applications: theories of overreaction and underreaction, behavioral explanations for momentum and reversal, investor sentiment cycles, cognitive-bias checklists, and debiasing quantitative strategies.