ClaudeWave
Skill · 12k repo stars · updated today

ml-strategy

This Claude Code skill implements a machine-learning trading strategy using scikit-learn models (RandomForest, GradientBoosting, or Ridge) with walk-forward validation to predict price direction from OHLCV data. Use it when you need to generate trading signals based on engineered features like momentum, volatility, and volume ratios while avoiding look-ahead bias through time-series cross-validation on any liquid asset's candlestick data.

Install in Claude Code
git clone --depth 1 https://github.com/HKUDS/Vibe-Trading /tmp/ml-strategy && cp -r /tmp/ml-strategy/agent/src/skills/ml-strategy ~/.claude/skills/ml-strategy
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Machine-Learning Predictive Strategy

## Purpose

Use sklearn machine-learning models (`RandomForest` / `GradientBoosting` / `Ridge`) to predict the direction of future returns and generate trading signals. Walk-forward training is used to avoid future data leakage, and feature engineering extracts useful factors from OHLCV data.
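As a rough illustration of the walk-forward idea, the sketch below enumerates training and prediction windows for both the expanding and the sliding variant. The helper name `walk_forward_windows` is hypothetical and not part of the skill; its parameter names and defaults are borrowed from the `walk_forward_predict` example later in this file, and it assumes each model predicts only the rows that follow its training window.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_rows: int,
    min_train_size: int = 252,
    retrain_freq: int = 20,
    window_type: str = "expanding",
    sliding_size: int = 504,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (train_start, train_end, predict_end) row positions.

    Hypothetical helper: training uses rows [train_start, train_end) and
    predictions cover rows [train_end, predict_end), so a model is never
    fitted on the rows it is asked to predict.
    """
    for train_end in range(min_train_size, n_rows, retrain_freq):
        if window_type == "sliding":
            # Fixed lookback: drop the oldest rows once the window is full.
            train_start = max(0, train_end - sliding_size)
        else:
            # Expanding: always train on all history seen so far.
            train_start = 0
        predict_end = min(train_end + retrain_freq, n_rows)
        yield train_start, train_end, predict_end


# Example: 300 rows of history, first model trained on the first 252 rows.
windows = list(walk_forward_windows(n_rows=300))
for train_start, train_end, predict_end in windows:
    print(f"train [{train_start}, {train_end}) -> predict [{train_end}, {predict_end})")
```

With the expanding window every retrain sees more history; with the sliding window the training set is capped at `sliding_size` rows, which may adapt faster to regime changes at the cost of fewer samples.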

## Signal Logic

1. **Validate input**: check OHLCV columns, minimum row count, NaN ratio — skip symbols that fail
2. **Feature engineering**: build multi-dimensional factors from raw OHLCV data (momentum, volatility, RSI, moving-average ratios, volume ratio, and more). All features are sanitized (inf removed, division-by-zero guarded)
3. **Label construction**: future N-day return > 0 is the positive class (`1`), < 0 is the negative class (`0`)
4. **Walk-forward training**: use an expanding or sliding window, train on historical data only, and roll forward day by day for prediction
5. **Signal generation**: map `predict_proba[:, 1]` to `[-1.0, 1.0]`, or use discrete signals from `predict` in `{-1, 0, 1}`. Output is guaranteed clean (no NaN, clipped to range)
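Steps 3 and 5 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's own code: the function names `make_labels` and `proba_to_signal` and the `horizon` parameter are hypothetical, an exactly-zero forward return is assigned to class `0` (the text leaves that case open), and the linear map `2 * p - 1` is one plausible way to rescale a probability to `[-1.0, 1.0]`.

```python
import numpy as np
import pandas as pd


def make_labels(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Binary labels from the forward `horizon`-day return (hypothetical helper)."""
    fwd_ret = close.shift(-horizon) / close - 1.0
    labels = (fwd_ret > 0).astype(float)
    # The last `horizon` rows have no future price yet: keep them as NaN
    # so they are dropped from training instead of being labelled 0.
    labels[fwd_ret.isna()] = np.nan
    return labels


def proba_to_signal(proba_up: pd.Series) -> pd.Series:
    """Map P(up) in [0, 1] to a signal in [-1, 1] (assumed linear mapping)."""
    signal = 2.0 * proba_up - 1.0
    # Missing predictions become a flat (0.0) position; clip guards the range.
    return signal.fillna(0.0).clip(-1.0, 1.0)


close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0])
labels = make_labels(close, horizon=2)
signal = proba_to_signal(pd.Series([0.0, 0.5, 1.0, np.nan]))
print(labels.tolist())
print(signal.tolist())
```

A probability of 0.5 maps to a flat position, so the signal strength scales with how far the model's confidence is from a coin flip.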

## Complete SignalEngine Example

This is the recommended full pipeline. Copy and customise it; input validation, inf/NaN sanitization, and training-only scaling are built in.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def validate_data(df: pd.DataFrame, min_rows: int = 300) -> bool:
    """Check that OHLCV data meets minimum quality for ML training.

    Args:
        df: DataFrame with DatetimeIndex.
        min_rows: Minimum number of rows required.

    Returns:
        True if data is usable.
    """
    required = {"open", "high", "low", "close", "volume"}
    if not required.issubset(df.columns):
        return False
    if len(df) < min_rows:
        return False
    if df["close"].isnull().mean() > 0.2:
        return False
    return True


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build a machine-learning feature matrix from OHLCV data.

    All features are guarded against division-by-zero and sanitized
    (inf replaced with NaN) so downstream code never sees inf values.

    Args:
        df: DataFrame containing open, high, low, close, and volume columns.

    Returns:
        DataFrame with feature columns prefixed by 'f_'.
    """
    c = df["close"]
    v = df["volume"]
    ret = c.pct_change()

    features = pd.DataFrame(index=df.index)
    features["f_ret_5d"] = c.pct_change(5)
    features["f_ret_20d"] = c.pct_change(20)
    features["f_vol_20d"] = ret.rolling(20).std()
    features["f_ma_ratio"] = c / c.rolling(20).mean()
    features["f_volume_ratio"] = v / v.rolling(20).mean()

    # RSI(14) — guard: loss=0 in zero-volatility periods produces inf
    delta = c.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rs = gain / loss.replace(0, np.nan)
    features["f_rsi_14"] = 100 - (100 / (1 + rs))

    # Bollinger Band position — guard: bb_upper == bb_lower when std=0
    ma20 = c.rolling(20).mean()
    std20 = c.rolling(20).std()
    bb_upper = ma20 + 2 * std20
    bb_lower = ma20 - 2 * std20
    bb_range = (bb_upper - bb_lower).replace(0, np.nan)
    features["f_bb_position"] = (c - bb_lower) / bb_range

    # Intraday features
    features["f_high_low_ratio"] = (df["high"] - df["low"]) / c
    features["f_close_open_ratio"] = (c - df["open"]) / df["open"]
    features["f_skew_20d"] = ret.rolling(20).skew()

    # Sanitize: replace all inf with NaN (NaN handled by walk-forward)
    features = features.replace([np.inf, -np.inf], np.nan)
    return features


def walk_forward_predict(
    features: pd.DataFrame,
    labels: pd.Series,
    min_train_size: int = 252,
    retrain_freq: int = 20,
    model_type: str = "random_forest",
    window_type: str = "expanding",
    sliding_size: int = 504,
) -> pd.Series:
    """Walk-forward training and prediction to avoid future data leakage.

    Args:
        features: Feature matrix aligned with labels by row index.
        labels: Binary labels (0/1), representing the direction of future N-day returns.
        min_train_size: Minimum training-set size in trading days.
        retrain_freq: Retrain the model every N days.
        model_type: One of "random_forest" / "gradient_boosting" / "ridge".
        window_type: "expanding" uses all history; "sliding" uses a fixed lookback.
        sliding_size: Lookback window size when window_type is "sliding".

    Returns:
        Predicted signal series with range [-1.0, 1.0], no NaN values.
    """
    predictions = pd.Series(0.0, index=features.index)
    model = None
    scaler = None

    for i in range(min_train_size, len(features)):
        # Retrain every retrain_freq days
        if model is None or (i - min_train_size) % retrain_freq == 0:
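            start = max(0, i - sliding_size) if window_type == "sliding" else 0
            # Caveat: labels are forward N-day returns, so the most recent N - 1
            # rows of this slice depend on prices after day i. Shorten the slice
            # by N - 1 rows to keep training strictly free of look-ahead.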
            X_train = features.iloc[start:i].values
            y_train = labels.iloc[start:i].values

            # Drop rows with NaN
            valid = ~(np.isnan(X_train).any(axis=1) | np.isnan(y_train))
            X_train = X_train[valid]
            y_train = y_train[valid]

            if len(X_train) < 50:
                continue

            # Standardization: fit only on training set
            scaler = StandardScaler()
            X_train = scaler.fit_transform(X_train)

            # Build the model
            if model_type == "random_forest":
                model = RandomForestClassifier(
                    n_estimators=100, max_depth=5, random_state=42,
                )
            elif model_type == "gradient_boosting":
                model = GradientBoostingClassifier(
                    n_estimators=100, max_depth=3, learning_rate=0.05,
                    random_state=42,
                )
            elif model_type == "ridge":
                ...  # the listing is cut off here in the source
```

vibe-trading (Skill)

Professional finance research toolkit: backtesting (7 engines + benchmark comparison panel), factor analysis, Alpha Zoo (452 pre-built alphas across qlib158/alpha101/gtja191/academic), options pricing, 77 finance skills, 29 multi-agent swarm teams, Trade Journal analyzer, and Shadow Account (extract → backtest → render) across 7 data sources (tushare, yfinance, okx, akshare, mootdx, ccxt, futu).

adr-hshare (Skill)

ADR/H-share/A-share cross-listing premium analysis — track pricing gaps between US-listed ADRs, HK-listed H-shares, and A-shares for arbitrage signals, dual-listing valuation, and delisting risk assessment.

akshare (Skill)

AKShare financial data aggregator (18k+ stars). Free, no API key. Covers A-shares, US, HK, futures, macro, forex. Primary fallback for tushare and yfinance.

alpha-zoo (Skill)

Browse and bench the bundled alpha zoos — prebuilt cross-sectional factor libraries (Kakushadze 101, GTJA 191, Qlib 158, Fama-French / Carhart). Use when the user asks "which alphas exist", wants metadata on a named alpha, or wants to run IC/IR on a whole zoo over a universe.

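ashare-pre-st-filter (Skill)

A-share ST/*ST risk prediction framework: based on the latest interim or third-quarter report, or on earnings forecasts and preliminary earnings reports, it predicts whether a stock will receive a risk warning in the next fiscal year for missing revenue, profit, net-asset, or dividend thresholds, and it incorporates Sina regulatory penalty records as an independent evidence dimension in the risk rating. Applies to A-shares only; does not predict financial fraud.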

asset-allocation (Skill)

Asset allocation theory and optimizer usage — MPT / Black-Litterman / risk budgeting / all-weather strategy, including guides for 4 optimizers and rebalancing rules.

backtest-diagnose (Skill)

Diagnose failed or underperforming backtests, locate the root cause, and fix the issue.

behavioral-finance (Skill)

Behavioral finance applications: theories of overreaction and underreaction, behavioral explanations for momentum and reversal, investor sentiment cycles, cognitive-bias checklists, and debiasing quantitative strategies.