Skill · 1.4k repo stars · updated yesterday

h-compare

h-compare implements the FPF (First Principles Framework) compare workflow for multi-dimensional variant evaluation. Use it when you are comparing several solutions across multiple criteria and need the non-dominated options rather than a single winner. With calls to the haft kernel, the skill walks through five stages: characterizing the problem's dimensions, declaring the parity plan and selection policy before any scoring, scoring all variants dimension by dimension in parallel, computing the non-dominated set (Pareto front), and presenting the resulting trade-offs.

Repository: haft

Install in Claude Code

mkdir -p ~/.claude/skills && git clone --depth 1 https://github.com/m0n0x41d/haft /tmp/h-compare && cp -r /tmp/h-compare/internal/cli/skill/h-compare ~/.claude/skills/h-compare

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# h-compare — Fair comparison with Pareto front

You are running the FPF compare workflow: characterize dimensions → declare parity plan → declare selection policy BEFORE scoring → dim-wise parallel scoring → compute non-dominated set → present Pareto front (NOT a scalar winner).

## Step 1 — Ensure portfolio exists

If no `portfolio_ref` is given:
- Query active portfolios with `mcp__haft__haft_query(action="status")`
- If only one active portfolio matches the problem, ask the operator to confirm

The kernel auto-detects the portfolio when only one active portfolio exists, but an explicit reference is safer.
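The Step 1 decision can be sketched as a small function. This is a minimal sketch, not part of haft: `resolve_portfolio_ref` and its parameters are hypothetical, `active_portfolios` is assumed to already hold the refs from the status query that match the problem, and `confirm` stands in for asking the operator.

```python
from typing import Callable, Optional, Sequence


def resolve_portfolio_ref(
    given_ref: Optional[str],
    active_portfolios: Sequence[str],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Hypothetical helper: choose the portfolio_ref for the compare workflow.

    active_portfolios: refs of active portfolios matching the problem
    (assumed pre-filtered from the status query result).
    confirm: asks the operator to approve a candidate ref.
    """
    # An explicit reference always wins.
    if given_ref:
        return given_ref
    # Exactly one match: still get the operator's confirmation first.
    if len(active_portfolios) == 1:
        candidate = active_portfolios[0]
        return candidate if confirm(candidate) else None
    # Zero or several matches: do not guess; ask the operator for a ref.
    return None
```

Returning `None` for zero or several candidates sends the agent back to the operator instead of letting it guess.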

## Step 2 — Characterize dimensions (agent drafts, operator reviews)

If the portfolio's problem has no dimensions, **the agent drafts them** based on the variants and the problem signal. Do not delegate dimension authorship back to the operator — that defeats the value of the agent. Read the variants, identify the axes on which they actually differ, draft 2-5 dimensions inline, then call the kernel. Surface the drafted dimensions to the operator with "Drafted these dimensions — edit if any are wrong before scoring." Operator review is the gate, not operator authorship.

```
mcp__haft__haft_problem(
  action="characterize",
  problem_ref="<prob-...>",
  dimensions=[
    {
      "name": "latency_p95",
      "role": "target",         // constraint | target | observation
      "polarity": "lower_better",
      "scale_type": "ratio",
      "unit": "ms",
      "how_to_measure": "<single sentence>"
    },
    {
      "name": "memory_usage",
      "role": "constraint",     // hard limit: violating variants are eliminated before Pareto computation
      "polarity": "lower_better",
      "scale_type": "ratio",
      "unit": "MB",
      "how_to_measure": "<...>"
    },
    {
      "name": "ops_complexity",
      "role": "observation",    // Anti-Goodhart: watch but don't optimize
      "polarity": "lower_better",
      "scale_type": "ordinal",
      "how_to_measure": "<...>"
    }
  ]
)
```

Per FPF CHR-01: 1-3 targets max, plus constraints (hard limits) and observations (watch but do not optimize — Anti-Goodhart).
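Before calling the kernel, the agent can sanity-check its drafted dimensions against these rules. The sketch below is hypothetical: the `Dimension` dataclass mirrors the fields in the characterize example, `check_dimensions` is not a haft API, and `higher_better` is an assumed polarity value, since only `lower_better` appears above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Roles as listed in the characterize example.
ROLES = {"constraint", "target", "observation"}
# "higher_better" is an assumption; the example only shows "lower_better".
POLARITIES = {"lower_better", "higher_better"}


@dataclass(frozen=True)
class Dimension:
    name: str
    role: str
    polarity: str
    scale_type: str
    how_to_measure: str
    unit: Optional[str] = None  # ordinal dimensions may have no unit


def check_dimensions(dims: List[Dimension]) -> List[str]:
    """Return problems found in a drafted dimension set (empty list means OK)."""
    problems = []
    names = [d.name for d in dims]
    if len(names) != len(set(names)):
        problems.append("duplicate dimension names")
    for d in dims:
        if d.role not in ROLES:
            problems.append(f"{d.name}: unknown role {d.role!r}")
        if d.polarity not in POLARITIES:
            problems.append(f"{d.name}: unknown polarity {d.polarity!r}")
    # CHR-01: at least one and at most three targets.
    targets = sum(1 for d in dims if d.role == "target")
    if not 1 <= targets <= 3:
        problems.append(f"expected 1-3 targets, got {targets}")
    return problems
```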

## Step 3 — Declare parity_plan (BEFORE scoring per FPF CMP-01)

```
parity_plan = {
  "baseline_set": ["<variant_id_1>", "<variant_id_2>", "<variant_id_3>"],
  "window": "<time/observation window scores are comparable in>",
  "budget": "<resource budget held equal across variants>",
  "missing_data_policy": "explicit_abstain | zero | exclude",
  "pinned_conditions": ["<must-hold condition>", ...]
}
```

In DEEP mode the kernel REQUIRES `baseline_set`, `window`, `budget`, and `missing_data_policy`. Standard mode accepts gaps but emits warnings.
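A client-side pre-check can mirror this DEEP versus standard behavior before the kernel call. It is a sketch under assumptions: `check_parity_plan` is hypothetical, the mode strings `"deep"` and `"standard"` are assumed spellings, and the kernel, not this function, remains the real gate.

```python
from typing import Any, Dict, List

# Fields the kernel requires in DEEP mode, per the text above.
DEEP_REQUIRED = ("baseline_set", "window", "budget", "missing_data_policy")
# Allowed values, taken from the parity_plan template.
MISSING_DATA_POLICIES = {"explicit_abstain", "zero", "exclude"}


def check_parity_plan(plan: Dict[str, Any], mode: str) -> List[str]:
    """Hypothetical pre-check: returns warnings, raises in "deep" mode if anything is off."""
    issues = [f"missing {key}" for key in DEEP_REQUIRED if not plan.get(key)]
    policy = plan.get("missing_data_policy")
    if policy and policy not in MISSING_DATA_POLICIES:
        issues.append(f"unknown missing_data_policy {policy!r}")
    if mode == "deep" and issues:
        # DEEP mode: gaps are errors.
        raise ValueError("; ".join(issues))
    # Standard mode: gaps are reported as warnings.
    return issues
```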

## Step 4 — Declare selection_policy (BEFORE scoring per FPF CMP-02)

State the rule used to pick from the Pareto front BEFORE you see any scores. This is the Anti-Goodhart enforcement boundary.

Bad (post-hoc): "We picked X because it scored best on the dimensions we cared about."
Good (pre-declared): "Minimize latency_p95 subject to memory_usage < 200MB constraint; tie-break by ops_complexity."

Store the policy string for the kernel call.
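One way to make the "Good" policy concrete is to encode it as data before any scores exist and apply it only to the Pareto front later. This is a hypothetical sketch: `SelectionPolicy` is not a haft type (the kernel receives the policy as a string via `policy_applied`), ordinal `ops_complexity` values are assumed to be encoded as numeric ranks, and the constraint uses the strict `< 200` bound from the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# variant_id -> dimension -> numeric score (ordinals assumed encoded as ranks)
Scores = Dict[str, Dict[str, float]]


@dataclass(frozen=True)
class SelectionPolicy:
    """Hypothetical structured form of a pre-declared selection policy."""
    minimize: str        # primary target, lower is better
    limit_dim: str       # dimension carrying the hard limit
    limit_max: float     # strict upper bound on limit_dim
    tie_break_min: str   # secondary key, lower is better

    def pick(self, front: Scores) -> Optional[str]:
        """Apply the policy to Pareto-front scores; None if no variant is admissible."""
        admissible = [v for v, s in front.items() if s[self.limit_dim] < self.limit_max]
        if not admissible:
            return None
        # Sort key: primary target first, then the tie-breaker.
        return min(admissible, key=lambda v: (front[v][self.minimize], front[v][self.tie_break_min]))


# Declared in Step 4, before any scores exist.
POLICY = SelectionPolicy(
    minimize="latency_p95",
    limit_dim="memory_usage",
    limit_max=200.0,
    tie_break_min="ops_complexity",
)
```

Freezing the dataclass prevents in-place edits once scores arrive; it does not stop someone from declaring a new policy, so the declare-before-scoring order still has to be enforced by the workflow itself.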

## Step 5 — Score variants DIM-WISE in parallel (one Agent per dimension)

For M dimensions and N variants, spawn M Agent subagents IN THE SAME MESSAGE. Each subagent scores ALL variants on ONE dimension, so a single evaluator applies a single scale to every variant. Spawning one agent per variant instead would let each agent invent its own scale, and the resulting scores would not be comparable.

```
Agent(
  description="Score all variants on latency_p95",
  prompt="
    You are scoring dimension: latency_p95
    Unit: ms
    Polarity: lower_better
    How to measure: <from characterize step>

    Variants to score:
    1. <variant_id_1>: <description>
    2. <variant_id_2>: <description>
    3. <variant_id_3>: <description>

    Apply the SAME scoring approach to ALL variants. Use parity_plan:
    <parity_plan>

    Return EXACTLY:
    scores:
      <variant_id_1>: <numeric or ordinal value with unit>
      <variant_id_2>: <...>
      <variant_id_3>: <...>
    methodology: <one paragraph: how you measured, what you assumed,
                  any missing data treated per parity_plan policy>
    confidence: low | medium | high
  "
)
```

Spawn all M of these in one message. After all of them return, assemble the scores per variant.
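The assembly step is a transpose: the subagents return scores grouped by dimension, while the Step 6 `scores` argument is grouped by variant. A minimal sketch; `assemble_scores` is a hypothetical helper, and raising on a gap assumes missing data should arrive as an explicit value under the parity plan's `missing_data_policy` rather than be silently dropped.

```python
from typing import Dict, Mapping, Sequence


def assemble_scores(
    per_dimension: Mapping[str, Mapping[str, str]],
    variants: Sequence[str],
) -> Dict[str, Dict[str, str]]:
    """Transpose {dimension: {variant: score}} into {variant: {dimension: score}}."""
    per_variant: Dict[str, Dict[str, str]] = {v: {} for v in variants}
    for dim, scores in per_dimension.items():
        for v in variants:
            if v not in scores:
                # A gap should arrive as an explicit value (e.g. an abstain marker),
                # not as a missing key; fail loudly instead of guessing.
                raise ValueError(f"dimension {dim!r} has no score for variant {v!r}")
            per_variant[v][dim] = scores[v]
    return per_variant
```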

## Step 6 — Call kernel with scores + Pareto computation

```
mcp__haft__haft_solution(
  action="compare",
  portfolio_ref="<sol-...>",
  dimensions=["latency_p95", "memory_usage", "ops_complexity"],
  scores={
    "<variant_id_1>": {"latency_p95": "...", "memory_usage": "...", ...},
    "<variant_id_2>": {...},
    "<variant_id_3>": {...}
  },
  parity_plan=<from Step 3>,
  policy_applied="<selection policy declared in Step 4 BEFORE scoring>",
  mode="<inherit from problem>"
)
```

The kernel computes the non-dominated set (Pareto front) from scores. Constraints eliminate variants that violate hard limits BEFORE Pareto computation.
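The two-phase computation described here (constraints filter first, then the non-dominated set is found) can be illustrated with a sketch. It is not the kernel's implementation: it assumes numeric scores, treats each constraint as a strict upper bound (matching the `< 200MB` example), and computes dominance over target dimensions only, since constraints act as filters and observations are watched rather than optimized. The demo uses a hypothetical `throughput_rps` target with an assumed `higher_better` polarity to show mixed polarities.

```python
from typing import List, Mapping, Tuple

Scores = Mapping[str, Mapping[str, float]]  # variant -> dimension -> numeric score


def pareto_front(
    scores: Scores,
    polarity: Mapping[str, str],       # dimension -> "lower_better" | "higher_better"
    limits: Mapping[str, float],       # constraint dimension -> strict upper bound
    targets: List[str],                # dimensions that count for dominance
) -> Tuple[List[str], List[str]]:
    """Return (non-dominated variants, variants eliminated by constraints)."""
    # 1. Constraints first: a variant violating any hard limit never reaches Pareto.
    feasible = [v for v, s in scores.items() if all(s[d] < lim for d, lim in limits.items())]
    eliminated = [v for v in scores if v not in feasible]

    def at_least_as_good(x: float, y: float, dim: str) -> bool:
        return x <= y if polarity[dim] == "lower_better" else x >= y

    def dominates(a: str, b: str) -> bool:
        # a dominates b: at least as good on every target and not identical on them,
        # hence strictly better on at least one.
        sa, sb = scores[a], scores[b]
        return all(at_least_as_good(sa[d], sb[d], d) for d in targets) and any(
            sa[d] != sb[d] for d in targets
        )

    # 2. Keep feasible variants that no other feasible variant dominates.
    front = [v for v in feasible if not any(dominates(u, v) for u in feasible if u != v)]
    return front, eliminated


if __name__ == "__main__":
    demo = {
        "v1": {"latency_p95": 40, "throughput_rps": 900, "memory_usage": 250},
        "v2": {"latency_p95": 55, "throughput_rps": 1200, "memory_usage": 90},
        "v3": {"latency_p95": 45, "throughput_rps": 800, "memory_usage": 150},
        "v4": {"latency_p95": 60, "throughput_rps": 1000, "memory_usage": 120},
    }
    pol = {"latency_p95": "lower_better", "throughput_rps": "higher_better"}
    print(pareto_front(demo, pol, {"memory_usage": 200}, ["latency_p95", "throughput_rps"]))
    # (['v2', 'v3'], ['v1']): v1 breaks the memory limit, v2 dominates v4
```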

## Step 7 — Present the Pareto front to operator

Surface:
- Non-dominated set (Pareto front) with their score profiles
- Dominated variants with explicit dominance explanation (which variants dominate them, on which dimensions)
- Pareto trade-offs: for each non-dominated variant, what it gives up relative to the others
- Recommendation (advisory only — the operator decides via /h-decide)
- Soft warnings from the kernel. Read them: they may flag a rigged comparison (missing parity, single-dimension comparison, selected variant not in the non-dominated set, etc.)
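The dominance explanation in this list can be generated mechanically, and each label can be paired with a human-readable title as the re-grounding discipline requires. A hypothetical sketch: `dominance_report` is not a haft API, scores are assumed numeric, only constraint-passing variants are assumed to be passed in, dominance is computed over target dimensions only, and the titles in the check are illustrative.

```python
from typing import List, Mapping

Scores = Mapping[str, Mapping[str, float]]  # variant -> dimension -> numeric score


def dominance_report(
    scores: Scores,                 # feasible (constraint-passing) variants only
    polarity: Mapping[str, str],    # dimension -> "lower_better" | "higher_better"
    targets: List[str],             # dimensions that count for dominance (assumption)
    titles: Mapping[str, str],      # variant -> human-readable title
) -> List[str]:
    """One line per (dominator, dominated) pair, naming the dimensions where it wins."""

    def strictly_better(x: float, y: float, dim: str) -> bool:
        return x < y if polarity[dim] == "lower_better" else x > y

    def dominates(a: str, b: str) -> bool:
        # a dominates b: never worse on any target, strictly better on at least one.
        sa, sb = scores[a], scores[b]
        never_worse = all(not strictly_better(sb[d], sa[d], d) for d in targets)
        return never_worse and any(strictly_better(sa[d], sb[d], d) for d in targets)

    def label(v: str) -> str:
        # Re-grounding: never emit a bare variant ID.
        return f"{v} ({titles[v]})"

    lines = []
    for v in scores:
        for u in scores:
            if u != v and dominates(u, v):
                wins = [d for d in targets if strictly_better(scores[u][d], scores[v][d], d)]
                lines.append(f"{label(u)} dominates {label(v)} on {', '.join(wins)}")
    return lines
```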

**Re-grounding discipline (FPF A.7).** Every variant label (`V1`, `V2`,
…) and artifact ID (`sol-...`, `prob-...`) in your Pareto-front summary,
dominance explanation, and recommendation paragraphs MUST be paired with
its human-readable title or one-line claim on first mention. Bare `V3
dominates V1 on latency_p95` is opaque when the operator returns 30 minutes
later; `V3 (in-memory cache) dominates V1 (per-request DB read) on
latency_p95` restores the object behind the carrier. Apply consistently
to dimension labels too where they are abst