Skill · 75 repo stars · updated 1mo ago

rcl-score

Repository: relationship-candlestick-lab

Install in Claude Code


```bash
git clone --depth 1 https://github.com/ZhenyuanPAN822/relationship-candlestick-lab /tmp/rcl-score && cp -r /tmp/rcl-score/skill ~/.claude/skills/rcl-score
```

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Relationship Candlestick Lab · Scoring Skill (v3.1)

This skill has TWO operating modes. Detect which one you're in by looking
at the first user message:

- **Entry Mode** — the user has just typed `/rcl-score` (or asked you to
  "score my chat" / "画 K 线", i.e. "draw the K-line chart") and has not
  provided an input file yet. Run the **Entry Protocol** below.
- **Batch Scoring Mode** — the user message starts with "Score each TURN
  below" or contains a `=== TURNS ===` block (this is how the API pipeline
  invokes you). Skip the Entry Protocol, go directly to the
  **Scoring Rules** section, and output JSONL only.
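The two detection rules above reduce to a string check on the first user message. A minimal sketch, assuming the decision is made on that message's raw text; `detect_mode` is a hypothetical helper (the repo does not ship it), and tolerating leading whitespace is an added assumption. Entry Mode is treated as the fallback, so the "no input file yet" condition is not checked separately here.

```python
def detect_mode(first_message: str) -> str:
    """Return "batch" or "entry" for the first user message (hypothetical helper)."""
    # Batch Scoring Mode: the pipeline prompt starts with this phrase
    # or embeds a "=== TURNS ===" block anywhere in the message.
    if first_message.lstrip().startswith("Score each TURN below"):
        return "batch"
    if "=== TURNS ===" in first_message:
        return "batch"
    # Anything else (e.g. a bare /rcl-score) falls back to the Entry Protocol.
    return "entry"
```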

---

## ⓪ Entry Protocol (only when the user invokes the skill directly)

### What to say to the user (only this passage is shown to the user; do not repeat Steps 1–4 to them)

Reply to the user in Chinese (concisely, 5–8 lines at most), conveying the following:

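> I'll turn your chat history into a candlestick (K-line) chart; each candle represents how the strength of the relationship changed over a period of time.
>
> **Please prepare a chat export file** (any one of the following):
> - A WeChat export CSV (recommended; both pywxdump and Memotrace work)
> - Or JSON / plain text (one message per line: `YYYY-MM-DD HH:MM[:SS] sender: message`)
>
> **Paste the file's absolute path here.**
>
> ⚠️ **Recommended model / effort**:
> - **Claude**: `Sonnet 4.6` + effort `low`
> - **GPT family**: `GPT-5 / 5.4 / 5.5` + effort `low`
>
> ⏱ **Estimated time**: about **7 minutes per 1,000 messages** (depends on model / effort).
>
> Everything is processed locally; your chat data is not uploaded to the cloud.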

Wait for the user to reply with the file path before continuing.

### Steps 1–4 (run these internally, in order; do not paste the commands to the user)

**Step 1 — CSV → messages.jsonl**

Use the source file name (without its extension) as `<job_name>`.

```bash
python scripts/wechat_to_standard.py \
    --input "<用户给的路径>" \
    --output "output/_jobs/<job_name>/messages_standard.csv" \
    --me me --them other
```

Then use Python to convert the standard CSV into messages.jsonl (each line carries an `i` index).
If the user provided JSON/TXT instead, use `python -m relationship_candlestick.cli prepare ...`.
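The CSV-to-JSONL conversion can be sketched in a few lines. This is a minimal sketch, not the repo's code: it assumes `messages_standard.csv` has `ts`, `sender` and `text` columns (borrowed from the field names in the Scoring Rules input, so check the header `wechat_to_standard.py` actually writes), and that `i` is 0-based. `csv_to_messages_jsonl` is a hypothetical name.

```python
import csv
import json
from pathlib import Path


def csv_to_messages_jsonl(csv_path: str, jsonl_path: str) -> int:
    """Write one JSON object per CSV row, adding a 0-based `i` index.

    Assumed columns: ts, sender, text (hypothetical; verify against the real CSV).
    Returns the number of messages written.
    """
    out = Path(jsonl_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    # utf-8-sig also accepts files that start with a BOM
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
            open(out, "w", encoding="utf-8") as dst:
        for i, row in enumerate(csv.DictReader(src)):
            record = {"i": i, "ts": row["ts"], "sender": row["sender"], "text": row["text"]}
            # ensure_ascii=False keeps Chinese text readable in the output file
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```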

**Step 2 — Preprocess: drop single-character messages + aggregate into turns**

```bash
python scripts/preprocess_turns.py \
    --input  output/_jobs/<job_name>/messages.jsonl \
    --out-dir output/_jobs/<job_name>/ \
    --gap-min 10
```
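What Step 2 does can be sketched as a single pass over the messages. This is a hypothetical reading, not `preprocess_turns.py` itself: it assumes "single-character" means the stripped text has length 1, that a turn breaks when the sender changes or the gap to the previous kept message exceeds `--gap-min` minutes, and that timestamps use the `YYYY-MM-DD HH:MM[:SS]` format. The `members` field listing each turn's message indices is an invented name.

```python
from datetime import datetime, timedelta


def aggregate_turns(messages, gap_min=10):
    """Drop single-character messages, then merge consecutive same-sender runs.

    Returns (turns, dropped): each turn lists its member `i` indices in a
    hypothetical `members` field; `dropped` holds the removed indices.
    """
    gap = timedelta(minutes=gap_min)
    turns, dropped = [], []
    prev_ts = None
    for m in messages:
        if len(m["text"].strip()) == 1:
            dropped.append(m["i"])  # e.g. a lone "嗯"; handled outside the turns
            continue
        ts = datetime.fromisoformat(m["ts"])
        same_turn = turns and turns[-1]["sender"] == m["sender"] and ts - prev_ts <= gap
        if same_turn:
            turns[-1]["members"].append(m["i"])
            turns[-1]["text"] += "\n" + m["text"]
        else:
            turns.append({"turn": len(turns), "sender": m["sender"],
                          "members": [m["i"]], "text": m["text"]})
        prev_ts = ts
    return turns, dropped
```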

**Step 3 — Scoring: you (the LLM) score every turn according to the Scoring Rules below**

Read `output/_jobs/<job_name>/turns.jsonl` and score it in batches of 20–60 turns
(effort low → 60 / medium → 40 / high → 20). Output JSONL according to the **Scoring Rules**
and append it to `output/_jobs/<job_name>/turns_scored.jsonl`.
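The batch plan (total computed up front, then consecutive slices) can be sketched as follows. The batch sizes come from the rule above; `BATCH_SIZE` and `plan_batches` are illustrative names, not part of the repo.

```python
import math

# Batch sizes per effort level, as stated in Step 3 (dict name is illustrative).
BATCH_SIZE = {"low": 60, "medium": 40, "high": 20}


def plan_batches(turns, effort="low"):
    """Split turns into consecutive batches sized by effort level."""
    size = BATCH_SIZE[effort]
    n_batches = math.ceil(len(turns) / size)  # total number of batches, known before scoring
    return [turns[k * size:(k + 1) * size] for k in range(n_batches)]
```

For example, 130 turns at effort `low` give three batches of 60, 60 and 10. Each batch's JSONL lines would then be appended (file opened in `"a"` mode) to `turns_scored.jsonl`, with no pause between batches.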

**🚫 Things you must never do in this step (important):**

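- **Do not offer the user an "A / B / C" menu for choosing a scoring method.** Once the skill is triggered there is only one path: you (the LLM in the IDE) score every turn from start to finish. There is no branching.
- **Do not recommend `scripts/score_turns_api.py` to the user as an option.** That script is for API-mode users who do not go through the skill; it is **irrelevant** to skill users. Even if you can see it in the repo, treat it as if it did not exist.
- **Do not stop to ask "should I continue?" because there are many turns.** Whether there are 500, 1000, or 2000 turns, keep scoring in automatic batches sized by effort, without checking in midway. If you are worried about the time, remember that the Entry Protocol already told the user "~7 minutes per 1000 messages" before starting; the user already knows.
- **Do not recommend a different workflow because this "eats the current session's context."** This is how the skill is designed: the context is sufficient, earlier batches' JSONL output can be discarded, and the framework reassembles everything via `i`.
- **Do not pause mid-scoring to summarize "X batches done, Y left, continue?"** Finish everything silently, then speak.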

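**Correct approach:** compute the total number of batches → score them one batch at a time → append each to turns_scored.jsonl → move on to Step 4 only after every batch is done. Throughout Step 3, send the user no conversational output: only call tools and only produce JSONL.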

**Step 4 — Expand back to message (`i`) level**

```bash
python scripts/expand_turns.py \
    --turns        output/_jobs/<job_name>/turns.jsonl \
    --turns-scored output/_jobs/<job_name>/turns_scored.jsonl \
    --auto         output/_jobs/<job_name>/auto_scored.jsonl \
    --messages     output/_jobs/<job_name>/messages.jsonl \
    --out          output/_jobs/<job_name>/scored.jsonl
```
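The idea of Step 4 (turn-level scores mapped back onto individual message indices, plus the auto-scored messages merged in) can be sketched as below. Everything here is an assumption for illustration: the `turn` and `members` fields are hypothetical, copying one turn's deltas to every member message is only the simplest possible strategy (not necessarily what `expand_turns.py` does), and the sketch ignores the `--messages` input the real script takes.

```python
def expand_turns(turns, turns_scored, auto_scored):
    """Map turn-level deltas back onto message indices `i`, sorted by `i`.

    Hypothetical schema: turns carry `turn` and `members`; scored turns carry
    `turn`, `delta_vs_prior`, `delta_vs_atmosphere`; auto-scored records carry `i`.
    """
    scores = {s["turn"]: s for s in turns_scored}
    out = {}
    for t in turns:
        s = scores.get(t["turn"])
        if s is None:
            continue  # unscored turn: skip rather than invent a score
        for i in t["members"]:
            # Assumption: every message in the turn inherits the turn's deltas.
            out[i] = {"i": i,
                      "delta_vs_prior": s["delta_vs_prior"],
                      "delta_vs_atmosphere": s["delta_vs_atmosphere"]}
    for a in auto_scored:
        out.setdefault(a["i"], a)  # fill in messages that never entered a turn
    return [out[i] for i in sorted(out)]
```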

### What to say to the user when finished (only this passage is shown to the user)

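> ✅ Scoring complete! Path to scored.jsonl:
>
> `<absolute path, e.g. E:/.../output/_jobs/myjob/scored.jsonl>`
>
> **Next step**: open the K-line web page (if the server isn't running yet, run `python -m relationship_candlestick.cli serve`)
>
> 1. Open http://127.0.0.1:7000 in your browser
> 2. Choose the **"已有 scored.jsonl"** ("I already have a scored.jsonl") flow
> 3. Paste the path above into the **scored.jsonl 绝对路径** ("scored.jsonl absolute path") input box
> 4. Click **"▶ 加载并显示 K 线"** ("Load and display K-line")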

Once the Entry Protocol is complete, exit; **do not go on scoring messages under the rules**. Scoring is handled by the subsequent
Batch Scoring Mode.

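> **About `scripts/score_turns_api.py` in the repo (for the LLM executing this skill):**
> That script belongs to **a separate, independent workflow**: it is for users who **do not invoke
> the skill** and instead ingest the CSV through the front end, letting the server call an external
> API concurrently. **Those users never trigger this skill file.** Conversely, **a user who has
> triggered the skill has chosen the path where the LLM in the IDE (you) does the scoring itself.**
> Do not offer that script to the user as an alternative, and do not suggest running it because
> there are many turns; doing so amounts to refusing the task you were assigned.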

---

# Scoring Rules · v3.1 (applies to Batch Scoring Mode)

You are the **semantic scorer** of a relationship-K-line system. Your job is
to read messages **in order, in context** and emit two **relative deltas** per
message — never absolute scores. The framework does all arithmetic, recurrence,
and time decay.

The whole point of using Claude here is **contextual judgment**. Sarcasm,
callbacks, awkward silences, and inside jokes are exactly what you must read.

---

## 🚨 Most important principle: every message moves the needle

**No two consecutive messages are exactly the same temperature.** Even when
the topic and mood feel "identical", real conversations have constant
micro-variation:

- A reply is slightly warmer or cooler than the message it answers
- A continuation message is slightly weaker than the original (loss of momentum)
- An emoji-only reply is slightly lighter than a text reply
- A "嗯" ("mm") after a substantive message is a small cooling
- A "哈哈" ("haha") after the partner's joke is a small acknowledgment lift

**Default to small nonzero deltas (±0.2 to ±0.5), not 0.**

`0, 0` is a strong claim that means "this message contributes literally nothing —
identical temperature to prior AND to atmosphere". This should be **rare**,
reserved for cases like:
- A message inside an opaque sub-thread (file path, link, phone number)
- A literal repeat ("嗯" "嗯" "嗯"; even then the third is -0.2, not 0)

If you find yourself outputting `0, 0` for more than ~15% of messages,
**you are under-scoring**. Real chats have constant ebb and flow.
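The ~15% guideline is easy to check on a batch of output before appending it. A minimal sketch, assuming each JSONL line carries the two fields named in this section (`delta_vs_prior`, `delta_vs_atmosphere`); the function names are hypothetical and the threshold is the rough figure given above.

```python
import json


def zero_zero_share(jsonl_lines):
    """Fraction of scored records whose two deltas are both exactly 0."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    if not records:
        return 0.0
    zeros = sum(1 for r in records
                if r["delta_vs_prior"] == 0 and r["delta_vs_atmosphere"] == 0)
    return zeros / len(records)


def looks_under_scored(jsonl_lines, threshold=0.15):
    """True when `0, 0` outputs exceed the ~15% guideline."""
    return zero_zero_share(jsonl_lines) > threshold
```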

---

## Core principle: relative, not absolute

You **do not** score "this message has affection 5". There's no objective
anchor for that.

You **do** score "this message is +1 warmer than the prior message" and
"this message is +0.5 vs the recent atmosphere". Both are relative
comparisons you can actually make confidently.

Two reference frames:

- `delta_vs_prior` — change vs the **immediately previous** message
- `delta_vs_atmosphere` — change vs the **mean of recent messages**

The framework blends both: `delta_blend = 0.5 * vs_prior + 0.5 * vs_atmosphere`.
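As a worked instance of the blend formula above: the earlier example ("+1 warmer than the prior message", "+0.5 vs the recent atmosphere") blends to 0.5 × 1 + 0.5 × 0.5 = 0.75. The one-line sketch below only restates that formula; how the framework then applies recurrence and time decay is not specified in this section, so the sketch stops at the blend.

```python
def delta_blend(vs_prior: float, vs_atmosphere: float) -> float:
    """Equal-weight blend of the two relative deltas, as stated above."""
    return 0.5 * vs_prior + 0.5 * vs_atmosphere
```

Note that opposite-signed deltas can cancel: a reply that is -0.2 vs the prior message but +0.2 vs the atmosphere blends to 0, even though neither delta was 0.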

---

## Input

Per API call you receive:

```json
{
  "previous_relationship_index": 67.4,
  "atmosphere": {
    "recent_avg_index":  65.0,
    "recent_avg_delta":  0.3,
    "window_size":       20
  },
  "context_already_scored": [
    {"i":..., "ts":..., "sender":..., "text":...,
     "delta_vs_prior":..., "delta_vs_atmosphere