ClaudeWave
Skill · 3.4k repo stars · updated 3mo ago

pipeline

The pipeline skill executes end-to-end document processing in four chained phases: seeding the source file into the archive system, extracting claims via the reduce operation, processing all claims through reflection, reweaving, and verification steps using the RALPH subagent framework, and finally archiving task files with a summary report. Use this command when you need to process a complete source document from intake through verification in a single operation, triggered by "/pipeline" or "process this end to end".

Install in Claude Code
git clone --depth 1 https://github.com/agenticnotetaking/arscontexta /tmp/pipeline && cp -r /tmp/pipeline/skill-sources/pipeline ~/.claude/skills/pipeline
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

## EXECUTE NOW

**Target: $ARGUMENTS**

Parse immediately:
- Source file path: the file to process (required)
- `--handoff`: output RALPH HANDOFF block at end (for chaining)
- If target is empty: list files in {DOMAIN:inbox}/ and ask which to process
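
The parsing rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `PipelineArgs` and `parse_pipeline_args` are hypothetical names, and the sketch assumes the raw argument string uses shell-style quoting.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineArgs:
    source: Optional[str]  # path to the source file, or None if no target was given
    handoff: bool          # True when --handoff was passed


def parse_pipeline_args(raw: str) -> PipelineArgs:
    """Split the raw argument string into a source path and the --handoff flag."""
    tokens = shlex.split(raw)
    handoff = "--handoff" in tokens
    # The first token that is not a flag is treated as the source path.
    positional = [t for t in tokens if not t.startswith("--")]
    source = positional[0] if positional else None
    return PipelineArgs(source=source, handoff=handoff)
```

When `source` comes back as `None`, the caller would list the inbox and ask which file to process, as described above.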

### Step 0: Read Vocabulary

Read `ops/derivation-manifest.md` (or fall back to `ops/derivation.md`) for domain vocabulary mapping. All output must use domain-native terms. If neither file exists, use universal terms.
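
The fallback order can be sketched as a small lookup. `find_vocabulary_file` and `vault_root` are hypothetical names; only the two file paths come from the text above, and parsing the file's contents is left out because its format is not described here.

```python
from pathlib import Path
from typing import Optional


def find_vocabulary_file(vault_root: Path) -> Optional[Path]:
    """Return the first vocabulary file that exists, or None to signal universal terms."""
    for relative in ("ops/derivation-manifest.md", "ops/derivation.md"):
        candidate = vault_root / relative
        if candidate.is_file():
            return candidate
    return None
```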

**START NOW.** Run the full pipeline.

---

## Pipeline Overview

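The pipeline chains four processing phases: seed, extract, process, and archive. Each phase uses skill invocation or /ralph for subagent-based processing. The sections below add a completion check before archiving (Phase 4) and a final report (Phase 6), so archiving appears there as Phase 5. State lives in the queue file; the pipeline is stateless orchestration on top of stateful queue entries.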

```
Source file
    |
    v
Phase 1: /seed — create extract task, move source to archive
    |
    v
Phase 2: /reduce (via /ralph) — extract claims from source
    |
    v
Phase 3: /ralph (all claims) — create -> reflect -> reweave -> verify
    |
    v
Phase 4: /archive-batch — move task files, generate summary
    |
    v
Complete
```

The pipeline is the convenience wrapper. /ralph is the engine. /seed is the entry point.
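
To make the "stateless orchestration" idea concrete, here is a rough sketch in which the pipeline only sequences calls and keeps no state of its own. Every name in it (`run_pipeline` and the five injected callables) is hypothetical; in the real skill these steps are slash-command invocations, not Python functions.

```python
from typing import Callable, List


def run_pipeline(
    source: str,
    seed: Callable[[str], str],           # returns the batch ID
    extract: Callable[[str], None],       # runs the extract task for the batch
    process: Callable[[str], None],       # runs all remaining tasks for the batch
    pending: Callable[[str], List[str]],  # task IDs in the batch that are not done
    archive: Callable[[str], None],       # archives the batch
) -> str:
    """Chain the phases; all state stays behind the injected callables."""
    batch_id = seed(source)
    extract(batch_id)
    process(batch_id)
    remaining = pending(batch_id)
    if remaining:
        # Never archive while tasks are still open.
        return f"incomplete: {', '.join(remaining)}"
    archive(batch_id)
    return f"complete: {batch_id}"
```

Because the wrapper holds nothing between calls, an interrupted run can be resumed from the queue alone.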

---

## Phase 1: Seed

Invoke /seed on the target file to create the extract task, check for duplicates, and move the source to its archive folder.

**How to invoke:**

Use the Skill tool if available, otherwise execute the /seed workflow directly:
- Validate source exists
- Check for prior processing (duplicate detection)
- Create archive folder
- Move source from {DOMAIN:inbox} to archive
- Create extract task file
- Add extract task to queue
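
If the Skill tool is unavailable, the steps above might be approximated like this. The sketch rests on several assumptions that the source does not confirm: the queue is a JSON list of task records, the batch ID is the file name without its extension, the archive folder is named after the batch ID, and the separate extract task file is folded into the queue entry for brevity. `seed` and its parameters are illustrative names.

```python
import json
import shutil
from pathlib import Path


def seed(source: Path, archive_root: Path, queue_file: Path) -> dict:
    """Sketch of the seed steps; returns the extract task that was queued."""
    if not source.is_file():
        raise FileNotFoundError(source)

    batch_id = source.stem  # assumption: batch ID is the basename without extension
    queue = json.loads(queue_file.read_text()) if queue_file.exists() else []

    # Duplicate detection: surface prior processing instead of silently skipping.
    if any(task["batch"] == batch_id for task in queue):
        raise ValueError(f"{batch_id} was already seeded; ask the user how to proceed")

    # Create the archive folder and move the source out of the inbox.
    archive_dir = archive_root / batch_id
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / source.name
    shutil.move(str(source), str(archived))

    # Queue the extract task (hypothetical record shape).
    task = {"id": f"{batch_id}-extract", "batch": batch_id, "type": "extract",
            "status": "pending", "source": str(archived)}
    queue.append(task)
    queue_file.write_text(json.dumps(queue, indent=2))
    return task
```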

**Capture from seed output:**
- **Batch ID**: the source basename (used for --batch filtering in subsequent steps)
- **Archive folder path**: where the source was moved
- **next_claim_start**: the claim numbering start

Report: `$ Seeded: {source-name}`

**If seed reports the file was already processed:** Ask the user whether to proceed or skip. Do NOT auto-skip — the user may want to re-process with different scope.

---

## Phase 2: Extract (Reduce)

Process the extract task via /ralph. This spawns a subagent that runs /reduce, extracting claims from the source and creating task entries in the queue.

**How to invoke:**

```
/ralph 1 --batch {batch_id} --type extract
```

Or via Task tool:
```
Task(
  prompt = "Run /ralph 1 --batch {batch_id} --type extract",
  description = "extract: {batch_id}"
)
```

After completion, read the queue and count the pending tasks for this batch. The reduce phase creates one queue entry per claim and one per enrichment, so the pending count gives the number of extracted claims and enrichments.
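
Counting pending tasks per type could look like the sketch below. The queue schema (records with `batch`, `type`, and `status` fields) and the type names `claim` and `enrichment` are assumptions; the queue file format is not specified here.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_pending(queue: Iterable[Mapping[str, str]], batch_id: str) -> Counter:
    """Count pending tasks for one batch, grouped by task type."""
    return Counter(
        task["type"]
        for task in queue
        if task["batch"] == batch_id and task["status"] == "pending"
    )
```

The resulting counts for `claim` and `enrichment` would fill in `{N}` and `{M}` in the report, and their sum gives `{total_tasks}`.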

Report:
```
$ Extracted: {N} {DOMAIN:note_plural}, {M} enrichments
  Processing {total_tasks} tasks through the pipeline...
```

**If zero claims extracted:** Report the issue. For domain-relevant sources, zero extraction is a bug, because the source almost certainly contains extractable content. Ask the user whether to retry with different scope or skip.

---

## Phase 3: Process All Claims

Count total pending tasks for this batch from the queue. Then process all of them through the full phase sequence.

**How to invoke:**

```
/ralph {remaining_count} --batch {batch_id}
```

Or via Task tool:
```
Task(
  prompt = "Run /ralph {remaining_count} --batch {batch_id}",
  description = "process: {batch_id} ({remaining_count} tasks)"
)
```

This processes every claim through create -> reflect -> reweave -> verify, and every enrichment through enrich -> reflect -> reweave -> verify.
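
The two phase sequences can be written down as data, which makes queue advancement a simple lookup. This is an illustrative sketch: `PHASES` and `next_phase` are hypothetical names, and the task type keys `claim` and `enrichment` are assumed.

```python
from typing import Optional

# Phase order per task type, as described above.
PHASES = {
    "claim": ["create", "reflect", "reweave", "verify"],
    "enrichment": ["enrich", "reflect", "reweave", "verify"],
}


def next_phase(task_type: str, current_phase: str) -> Optional[str]:
    """Return the phase that follows current_phase, or None once verify is done."""
    sequence = PHASES[task_type]
    index = sequence.index(current_phase)
    return sequence[index + 1] if index + 1 < len(sequence) else None
```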

Each phase runs in an isolated subagent with fresh context. /ralph handles all the orchestration: subagent spawning, handoff parsing, queue advancement, learnings capture.

**Progress reporting:**

The /ralph invocation reports progress per task. The pipeline relays this:
```
$ Processing {DOMAIN:note} 1/{total}: {title}
  $ create... done
  $ reflect... done (3 connections found)
  $ reweave... done (2 {DOMAIN:note_plural} updated)
  $ verify... done (PASS)
```

**For large batches (20+ claims):** /ralph handles context isolation automatically via subagents. The pipeline does NOT need to chunk — /ralph processes N tasks sequentially with fresh context per phase.

---

## Phase 4: Verify Completion

After /ralph finishes, verify all tasks for this batch are done.

Check the queue: count tasks for this batch that are NOT done.

**If tasks remain pending:**
- Report which tasks are incomplete and at which phase
- Show the specific task IDs and their current_phase
- Suggest: "Run `/ralph --batch {batch_id}` to continue from where it stopped"
- Do NOT proceed to archive

**If all tasks are done:** Proceed to Phase 5.
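
The completion check amounts to filtering the queue for tasks in the batch that are not done. A minimal sketch, assuming each queue record carries `id`, `batch`, `status`, and `current_phase` fields (only `current_phase` is named in the text above; the rest are assumptions, as is the function name):

```python
from typing import Iterable, List, Mapping


def incomplete_tasks(queue: Iterable[Mapping[str, str]], batch_id: str) -> List[str]:
    """Return one report line per task in the batch that is not done."""
    return [
        f"{task['id']}: stopped at {task['current_phase']}"
        for task in queue
        if task["batch"] == batch_id and task["status"] != "done"
    ]
```

An empty result means the batch may be archived; anything else should be reported and archiving skipped.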

---

## Phase 5: Archive Batch

When all tasks for the batch are complete, archive the batch.

**How to invoke:**

```
/archive-batch {batch_id}
```

Or execute directly:
1. Move all task files from `ops/queue/` to `ops/queue/archive/{date}-{batch_id}/`
2. Generate a batch summary file: `{batch_id}-summary.md`
3. Remove completed entries from the queue (or mark as archived)

The summary should include:
- Source file name and original location
- Number of claims extracted
- Number of enrichments
- List of created {DOMAIN:note_plural} with titles
- Any notable learnings from the batch
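
For the direct-execution path, moving task files and writing the summary might look like this. Several details are assumptions made for illustration: task files are named `{batch_id}-*.md` directly under the queue folder, `{date}` is an ISO date, the summary is written into the archive folder, and the summary body is reduced to a note count and title list. `archive_batch` is a hypothetical name.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import List


def archive_batch(queue_dir: Path, batch_id: str, note_titles: List[str], today: date) -> Path:
    """Move the batch's task files into a dated archive folder and write a summary."""
    archive_dir = queue_dir / "archive" / f"{today.isoformat()}-{batch_id}"
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: task files are named "{batch_id}-*.md" directly under the queue folder.
    for task_file in sorted(queue_dir.glob(f"{batch_id}-*.md")):
        shutil.move(str(task_file), str(archive_dir / task_file.name))

    # Minimal summary: a heading, a count, and wiki-links to the created notes.
    summary = archive_dir / f"{batch_id}-summary.md"
    lines = [f"# {batch_id} summary", "", f"Notes created: {len(note_titles)}", ""]
    lines += [f"- [[{title}]]" for title in note_titles]
    summary.write_text("\n".join(lines) + "\n")
    return summary
```

A fuller version would also record the source location, enrichment count, and learnings listed above, and would remove or mark the batch's entries in the queue.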

---

## Phase 6: Final Report

```
--=={ pipeline }==--

Source: {source_file}
Batch: {batch_id}

Extraction:
  {DOMAIN:note_plural} extracted: {N}
  Enrichments identified: {M}

Processing:
  {DOMAIN:note_plural} created: {N}
  Existing {DOMAIN:note_plural} enriched: {M}
  Connections added: {C}
  {DOMAIN:topic map}s updated: {T}
  Older {DOMAIN:note_plural} updated via reweave: {R}

Quality:
  All verify checks: {PASS/FAIL count}

Archive: ops/queue/archive/{date}-{batch_id}/
Summary: {batch_id}-summary.md

{DOMAIN:note_plural} created:
- [[claim title 1]]
- [[claim title 2]]
- ...
```

If `--handoff` flag was set, also output:

```
=== RALPH HANDOFF: pipeline ===
Target: {source_file}

Work Done:
- Seeded source: {batch_id}
- Extracted {N} {
```

knowledge-guide · Subagent

Proactive methodology guidance agent. Monitors note creation and provides real-time quality advice. Suggests connections, flags quality issues, recommends MOC updates. Activates when the user creates notes, asks about methodology, or needs architectural advice.

graph · Skill

Interactive knowledge graph analysis. Routes natural language questions to graph scripts, interprets results in domain vocabulary, and suggests concrete actions. Triggers on "/graph", "/graph health", "/graph triangles", "find synthesis opportunities", "graph analysis".

learn · Skill

Research a topic and grow your knowledge graph. Uses Exa deep researcher, web search, or basic search to investigate topics, files results with full provenance, and chains to processing pipeline. Triggers on "/learn", "/learn [topic]", "research this", "find out about".

next · Skill

Surface the most valuable next action by combining task stack, queue state, inbox pressure, health, and goals. Recommends one specific action with rationale. Triggers on "/next", "what should I do", "what's next".

ralph · Skill

Queue processing with fresh context per phase. Processes N tasks from the queue, spawning isolated subagents to prevent context contamination. Supports serial, parallel, batch filter, and dry run modes. Triggers on "/ralph", "/ralph N", "process queue", "run pipeline tasks".

reduce · Skill

Extract structured knowledge from source material. Comprehensive extraction is the default — every insight that serves the domain gets extracted. For domain-relevant sources, skip rate must be below 10%. Zero extraction from a domain-relevant source is a BUG. Triggers on "/reduce", "/reduce [file]", "extract insights", "mine this", "process this".

refactor · Skill

Plan vault restructuring from config changes. Compares config.yaml against derivation.md, identifies dimension shifts, shows restructuring plan, executes on approval. Triggers on "/refactor", "restructure vault".

reflect · Skill

Find connections between notes and update MOCs. Requires semantic judgment to identify genuine relationships. Use after /reduce creates notes, when exploring connections, or when a topic needs synthesis. Triggers on "/reflect", "/reflect [note]", "find connections", "update MOCs", "connect these notes".