Skill · 94 repo stars · updated 15d ago

kb-import

kb-import is a Claude Code workflow that converts existing documents into structured knowledge base entries. Use it when adding documentation, converting unstructured content into categorized KB entries, or bulk-importing multiple files into a new knowledge base. It analyzes the source documents, proposes an extraction plan, and generates markdown files with YAML frontmatter that preserve specific details and keep statements quotable and citable.

Repository: ai-first-toolkit

Install in Claude Code

git clone --depth 1 https://github.com/techwolf-ai/ai-first-toolkit /tmp/kb-import && cp -r /tmp/kb-import/plugins/knowledge-base/skills/kb-import ~/.claude/skills/kb-import

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# KB Import Workflow

Import knowledge from existing documents into your knowledge base.

## When to Use

- Adding knowledge from existing documentation
- Converting unstructured docs into structured KB entries
- Bulk-importing content into a new KB

## Modes

- **Single-document mode** (default): one source document is split into one or more KB entries. Use Steps 1 to 6 below.
- **Bulk mode**: many source documents are ingested at once from a directory or a list of files. Use when the user points at a folder or provides a list longer than ~3 files. See [Bulk Mode](#bulk-mode) at the bottom.

## Step 1: Understand the KB Structure

Read the KB config to understand available categories:
```
kb/.kb-config.yaml
```

Read the index to see what already exists:
```
kb/index.md
```

## Step 2: Read the Source Document

Read the source file provided by the user. Supported formats:
- Markdown (.md)
- PDF (.pdf, use the Read tool with page ranges for large files)
- Plain text (.txt)

## Step 3: Plan the Extraction

Analyze the document and propose a plan to the user:

1. How many KB entries should be created?
2. What categories do they belong to?
3. Suggested titles for each entry

Present this as a table:
```
| # | Title | Category | Source Section |
|---|-------|----------|---------------|
| 1 | ... | ... | ... |
```

Wait for user confirmation before proceeding.
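
The plan table above could also be rendered from structured data, which keeps the columns consistent across imports. This is a minimal sketch, not part of the toolkit: `PlannedEntry` and `render_plan` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlannedEntry:
    """One row of the extraction plan (hypothetical helper type)."""
    title: str
    category: str
    source_section: str


def render_plan(entries: list[PlannedEntry]) -> str:
    """Render the plan as a markdown table using the Step 3 columns."""
    lines = [
        "| # | Title | Category | Source Section |",
        "|---|-------|----------|----------------|",
    ]
    for number, entry in enumerate(entries, start=1):
        # Escape pipes so a title containing "|" cannot break the row.
        cells = [
            cell.replace("|", "\\|")
            for cell in (entry.title, entry.category, entry.source_section)
        ]
        lines.append(f"| {number} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_plan([PlannedEntry("Data Encryption", "security", "Section 2")]))
```

The rendered table is only a proposal; the confirmation step still applies before any file is written.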

## Step 4: Create KB Entries

For each planned entry, create a markdown file with YAML frontmatter:

```markdown
---
title: "Entry Title"
description: "Brief one-liner for index lookup"
category: {category}
tags: [{tag1}, {tag2}]
sources: ["{source_filename}"]
last_updated: "{today's date}"
related:
  - {category}/{related-file}.md
---

## Section Title

Content here. Write clear, quotable statements.
Each fact should be a self-contained sentence that can be cited as evidence.
```
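
The template above can be filled in programmatically. The sketch below is one possible way to do it using only the standard library; `render_entry` is a hypothetical helper, and the ISO date format for `last_updated` is an assumption, since the template only says "today's date".

```python
import json
from datetime import date


def render_entry(title, description, category, tags, sources, related, body, today=None):
    """Return the full text of a KB entry: YAML frontmatter plus markdown body."""
    today = today or date.today()
    # JSON strings are valid YAML double-quoted scalars, so json.dumps
    # handles quoting and escaping without a YAML dependency.
    quote = json.dumps
    lines = [
        "---",
        f"title: {quote(title)}",
        f"description: {quote(description)}",
        f"category: {category}",
        f"tags: [{', '.join(tags)}]",
        f"sources: [{', '.join(quote(s) for s in sources)}]",
        # Assumption: ISO 8601 (YYYY-MM-DD) is an acceptable date format.
        f"last_updated: {quote(today.isoformat())}",
    ]
    if related:
        lines.append("related:")
        lines.extend(f"  - {path}" for path in related)
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.rstrip() + "\n"


if __name__ == "__main__":
    print(render_entry(
        title="Entry Title",
        description="Brief one-liner for index lookup",
        category="security",
        tags=["encryption"],
        sources=["policy.pdf"],
        related=["security/access.md"],
        body="## Section Title\n\nContent here.",
    ))
```

Run `kb-validate.py` afterwards as described in Step 5; this sketch does not check that the category exists in the KB config.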

### Content Guidelines

- **Preserve specifics**: Keep exact numbers, dates, names, versions. Keep concrete customer/product examples by name (e.g., "Acme Corp", "Globex") — they make abstract concepts tangible and shouldn't be stripped "for neutrality".
- **One topic per entry**: Don't create catch-all files
- **Quotable statements**: Write so that individual sentences can be cited as evidence
- **Capture the easily-missed content types** when the source covers them: stakeholders (one entry per key person with role + ownership + contact pattern), projects (goal/owner/status), repositories (purpose/ownership). These are the most commonly skipped in first-pass imports.
- **No opinions or speculation**: Only include facts from the source document
- **Use markdown structure**: Headers, bullet points, tables for structured data

### File Naming

- Use lowercase with hyphens: `data-encryption.md`, `product-overview.md`
- Name should reflect the topic, not the source document
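
The naming rule can be sketched as a small function that derives the filename from the entry's topic. `kb_filename` is a hypothetical helper; folding accented characters to ASCII is an assumption that goes beyond what the rule states.

```python
import re
import unicodedata


def kb_filename(topic: str) -> str:
    """Turn a topic title into a lowercase, hyphenated markdown filename."""
    # Fold accented characters to ASCII where possible (e.g. "é" becomes "e").
    ascii_text = (
        unicodedata.normalize("NFKD", topic).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a filename from {topic!r}")
    return f"{slug}.md"


if __name__ == "__main__":
    print(kb_filename("Data Encryption"))
    print(kb_filename("Product Overview"))
```

Pass the topic of the entry, not the name of the source document, so the filename follows the second rule above.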

## Step 5: Update the Index and Validate

After creating entries, regenerate the index and validate:
```bash
python3 scripts/kb-index.py --write   # rewrite kb/index.md's "All Files by Category"
python3 scripts/kb-validate.py        # check frontmatter, categories, related links
```

Review the output to verify that all new entries appear correctly. Resolve any validation errors before continuing.

## Step 6: Summary

Report to the user:
- How many entries were created
- Which categories they were placed in
- Any information from the source document that was skipped (and why)
- Suggestion to review entries and add `related:` links between them

## Bulk Mode

Use this when the user wants to ingest many documents in one go (e.g., "import everything in `~/docs/policies/`", or a list of 5+ files).

### Bulk Step 1: Enumerate the source set

- If the user provided a directory, list supported files in it recursively (`.md`, `.pdf`, `.txt`, `.docx`). Skip obvious noise (`.DS_Store`, `node_modules`, hidden files).
- If the user provided a list of paths, use exactly those.
- Present the file count and a sample (first 10) to the user. Confirm before reading anything heavy.
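
The enumeration rules above can be sketched as follows. The function and constant names are hypothetical, and treating every path component that starts with a dot as hidden is an assumption about what counts as "obvious noise".

```python
from __future__ import annotations

from pathlib import Path

SUPPORTED_SUFFIXES = {".md", ".pdf", ".txt", ".docx"}
NOISE_DIRS = {"node_modules"}


def enumerate_sources(root: Path) -> list[Path]:
    """Recursively list supported files under root, skipping hidden paths and noise dirs."""
    found = []
    for path in root.rglob("*"):
        relative_parts = path.relative_to(root).parts
        # Skip anything hidden (".DS_Store", ".git/...") or inside a noise directory.
        if any(part.startswith(".") or part in NOISE_DIRS for part in relative_parts):
            continue
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            found.append(path)
    return sorted(found)


def preview(paths: list[Path], sample_size: int = 10) -> str:
    """Summarise the batch: total count plus the first few paths."""
    sample = "\n".join(f"- {p}" for p in paths[:sample_size])
    return f"{len(paths)} supported files found. Sample:\n{sample}"


if __name__ == "__main__":
    print(preview(enumerate_sources(Path("."))))
```

Only file names are inspected here, so the preview is cheap to produce before the user confirms.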

### Bulk Step 2: Plan across the whole batch

Read the frontmatter / first page of each file to get a title guess. Produce a single combined plan:

```
| # | Source file | Proposed KB entry | Category |
|---|-------------|-------------------|----------|
| 1 | policies/acceptable-use.pdf | security/acceptable-use.md | security |
| 2 | policies/retention.pdf      | security/data-retention.md | security |
| ...
```

Rules:
- One KB entry per source file by default. Split a source into multiple entries only when it clearly covers multiple distinct topics.
- Prefer nested categories (e.g., `security/access`) when the batch is large enough that a flat category would become unwieldy (> ~10 entries in one category).
- Flag duplicates up front: if a planned entry already exists in the KB, mark it "UPDATE" instead of "CREATE".
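
Two of these rules are mechanical enough to sketch in code: marking duplicates and spotting categories that have grown past roughly 10 entries. Both helpers are hypothetical, and they assume the existing KB paths have already been read from the index.

```python
from collections import Counter


def mark_actions(planned_paths, existing_paths):
    """Label each planned KB path UPDATE if it already exists, else CREATE."""
    existing = set(existing_paths)
    return [(path, "UPDATE" if path in existing else "CREATE") for path in planned_paths]


def crowded_categories(planned_paths, limit=10):
    """Return categories with more than `limit` planned entries (candidates for nesting)."""
    # The category is taken to be everything before the final "/" in the KB path.
    counts = Counter(path.rsplit("/", 1)[0] for path in planned_paths if "/" in path)
    return sorted(category for category, count in counts.items() if count > limit)


if __name__ == "__main__":
    plan = ["security/acceptable-use.md", "security/data-retention.md"]
    print(mark_actions(plan, existing_paths=["security/data-retention.md"]))
    print(crowded_categories(plan))
```

Whether to split a source into several entries remains a judgment call and is not covered by this sketch.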

Wait for user confirmation on the full plan before proceeding.

### Bulk Step 3: Process in parallel

- For ≤ 5 files, process sequentially (easier to follow, fewer context switches).
- For > 5 files, dispatch a subagent per file (or per small group of related files) with the import instructions, the target path from the plan, and the existing KB index as context. Collect results.
- If any subagent fails, keep the successful entries and report the failures so the user can retry a smaller batch.
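
The control flow above (sequential for small batches, fan-out for larger ones, failures collected rather than fatal) can be illustrated in plain Python. This is only an analogy: subagents are dispatched by Claude Code, not by threads, and `run_batch` and `import_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

SEQUENTIAL_LIMIT = 5  # threshold taken from Bulk Step 3


def run_batch(sources, import_one, max_workers=4):
    """Apply import_one to every source; return (succeeded, failed) dicts."""

    def attempt(source):
        try:
            return source, import_one(source), None
        except Exception as error:  # keep going so one failure does not sink the batch
            return source, None, error

    if len(sources) <= SEQUENTIAL_LIMIT:
        outcomes = [attempt(source) for source in sources]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            outcomes = list(pool.map(attempt, sources))

    succeeded, failed = {}, {}
    for source, result, error in outcomes:
        if error is None:
            succeeded[source] = result
        else:
            failed[source] = str(error)
    return succeeded, failed


if __name__ == "__main__":
    ok, bad = run_batch(["a.pdf", "b.pdf"], lambda name: name.replace(".pdf", ".md"))
    print(ok, bad)
```

The `failed` mapping carries the reason per file, which is what the user needs in order to retry a smaller batch.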

### Bulk Step 4: Finalize

After all files are processed:
```bash
python3 scripts/kb-index.py --write
python3 scripts/kb-validate.py
python3 scripts/kb-search.py "sanity-check-term"   # spot-check a term that should appear
```

Report: X created, Y updated, Z skipped (with a reason for each skip). Flag any validation warnings or errors.
