docx

The docx skill creates, reads, edits, and manipulates Word documents (.docx files). It uses pandoc for text extraction, the docx-js JavaScript library for generating new documents with formatting such as tables and headers, and XML unpacking for advanced edits. Use this skill when users request Word documents, reports, memos, letters, or templates with professional formatting, or when extracting, reorganizing, or converting content within .docx files.

Repository: openloomi

Install in Claude Code

git clone --depth 1 https://github.com/melandlabs/openloomi /tmp/docx && cp -r /tmp/docx/skills/docx ~/.claude/skills/docx

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# DOCX creation, editing, and analysis

## Overview

A .docx file is a ZIP archive containing XML files.
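
Because the container is a ZIP archive, a quick sanity check before unpacking is to look at the first four bytes of the file. The sketch below is illustrative only: `looksLikeZip` is a hypothetical helper, not part of the skill's scripts, and it assumes the archive begins with a local file header, which is typical for .docx files written by Word and LibreOffice. A legacy `.doc` file is not a ZIP archive and would fail this check.

```javascript
// Hypothetical helper (not part of the skill's scripts).
// ZIP local-file-header signature: the bytes "PK\x03\x04".
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Returns true when the buffer starts with the ZIP signature.
function looksLikeZip(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_SIGNATURE);
}

// Possible usage against a real file:
//   const fs = require("fs");
//   looksLikeZip(fs.readFileSync("document.docx"));
```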

## Quick Reference

| Task                   | Approach                                                          |
| ---------------------- | ----------------------------------------------------------------- |
| Read/analyze content   | `pandoc` or unpack for raw XML                                    |
| Create new document    | Use `docx-js` - see Creating New Documents below                  |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

### Converting .doc to .docx

Legacy `.doc` files are **auto-converted** on read and write — no manual step needed:

```bash
# Reading .doc (auto-converted to .docx internally)
python scripts/office/unpack.py legacy.doc unpacked/

# Writing .doc (auto-converted from .docx)
python scripts/office/pack.py unpacked/ output.doc
```

LibreOffice is used for the conversion, with automatic sandbox-friendly socket handling.

### Reading Content

```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

### Converting to Images

```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```

### Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

```bash
python scripts/accept_changes.py input.docx output.docx
```

---

## Creating New Documents

Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`

### Setup

```javascript
const {
  Document,
  Packer,
  Paragraph,
  TextRun,
  Table,
  TableRow,
  TableCell,
  ImageRun,
  Header,
  Footer,
  AlignmentType,
  PageOrientation,
  LevelFormat,
  ExternalHyperlink,
  TableOfContents,
  HeadingLevel,
  BorderStyle,
  WidthType,
  ShadingType,
  VerticalAlign,
  PageNumber,
  PageBreak,
} = require("docx");
const fs = require("fs"); // needed for fs.writeFileSync below

const doc = new Document({
  sections: [
    {
      children: [
        /* content */
      ],
    },
  ],
});
Packer.toBuffer(doc).then((buffer) => fs.writeFileSync("doc.docx", buffer));
```

### Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

```bash
python scripts/office/validate.py doc.docx
```

### Page Size

```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [
  {
    properties: {
      page: {
        size: {
          width: 12240, // 8.5 inches in DXA
          height: 15840, // 11 inches in DXA
        },
        margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }, // 1 inch margins
      },
    },
    children: [
      /* content */
    ],
  },
], // fragment: `sections` is a property of the object passed to new Document({...})
```

**Common page sizes (DXA units, 1440 DXA = 1 inch):**

| Paper        | Width  | Height | Content Width (1" margins) |
| ------------ | ------ | ------ | -------------------------- |
| US Letter    | 12,240 | 15,840 | 9,360                      |
| A4 (default) | 11,906 | 16,838 | 9,026                      |
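
The figures in the table follow directly from the 1440 DXA per inch conversion. As a minimal sketch, the helpers below reproduce them with plain arithmetic; the function names are illustrative and are not part of docx-js.

```javascript
// Illustrative unit helpers (not part of docx-js). 1440 DXA = 1 inch.
const DXA_PER_INCH = 1440;
const MM_PER_INCH = 25.4;

// Convert inches to DXA, rounded to the nearest whole unit.
function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Convert millimetres to DXA (A4 is defined in mm: 210 x 297).
function mmToDxa(mm) {
  return Math.round((mm / MM_PER_INCH) * DXA_PER_INCH);
}

// Usable width between the left and right margins, all values in DXA.
function contentWidthDxa(pageWidth, marginLeft, marginRight) {
  return pageWidth - marginLeft - marginRight;
}

const letter = { width: inchesToDxa(8.5), height: inchesToDxa(11) };
const a4 = { width: mmToDxa(210), height: mmToDxa(297) };
console.log(letter, a4);
```

Computing table column widths from `contentWidthDxa` rather than hard-coding them keeps the layout consistent if the page size or margins change later.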

**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:

```javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```
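
Since the dimensions are always passed in portrait terms, it is easy to compute the wrong content width for a landscape page. The sketch below, using a hypothetical `effectiveContentWidth` helper and a plain boolean in place of `PageOrientation`, shows the calculation described in the comment above.

```javascript
// Hypothetical helper (not part of docx-js).
// `size` is given in portrait terms: width = short edge, height = long edge.
// `margin` uses the same DXA units as the page size.
function effectiveContentWidth(size, margin, landscape) {
  // In landscape the long edge becomes the horizontal dimension.
  const horizontalEdge = landscape ? size.height : size.width;
  return horizontalEdge - margin.left - margin.right;
}

const usLetter = { width: 12240, height: 15840 };
const oneInch = { top: 1440, right: 1440, bottom: 1440, left: 1440 };

console.log(effectiveContentWidth(usLetter, oneInch, false)); // portrait
console.log(effectiveContentWidth(usLetter, oneInch, true)); // landscape
```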

### Styles (Override Built-in Headings)

Use Arial as the default font (widely available across platforms). Keep titles black for readability.

```javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      {
        id: "Heading1",
        name: "Heading 1",
        basedOn: "Normal",
        next: "Normal",
        quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 },
      }, // outlineLevel required for TOC
      {
        id: "Heading2",
        name: "Heading 2",
        basedOn: "Normal",
        next: "Normal",
        quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 },
      },
    ],
  },
  sections: [
    {
      children: [
        new Paragraph({
          heading: HeadingLevel.HEADING_1,
          children: [new TextRun("Title")],
        }),
      ],
    },
  ],
});
```
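
The numbers in the style definitions use two different units, which is a common source of mistakes. Run `size` is in half-points (so `24` is 12pt, as the comment above notes), while paragraph `spacing` values are, as far as WordprocessingML defines them, in twentieths of a point. The helpers below are an illustrative sketch with hypothetical names, not docx-js functions.

```javascript
// Illustrative converters (hypothetical names, not part of docx-js).

// Font size: points -> half-points, the unit used by `run.size`.
function ptToHalfPoints(pt) {
  return Math.round(pt * 2);
}

// Paragraph spacing: points -> twentieths of a point,
// the unit assumed here for `spacing.before` / `spacing.after`.
function ptToTwentieths(pt) {
  return Math.round(pt * 20);
}

// Values matching the Heading 1 style above: 16pt text, 12pt spacing.
const heading1 = {
  run: { size: ptToHalfPoints(16) },
  paragraph: { spacing: { before: ptToTwentieths(12), after: ptToTwentieths(12) } },
};
console.log(heading1);
```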

### Lists (NEVER use unicode bullets)

```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] }); // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] }); // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      {
        reference: "bullets",
        levels: [
          {
            level: 0,
            format: LevelFormat.BULLET,
            text: "•",
            alignment: AlignmentType.LEFT,
            style: { paragraph: { indent: { left: 720, hanging: 360 } } },
          },
        ],
      },
      {
        reference: "numbers",
        levels: [
          {
            level: 0,
            format: LevelFormat.DECIMAL,
            text: "%1.",
            alignment: AlignmentType.LEFT,
            style: { paragraph: { indent: { left: 720, hanging: 360 } } },
          },
        ],
      },
    ],
  },
  sections: [
    {
      children: [
        new Paragraph({
          numbering: { reference: "bullets", level: 0 },
          children: [new TextRun("Bullet item")],
        }),
        new Paragraph({
          numbering: { reference: "numbers", level: 0 },
          children: [new TextRun("Numbered item")]