docx
This skill creates, reads, edits, and manipulates Word documents (.docx files) with support for structured report generation, professional formatting, tables of contents, headings, images, and tracked changes. Use it when the user requests Word documents, reports, memos, letters, or templates with formatting requirements, or when extracting, reorganizing, or converting content within .docx files. The skill includes a JSON-based report generator compatible with the PDF generator schema for producing research reports with cover pages, tables of contents, and reference sections.
git clone --depth 1 https://github.com/AgentTeam-TaichuAI/ScienceClaw /tmp/docx && cp -r /tmp/docx/ScienceClaw/backend/builtin_skills/docx ~/.claude/skills/docx
# DOCX creation, editing, and analysis
## Overview
A .docx file is a ZIP archive containing XML files.
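Because the container is an ordinary ZIP archive, Python's standard `zipfile` module is enough to look inside one. The following is a minimal sketch, not part of the skill itself: the helper names `list_docx_parts` and `read_main_document` are hypothetical, and the default part name assumes the body lives at `word/document.xml`, which is the usual location.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of all files stored inside the .docx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_main_document(path, part="word/document.xml"):
    """Return the raw XML of the main document part as a string.

    Assumption: the body is stored at word/document.xml. That is the
    usual location, but a package may formally declare another one.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.read(part).decode("utf-8")
```

Calling `list_docx_parts("output_report.docx")` on a generated report is a quick way to confirm the file is a valid archive before opening it in Word.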
## Generating Reports (Recommended for research reports)
For structured reports with cover page, TOC, tables, images, and references, use the pre-built template. The DOCX generator accepts the **same JSON schema** as the PDF generator, so one `report_data.json` can produce both formats.
### Step 1: Copy the generator script
```bash
cp /builtin-skills/docx/scripts/generate_report.js ./generate_report.js
```
### Step 2: Build `report_data.json`
**Two phases: write section text files, then assemble into JSON.**
**Phase 1 — Write each section as a plain text file** using `write_file`:
For each major section, `read_file` the relevant research data, then `write_file` the section content directly:
```
read_file("research_data/literature.md") # refresh data in context
write_file("sections/sec_01_intro.txt", "...") # write section content
write_file("sections/sec_02_mutations.txt", "...") # next section
...
```
Each section file should be 1,000-2,000+ words with specific data, citations, and analysis.
**NEVER write a Python script that contains section text as string literals.** The section content goes directly into .txt files via `write_file`, not into Python code. Do NOT write scripts named "generate_sections", "create_content", "build_report" etc. that embed text in Python strings. If a sandbox script fails twice, switch to direct `write_file` calls.
**Writing style — academic research report (CRITICAL):**
- Write continuous flowing prose. Each paragraph: 8–10 sentences following the pattern: topic sentence → supporting evidence with specific data → analysis/comparison → transition to next point.
- Use in-text citations [1], [2] when referencing data. These render as blue superscript in the DOCX. Do NOT add a "References" list at the end of each chapter — all references go in ONE final `references` section.
- Synthesize across sources: "Study A [1] reported X, while Study B [2] found Y, suggesting that Z."
- Use academic connectives: "Furthermore", "In contrast", "These findings indicate", "Notably", "Taken together".
- NEVER use numbered-point structure (e.g. "1. Topic Title\n\nParagraph. 2. Topic Title\n\nParagraph."). Instead, use `##` subheadings for structure and prose paragraphs for content. The template's `renderText` correctly renders `##`/`###` as formatted subheadings.
- Bullet lists: max 5% of section, only for short enumerations (e.g. 4-5 drug names).
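Some of the style rules above can be checked mechanically before assembly. The sketch below is only an illustration: `lint_section` is a hypothetical helper, the regular expressions are rough heuristics, and measuring the 5% bullet limit by character count is an assumption, since the rule does not say how the share is measured.

```python
import re

NUMBERED_POINT = re.compile(r"^\s*\d+\.\s+\S")   # e.g. "1. Topic Title"
BULLET = re.compile(r"^\s*[-*•]\s+\S")           # e.g. "- item"
CITATION = re.compile(r"\[(\d+)\]")              # e.g. "[1]"


def lint_section(text, max_bullet_share=0.05):
    """Return a list of human-readable warnings for one section's text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    warnings = []

    numbered = [ln for ln in lines if NUMBERED_POINT.match(ln)]
    if numbered:
        warnings.append(f"{len(numbered)} line(s) start with a numbered point")

    # Assumption: the bullet share is measured in characters, not lines.
    total_chars = sum(len(ln) for ln in lines)
    bullet_chars = sum(len(ln) for ln in lines if BULLET.match(ln))
    if total_chars and bullet_chars / total_chars > max_bullet_share:
        warnings.append("bullet lists exceed the allowed share")

    if not CITATION.search(text):
        warnings.append("no in-text citations like [1] found")
    return warnings
```

An empty list means none of these particular checks fired. It does not show that the prose itself meets the paragraph-length or synthesis requirements.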
**Language (CRITICAL):**
- All report content (title, subtitle, section headings, body text, chart labels, table headers, cover metadata) MUST be written in the **user's configured language** as specified in the system prompt's `## Language` section.
- If the user's language is Chinese (`zh`), write the entire report in Chinese; if English (`en`), write in English. Do NOT mix languages unless quoting a proper noun or technical term that has no standard translation.
**Verification (before generating DOCX):**
Run a one-liner to count chars per section:
```
python3 -c "import os,glob; [print(f'{f}: {len(open(f).read())} chars') for f in sorted(glob.glob('sections/*.txt'))]; total=sum(len(open(f).read()) for f in glob.glob('sections/*.txt')); print(f'TOTAL: {total} chars, ~{total//500} pages')"
```
If total is under target, do ONE revision: `read_file` data again, then `write_file` to rewrite the thinnest 1-2 sections. Then proceed — no looping.
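The single revision pass can be sketched as a small helper that reports which files to rewrite. This is a hedged illustration: `section_sizes`, `estimated_pages`, and `revision_candidates` are hypothetical names, and `target_chars` is a value you supply yourself, because the target is not fixed here. The 500 characters per page figure is the same rough estimate the one-liner uses.

```python
import glob
import os

CHARS_PER_PAGE = 500  # same rough estimate as the verification one-liner


def section_sizes(sections_dir="sections"):
    """Map each section file path to its length in characters."""
    sizes = {}
    for path in sorted(glob.glob(os.path.join(sections_dir, "*.txt"))):
        with open(path, encoding="utf-8") as f:
            sizes[path] = len(f.read())
    return sizes


def estimated_pages(sizes):
    """Approximate page count from the total character count."""
    return sum(sizes.values()) // CHARS_PER_PAGE


def revision_candidates(sizes, target_chars, how_many=2):
    """Return the thinnest files if the total is under target, else []."""
    if sum(sizes.values()) >= target_chars:
        return []
    return sorted(sizes, key=sizes.get)[:how_many]
```

The function only names candidates. It is meant to be called once, matching the "ONE revision, no looping" rule.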
**Phase 2 — Assemble into JSON** using a standard assembler script:
```python
import glob
import json
import os

SECTIONS_DIR = "sections"
TITLE = "Report Title"
SUBTITLE = "Subtitle"

# Section config: (file_pattern, heading_number, heading_text)
SECTION_MAP = [
    ("sec_01_*.txt", "1.", "Introduction"),
    ("sec_02_*.txt", "2.", "Background"),
    ("sec_03_*.txt", "3.", "Analysis"),
    ("sec_04_*.txt", "4.", "Discussion"),
    ("sec_05_*.txt", "5.", "Conclusion"),
]

data = {
    "title": TITLE, "subtitle": SUBTITLE,
    "short_title": TITLE[:40], "report_type": "Research Report",
    "toc": True, "sections": [],
    "cover_meta": [["Report Type", "Research Report"]],
}

for pattern, num, heading in SECTION_MAP:
    matches = sorted(glob.glob(os.path.join(SECTIONS_DIR, pattern)))
    if not matches:
        continue  # skip sections whose text file was never written
    # Use the first matching file as the section body
    with open(matches[0], encoding="utf-8") as f:
        body = f.read().strip()
    data["sections"].append({"type": "heading", "level": 1, "number": num, "text": heading})
    data["sections"].append({"type": "text", "body": body})

# Add references section at end
data["sections"].append({"type": "heading", "level": 1, "text": "References"})
data["sections"].append({"type": "references", "items": []})

with open("report_data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print(f"Generated report_data.json ({len(data['sections'])} sections)")
```
**Adding charts**: Generate chart images separately using matplotlib, then include via the `image` section type.
**CRITICAL — CJK font config (MUST include when chart has Chinese/Japanese/Korean text):**
```python
import matplotlib
matplotlib.use("Agg")
matplotlib.rcParams["font.sans-serif"] = [
    "WenQuanYi Micro Hei", "SimHei", "Noto Sans CJK SC",
    "Arial Unicode MS", "sans-serif",
]
matplotlib.rcParams["axes.unicode_minus"] = False
import matplotlib.pyplot as plt
# ... generate chart ...
plt.savefig("figures/chart.png", dpi=180, bbox_inches="tight")
plt.close()
```
Without this config, Chinese text in charts will appear as garbled boxes (□□□).
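The font list only helps if at least one of the named fonts is actually installed. The sketch below is one possible way to check that, not something the skill ships: `first_available` and `installed_font_names` are hypothetical helpers, and the lookup relies on matplotlib's `font_manager.fontManager.ttflist`, so it needs matplotlib to be installed.

```python
PREFERRED_CJK_FONTS = [
    "WenQuanYi Micro Hei", "SimHei", "Noto Sans CJK SC", "Arial Unicode MS",
]


def first_available(preferred, installed):
    """Return the first preferred font name that is installed, or None."""
    installed = set(installed)
    for name in preferred:
        if name in installed:
            return name
    return None


def installed_font_names():
    """Collect the font family names matplotlib has discovered."""
    from matplotlib import font_manager  # imported lazily; needs matplotlib
    return {font.name for font in font_manager.fontManager.ttflist}
```

Calling `first_available(PREFERRED_CJK_FONTS, installed_font_names())` returns the font that will be used for CJK glyphs, or `None` if none of the listed fonts is present, in which case the boxes are likely to appear despite the config.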
Then in the assembler:
```python
data["sections"].append({"type": "text", "body": analysis_text})
data["sections"].append({"type": "image", "path": "figures/chart.png", "caption": "Figure 1: ..."})
```
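Since `image` sections reference files by path, it can be worth confirming that every referenced file exists before running the generator. The following is a precautionary sketch under stated assumptions: `missing_images` and `load_report_data` are hypothetical helpers, relative paths are assumed to resolve against the current working directory, and how `generate_report.js` itself reacts to a missing file is not specified here.

```python
import json
import os


def load_report_data(path="report_data.json"):
    """Load the assembled report description from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_images(data):
    """Return paths of `image` sections whose file does not exist on disk."""
    return [
        section.get("path")
        for section in data.get("sections", [])
        if section.get("type") == "image"
        and not os.path.isfile(section.get("path") or "")
    ]
```

An empty result from `missing_images(load_report_data())` means every chart referenced in the JSON was found.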
### Step 3: Generate the DOCX
```bash
python3 build_report_data.py   # the Phase 2 assembler script, saved under this name; writes report_data.json
node generate_report.js report_data.json output_report.docx
```
### Supported section types
| Type | Required Fields | Optional Fields |
|------|-----------------|-----------------|