docx
The docx skill enables creation, editing, and analysis of Word documents through text extraction, XML manipulation, and tracked changes workflows. Use this skill for professional document processing tasks including content modification, format preservation, collaborative editing with redlining, and extracting text from existing .docx files. It supports reading documents via pandoc conversion and editing through OOXML manipulation, with special consideration for adding scientific diagrams to enhance visual communication of complex concepts.
git clone --depth 1 https://github.com/K-Dense-AI/claude-scientific-writer /tmp/docx && cp -r /tmp/docx/skills/document-skills/docx ~/.claude/skills/docx
SKILL.md
# DOCX creation, editing, and analysis

## Overview

A .docx file is a ZIP archive containing XML files and resources. Create, edit, or analyze Word documents using text extraction, raw XML access, or redlining workflows. Apply this skill for professional document processing, tracked changes, and content manipulation.

## Visual Enhancement with Scientific Schematics

**When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.**

If your document does not already contain schematics or diagrams:
- Use the **scientific-schematics** skill to generate AI-powered publication-quality diagrams
- Simply describe your desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic

**For new documents:** Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.

**How to generate schematics:**
```bash
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```

The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory

**When to add schematics:**
- Document workflow diagrams
- Process flowcharts
- System architecture illustrations
- Data flow diagrams
- Organizational structure diagrams
- Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.

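
The Overview's point that a .docx file is a ZIP archive of XML parts can be checked with the Python standard library alone. The sketch below is illustrative and not part of this skill's scripts: the helper names are hypothetical, and the in-memory archive it builds only mimics the `word/document.xml` layout (it omits parts such as `[Content_Types].xml`, so Word would not open it). It lists the archive's parts and reads paragraph text from the main document part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements such as <w:p>, <w:r>, <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_archive() -> bytes:
    """Build a tiny in-memory ZIP that mimics a .docx layout (illustration only)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, docx</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def list_parts(data: bytes) -> list[str]:
    """Return the names of the files stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def paragraph_texts(data: bytes) -> list[str]:
    """Return the text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread across its <w:t> elements.
        paragraphs.append(
            "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        )
    return paragraphs


sample = build_sample_archive()
print(list_parts(sample))       # ['word/document.xml']
print(paragraph_texts(sample))  # ['Hello, docx']
```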

---

## Workflow Decision Tree

### Reading/Analyzing Content
Use "Text extraction" or "Raw XML access" sections below

### Creating New Document
Use "Creating a new Word document" workflow

### Editing Existing Document
- **Your own document + simple changes**
  Use "Basic OOXML editing" workflow

- **Someone else's document**
  Use **"Redlining workflow"** (recommended default)

- **Legal, academic, business, or government docs**
  Use **"Redlining workflow"** (required)

## Reading and analyzing content

### Text extraction
To read the text contents of a document, convert the document to markdown using pandoc. Pandoc provides excellent support for preserving document structure and can show tracked changes:

```bash
# Convert document to markdown with tracked changes
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
```

### Raw XML access
Raw XML access is required for: comments, complex formatting, document structure, embedded media, and metadata. For any of these features, unpack a document and read its raw XML contents.

#### Unpacking a file
`python ooxml/scripts/unpack.py <office_file> <output_directory>`

#### Key file structures
* `word/document.xml` - Main document contents
* `word/comments.xml` - Comments referenced in document.xml
* `word/media/` - Embedded images and media files
* Tracked changes use `<w:ins>` (insertions) and `<w:del>` (deletions) tags

## Creating a new Word document

When creating a new Word document from scratch, use **docx-js**, which allows you to create Word documents using JavaScript/TypeScript.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with document creation.
2. Create a JavaScript/TypeScript file using Document, Paragraph, TextRun components (You can assume all dependencies are installed, but if not, refer to the dependencies section below)
3. Export as .docx using Packer.toBuffer()

## Editing an existing Word document

When editing an existing Word document, use the **Document library** (a Python library for OOXML manipulation). The library automatically handles infrastructure setup and provides methods for document manipulation. For complex scenarios, you can access the underlying DOM directly through the library.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for the Document library API and XML patterns for directly editing document files.
2. Unpack the document: `python ooxml/scripts/unpack.py <office_file> <output_directory>`
3. Create and run a Python script using the Document library (see "Document Library" section in ooxml.md)
4. Pack the final document: `python ooxml/scripts/pack.py <input_directory> <office_file>`

The Document library provides both high-level methods for common operations and direct DOM access for complex scenarios.

## Redlining workflow for document review

This workflow allows planning comprehensive tracked changes using markdown before implementing them in OOXML. **CRITICAL**: For complete tracked changes, implement ALL changes systematically.

**Batching Strategy**: Group related changes into batches of 3-10 changes. This makes debugging manageable while maintaining efficiency. Test each batch before moving to the next.

**Principle: Minimal, Precise Edits**
When implementing tracked changes, only mark text that actually changes. Repeating unchanged text makes edits harder to review and appears unprofessional. Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text].
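
The [unchanged text] + [deletion] + [insertion] + [unchanged text] breakdown can be computed before any XML is written. The following is a minimal sketch, assuming a single contiguous change and word-level granularity; `split_minimal_edit` is a hypothetical helper, not part of the Document library, and it produces plain strings rather than `<w:del>` or `<w:ins>` markup. Several separate changes in one sentence would collapse into one wider span.

```python
import re


def split_minimal_edit(old: str, new: str) -> tuple[str, str, str, str]:
    """Return (unchanged_prefix, deleted, inserted, unchanged_suffix).

    Hypothetical helper: compares whitespace-delimited tokens so that a whole
    word such as "30" is marked as changed, not a single character.
    """
    # Keep whitespace runs as their own tokens so joining restores the text.
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)
    limit = min(len(old_tokens), len(new_tokens))

    # Count tokens shared at the start of both versions.
    prefix = 0
    while prefix < limit and old_tokens[prefix] == new_tokens[prefix]:
        prefix += 1

    # Count tokens shared at the end, without overlapping the prefix.
    suffix = 0
    while (
        suffix < limit - prefix
        and old_tokens[-1 - suffix] == new_tokens[-1 - suffix]
    ):
        suffix += 1

    old_end = len(old_tokens) - suffix
    new_end = len(new_tokens) - suffix
    return (
        "".join(old_tokens[:prefix]),
        "".join(old_tokens[prefix:old_end]),
        "".join(new_tokens[prefix:new_end]),
        "".join(old_tokens[old_end:]),
    )


print(split_minimal_edit("The term is 30 days.", "The term is 60 days."))
# ('The term is ', '30', '60', ' days.')
```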
Preserve the original run's RSID for unchanged text by extracting the `<w:r>` element from the original and reusing it.

Example - Changing "30 days" to "60 days" in a sentence:
```python
# BAD - Replaces entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'

# GOOD - Only marks what changed, preserves original <w:r> for
```
Comprehensive citation management for academic research. Search Google Scholar and PubMed for papers, extract accurate metadata, validate citations, and generate properly formatted BibTeX entries. This skill should be used when you need to find papers, verify citation information, convert DOIs to BibTeX, or ensure reference accuracy in scientific writing.
Generate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug development, clinical research, and evidence synthesis.
Write comprehensive clinical reports including case reports (CARE guidelines), diagnostic reports (radiology/pathology/lab), clinical trial reports (ICH-E3, SAE, CSR), and patient documentation (SOAP, H&P, discharge summaries). Full support with templates, regulatory compliance (HIPAA, FDA, ICH-GCP), and validation tools.
PDF manipulation toolkit. Extract text/tables, create PDFs, merge/split, fill forms, for programmatic document processing and analysis.
Presentation toolkit (.pptx). Create/edit slides, layouts, content, speaker notes, comments, for programmatic presentation creation and modification.
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
Generate or edit images using AI models (FLUX, Gemini). Use for general-purpose image generation including photos, illustrations, artwork, visual assets, concept art, and any image that isn't a technical diagram or schematic. For flowcharts, circuits, pathways, and technical diagrams, use the scientific-schematics skill instead.
Generate testable hypotheses. Formulate from observations, design experiments, explore competing explanations, develop predictions, propose mechanisms, for scientific inquiry across domains.