---
name: docx-toolkit
description: Create, edit, and analyze Word documents (.docx files) with support for tracked changes, comments, formatting preservation, and professional document workflows.
---

# Word Document Toolkit

This skill enables Claude to create, edit, and analyze Word documents (.docx files), which are ZIP archives of XML parts. It covers tracked changes, comments, and formatting preservation.

## Core Libraries & Tools

| Tool | Best For |
|------|----------|
| **python-docx** | Creating and editing documents |
| **pandoc** | Format conversion, text extraction |
| **docx (npm)** | Node.js document creation |

## Key Workflows

### 1. Reading Content

**Quick Text Extraction (pandoc)**:
```bash
pandoc document.docx -t markdown -o output.md
```

**Programmatic Access (python-docx)**:
```python
from docx import Document

doc = Document('document.docx')

# Read body paragraphs (text inside tables is not included here)
for para in doc.paragraphs:
    print(para.text)

# Read tables
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
```

### 2. Creating Documents

**Python Approach**:
```python
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

# Title
title = doc.add_heading('Document Title', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Paragraph with formatting
para = doc.add_paragraph()
run = para.add_run('This is bold text.')
run.bold = True

para.add_run(' This is normal text.')

# Bullet list
doc.add_paragraph('First item', style='List Bullet')
doc.add_paragraph('Second item', style='List Bullet')

# Table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'

# Header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Role'
hdr_cells[2].text = 'Department'

# Image
doc.add_picture('image.png', width=Inches(4))

doc.save('output.docx')
```

**JavaScript/TypeScript Approach (docx-js)**:
```typescript
import { Document, Paragraph, TextRun, Packer } from 'docx';
import * as fs from 'fs';

const doc = new Document({
  sections: [{
    properties: {},
    children: [
      new Paragraph({
        children: [
          new TextRun({
            text: "Hello World",
            bold: true,
            size: 28,  // half-points, so 14 pt
          }),
        ],
      }),
      new Paragraph({
        children: [
          new TextRun("This is a normal paragraph."),
        ],
      }),
    ],
  }],
});

// Generate and save (top-level await requires an ES module)
const buffer = await Packer.toBuffer(doc);
fs.writeFileSync('output.docx', buffer);
```

### 3. Editing Existing Documents

```python
from docx import Document

doc = Document('existing.docx')

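# Replace text run by run so each run keeps its own formatting.
# Limitation: this only matches when 'old text' sits entirely inside one run;
# Word often splits a phrase across several runs.
for para in doc.paragraphs:
    for run in para.runs:
        if 'old text' in run.text:
            run.text = run.text.replace('old text', 'new text')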

# Add content at end
doc.add_paragraph('New paragraph added.')

doc.save('modified.docx')
```

## Document Structure

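A .docx file is a ZIP archive. A typical package looks like this (optional parts such as `comments.xml` and `media/` appear only when the document uses them):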

```
document.docx/
├── [Content_Types].xml
├── _rels/
│   └── .rels
├── word/
│   ├── document.xml      # Main content
│   ├── styles.xml        # Styles
│   ├── comments.xml      # Comments
│   ├── numbering.xml     # Lists
│   └── media/            # Images
└── docProps/
    ├── core.xml          # Metadata
    └── app.xml           # App info
```
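
Because the package is an ordinary ZIP file, its parts can be inspected with the standard library alone. A minimal sketch, assuming hypothetical helper names `list_parts` and `read_part`, and assuming the XML parts are UTF-8 encoded (which is what Word normally writes):

```python
import zipfile


def list_parts(docx_path):
    """Return the part names stored in a .docx package, sorted."""
    with zipfile.ZipFile(docx_path) as package:
        return sorted(package.namelist())


def read_part(docx_path, part_name="word/document.xml"):
    """Return one XML part as text. Assumes UTF-8; do not use on media files."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read(part_name).decode("utf-8")
```

Typical use: `list_parts('document.docx')` to see which optional parts exist, then `read_part('document.docx', 'word/styles.xml')` to look at one of them.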

## Formatting Options

### Text Formatting

```python
from docx import Document
from docx.shared import Pt, RGBColor
from docx.enum.text import WD_UNDERLINE

doc = Document()
para = doc.add_paragraph()

run = para.add_run('Formatted text')
run.bold = True
run.italic = True
run.underline = WD_UNDERLINE.SINGLE
run.font.size = Pt(14)
run.font.color.rgb = RGBColor(0x42, 0x24, 0xE9)
run.font.name = 'Arial'
```

### Paragraph Formatting

```python
from docx import Document
from docx.shared import Pt, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

para = doc.add_paragraph('Aligned paragraph')
para.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
para.paragraph_format.first_line_indent = Inches(0.5)
para.paragraph_format.space_before = Pt(12)
para.paragraph_format.space_after = Pt(6)
para.paragraph_format.line_spacing = 1.5  # a float is a multiple of single spacing
```

### Styles

```python
from docx import Document

doc = Document()

# Use built-in styles (these names must exist in the document's template)
doc.add_heading('Heading 1', level=1)
doc.add_heading('Heading 2', level=2)
doc.add_paragraph('Normal paragraph')
doc.add_paragraph('Quote style', style='Quote')
doc.add_paragraph('Intense quote', style='Intense Quote')
```

## Tables

```python
from docx import Document
from docx.shared import Inches
from docx.oxml.ns import qn
from docx.oxml import OxmlElement

doc = Document()
table = doc.add_table(rows=4, cols=3)
table.style = 'Table Grid'

# Populate data first: an empty cell has no runs, so the header
# formatting below would fail with an IndexError on runs[0]
data = [
    ['Name', 'Role', 'Status'],
    ['John', 'Developer', 'Active'],
    ['Jane', 'Designer', 'Active'],
    ['Bob', 'Manager', 'Away'],
]

for row_idx, row_data in enumerate(data):
    for col_idx, cell_text in enumerate(row_data):
        table.rows[row_idx].cells[col_idx].text = cell_text

# Set column widths (applied to every cell in the column)
for row in table.rows:
    row.cells[0].width = Inches(2)
    row.cells[1].width = Inches(2)
    row.cells[2].width = Inches(1.5)

# Header row: bold text plus background shading
for cell in table.rows[0].cells:
    cell.paragraphs[0].runs[0].bold = True
    # Shading is not exposed by python-docx, so add the w:shd element directly
    shading = OxmlElement('w:shd')
    shading.set(qn('w:val'), 'clear')
    shading.set(qn('w:fill'), '4472C4')
    cell._tc.get_or_add_tcPr().append(shading)
```

## Headers and Footers

```python
from docx import Document
from docx.shared import Pt

doc = Document()
section = doc.sections[0]

# Header
header = section.header
header_para = header.paragraphs[0]
header_para.text = "Document Header"
header_para.style = 'Header'

# Footer with page numbers
footer = section.footer
footer_para = footer.paragraphs[0]
footer_para.text = "Page "
# Add page number field (requires XML manipulation for dynamic numbers)
```

## Best Practices

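1. **Preserve formatting** when editing: modify runs, not entire paragraphs (assigning `paragraph.text` collapses the paragraph into a single run and drops run-level formatting)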
2. **Use styles** for consistent formatting
3. **Test with sample documents** before batch processing
4. **Handle exceptions** for corrupted or protected documents
5. **Backup originals** before automated edits
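
Practices 4 and 5 can be sketched with the standard library alone. The helper names `check_docx` and `backup_original` are hypothetical, and the check assumes the main part lives at `word/document.xml`, as Word writes it:

```python
import shutil
import zipfile
from pathlib import Path


def check_docx(path):
    """Hypothetical pre-flight check. Returns (ok, reason)."""
    path = Path(path)
    if not path.is_file():
        return False, "file not found"
    if not zipfile.is_zipfile(path):
        # Password-protected files are typically stored in a non-ZIP container
        return False, "not a ZIP package (corrupted or encrypted?)"
    try:
        with zipfile.ZipFile(path) as package:
            if "word/document.xml" not in package.namelist():
                return False, "missing word/document.xml"
            # testzip() returns the first member that fails its CRC check, or None
            bad_member = package.testzip()
    except zipfile.BadZipFile as exc:
        return False, f"unreadable archive: {exc}"
    if bad_member is not None:
        return False, f"corrupted part: {bad_member}"
    return True, "ok"


def backup_original(path, suffix=".bak"):
    """Copy `path` to `<name><suffix>` in the same folder; return the backup path."""
    path = Path(path)
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)
    return backup
```

In a batch job, one option is to call `check_docx` on every input, skip and log the failures, and call `backup_original` just before the first write to each file.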

## Tips

- Use pandoc for quick format conversions
- python-docx doesn't support all Word features (e.g., tracked changes editing)
- For complex edits, consider direct XML manipulation
- python-docx loads the whole document into memory, so very large documents may need a streaming XML approach
- Always validate output in Microsoft Word
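
As an example of direct XML access, tracked changes can be read straight from `word/document.xml`. A minimal read-only sketch, assuming a hypothetical helper name `extract_tracked_changes` and the standard (transitional) WordprocessingML namespace; insertions are `w:ins` elements and deletions are `w:del` elements whose text sits in `w:delText`:

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def extract_tracked_changes(docx_path):
    """Return tracked insertions and deletions found in word/document.xml.

    Each entry is a dict with keys: kind ('insert' or 'delete'),
    author, date, text.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    changes = []
    for element in root.iter():
        if element.tag == W + "ins":
            kind, text_tag = "insert", W + "t"
        elif element.tag == W + "del":
            kind, text_tag = "delete", W + "delText"
        else:
            continue
        text = "".join(node.text or "" for node in element.iter(text_tag))
        if not text:
            # Revision markers on paragraph marks or table rows carry no text
            continue
        changes.append({
            "kind": kind,
            "author": element.get(W + "author"),
            "date": element.get(W + "date"),
            "text": text,
        })
    return changes
```

This only reports changes in the document body. Accepting, rejecting, or writing tracked changes means editing the same elements and is considerably more delicate.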
