PDF Generation Refactor Plan
Objectives
- Produce predictable, high-quality PDFs from StoryCraftr markdown without requiring LaTeX, Pandoc, or external tooling that complicates installation.
- Simplify the generation stack so it can run in common environments (pipx, CI, containers) with minimal dependencies.
- Maintain stylistic control (fonts, headings, table of contents) directly from Python.
Current Pain Points
- Pandoc/LaTeX dependency – Requires system packages, varies by OS, and breaks in isolated environments.
- Inconsistent output – Layout or fonts shift depending on local LaTeX installs; users report unpredictable PDF rendering.
- Limited customization – Styling tweaks require custom templates or LaTeX knowledge, slowing iteration.
Target Approach
- Adopt a pure-Python Markdown → PDF pipeline using a maintained library (initial candidate: `markdown-pdf`).
- Wrap the renderer so StoryCraftr controls document options (title page, metadata, optional cover art) through a simple config.
- Keep the output consistent across platforms by bundling CSS/themes alongside the renderer.
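
As a rough illustration of the wrapper idea, the sketch below shows one way a thin StoryCraftr-side layer could map a small options object onto `markdown-pdf`. `PdfOptions`, `title_page_markdown`, and `render_pdf` are hypothetical names invented for this example, not existing StoryCraftr code. The `MarkdownPdf`, `Section`, `add_section`, `meta`, and `save` calls follow the library's published usage as best understood here and should be checked against the installed version, as should how image paths are resolved.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class PdfOptions:
    """Hypothetical options object; not an existing StoryCraftr class."""

    title: str
    author: str = ""
    theme_css: Optional[Path] = None    # bundled or per-project stylesheet
    cover_image: Optional[Path] = None  # optional cover art
    toc_level: int = 2                  # deepest heading level listed in the TOC


def title_page_markdown(opts: PdfOptions) -> str:
    """Build the Markdown for a simple title page from the options."""
    blocks = []
    if opts.cover_image is not None:
        # Assumption: the renderer can resolve this path as written.
        blocks.append(f"![cover]({opts.cover_image.as_posix()})")
    blocks.append(f"# {opts.title}")
    if opts.author:
        blocks.append(f"*{opts.author}*")
    return "\n\n".join(blocks) + "\n"


def render_pdf(chapters: List[str], opts: PdfOptions, out_path: Path) -> Path:
    """Render the title page plus one section per chapter into a single PDF."""
    # Imported lazily so the helpers above work without the dependency installed.
    from markdown_pdf import MarkdownPdf, Section

    css = opts.theme_css.read_text(encoding="utf-8") if opts.theme_css else None

    pdf = MarkdownPdf(toc_level=opts.toc_level)
    pdf.meta["title"] = opts.title
    pdf.meta["author"] = opts.author

    # Keep the title page out of the table of contents.
    pdf.add_section(Section(title_page_markdown(opts), toc=False), user_css=css)
    for chapter in chapters:
        pdf.add_section(Section(chapter), user_css=css)

    pdf.save(str(out_path))
    return out_path
```

Keeping the library import inside `render_pdf` means the option handling can be unit-tested without the rendering dependency, and swapping in a different engine later would only touch that one function.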
Evaluation of Libraries
| Library | Pros | Cons | Decision |
| ------- | ---- | ---- | -------- |
| markdown-pdf | Actively maintained, pure Python, supports CSS themes, simple API | Requires evaluation of font embedding, image support | Preferred |
| weasyprint | Powerful CSS engine | Pulls in Cairo/Pango system deps | Consider if we need advanced layout |
| reportlab | Full control, PDF native | Manual layout (no Markdown parser) | Not ideal |
Workstreams
- Prototype Renderer
  - Create `storycraftr/pdf/renderer.py` using `markdown-pdf`.
  - Test with sample chapters (headings, tables, images, code blocks).
  - Bundle default CSS (matching StoryCraftr brand) in `storycraftr/pdf/themes/`.
- CLI Integration
  - Update `storycraftr/cmd/story/publish.py` (and paper equivalent) to call the new renderer.
  - Provide CLI flags for theme selection, cover inclusion, frontmatter metadata.
- Configuration & Templates
  - Allow per-project overrides (e.g., `<book>/pdf-theme.css`).
  - Document how authors can add CSS tweaks or custom fonts.
- Testing & QA
  - Add unit tests verifying renderer is invoked and outputs a PDF.
  - Create integration tests comparing output size/checksum for known samples.
  - Manual regression for multi-language content and large chapters.
- Docs & Migration
  - Document the new pipeline in `docs/pdf.md` (requirements, customization).
  - Update README / Getting Started instructions to reflect pure-Python flow.
  - Note Pandoc/LaTeX is no longer required (or only optional for legacy flows).
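
To make the CLI Integration workstream more concrete, here is a minimal sketch of the kind of flags it describes (theme selection, cover inclusion, frontmatter metadata). It uses the standard-library `argparse` purely for illustration; the real `publish.py` may use a different CLI framework, and the flag names `--theme`, `--cover`, and `--meta`, along with `build_parser` and `parse_meta`, are assumptions rather than an agreed interface.

```python
import argparse
from pathlib import Path
from typing import Dict, Iterable


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical flag layout for a PDF publish command."""
    parser = argparse.ArgumentParser(prog="publish-pdf")
    parser.add_argument("book_path", type=Path, help="path to the book project")
    parser.add_argument("--theme", default="default",
                        help="name of a bundled theme")
    parser.add_argument("--cover", type=Path, default=None,
                        help="optional cover image to place on the title page")
    parser.add_argument("--meta", action="append", default=[], metavar="KEY=VALUE",
                        help="frontmatter metadata; may be given several times")
    return parser


def parse_meta(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn repeated KEY=VALUE strings into a metadata dictionary."""
    meta: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        meta[key.strip()] = value.strip()
    return meta


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.book_path, args.theme, args.cover, parse_meta(args.meta))
```

Repeating `--meta` keeps the flag surface small while still letting authors set title, author, or other fields from the command line.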
Risks & Mitigations
- Rendering Fidelity: Markdown → PDF engines vary; test complex constructs early (tables, blockquotes, images).
  - Mitigation: Maintain sample Markdown fixtures and compare outputs during CI.
- CSS Complexity: Authors may want heavy customization.
  - Mitigation: Provide layered themes (base + optional overrides) and document best practices.
- Binary Assets: Ensure `markdown-pdf` handles embedded images/fonts; otherwise add post-processing step.
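
The layered-theme mitigation (base + optional overrides) could be as simple as concatenating stylesheets so that later rules win. The sketch below assumes bundled themes are stored as `<theme>.css` inside a themes directory, which is a naming convention invented for this example; `resolve_css` is likewise a hypothetical helper. The override filename comes from the per-project example in this plan.

```python
from pathlib import Path

# Per-project override file named in the plan: <book>/pdf-theme.css
OVERRIDE_FILENAME = "pdf-theme.css"


def resolve_css(themes_dir: Path, theme: str, book_dir: Path) -> str:
    """Return the base theme CSS followed by the project override, if any.

    The override comes last so its rules take precedence under normal
    CSS cascade behaviour (assuming equal selector specificity).
    """
    # Assumption: bundled themes are stored as <themes_dir>/<theme>.css.
    base = themes_dir / f"{theme}.css"
    layers = [base.read_text(encoding="utf-8")]

    override = book_dir / OVERRIDE_FILENAME
    if override.is_file():
        layers.append(override.read_text(encoding="utf-8"))

    return "\n".join(layers)
```

A missing base theme raises `FileNotFoundError` here, which is probably preferable to silently rendering an unstyled PDF; a missing override is simply skipped.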
Success Metrics
- PDF generation works via `poetry run storycraftr publish pdf ...` on a clean pipx install (no Pandoc/LaTeX).
- Automated tests validate output is created and is non-empty for sample books.
- Users report consistent rendering across macOS/Linux/Windows (tracked via issue reduction).
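
For the "output is created and is non-empty" metric, a small helper like the one below could back the automated tests. It is a sketch only: `looks_like_pdf` and the `min_bytes` threshold are assumptions for illustration. It checks the file size and the `%PDF-` header that PDF files begin with, which tends to be more robust than a byte-for-byte checksum if the renderer embeds timestamps or other run-specific data.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # PDF files start with this header


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Cheap sanity check: the file exists, is not tiny, and has a PDF header."""
    if not path.is_file():
        return False
    if path.stat().st_size < min_bytes:
        return False
    with path.open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
```

In a test suite this might be used as `assert looks_like_pdf(output_path)` after rendering a sample book, with the threshold tuned to the smallest expected fixture.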