📜 ChapterWise Manuscript Format

The ChapterWise Manuscript Format is a specialized JSON structure optimized for manuscript processing, chapter detection, and literary analysis workflows. It provides a clean, linear structure for books, articles, and other text-based documents.

Design Philosophy

Streamlined Structure: Direct manuscript node without envelope overhead for faster processing
Analysis-First: Structured for seamless integration with ChapterWise analysis modules
Processing-Aware: Includes metadata for detection algorithms, word counts, and processing status
Natural Ordering: Array position determines reading order - no complex position attributes needed
Content as First-Class: Content elevated to top-level alongside name and summary
Legacy Compatible: Can coexist with existing manuscript storage systems

File Structure

Manuscript files start directly with the manuscript node as the root object:

{
  "id": "manuscript-uuid",
  "type": "manuscript",
  "name": "The Odyssey",
  "summary": "An ancient Greek epic poem attributed to Homer, telling the story of Odysseus's journey home after the Trojan War",
  "tags": ["epic", "poetry", "ancient-greek", "completed"],
  "attributes": [
    { "key": "author", "name": "Author", "value": "Homer" },
    { "key": "word_count", "name": "Word Count", "value": 187000, "dataType": "int" },
    { "key": "chapter_count", "name": "Chapter Count", "value": 24, "dataType": "int" },
    { "key": "language", "name": "Language", "value": "ancient-greek" },
    { "key": "processing_status", "name": "Processing Status", "value": "completed" }
  ],
  "children": [ /* Books and other content */ ]
}

Note: This streamlined format eliminates the overhead of envelope structures while maintaining full compatibility with ChapterWise processing systems.

Required Fields

Manuscript Root Object

All manuscript files must include these required fields at the root level:

id - Unique identifier for the manuscript (UUID v4 format)
type - Must be "manuscript" to identify this as a manuscript node
name - Display title of the manuscript
children - Array containing chapters and other content nodes

Recommended Fields

summary - Brief description of the manuscript's content
tags - Array of descriptive tags for categorization
attributes - Array of key-value metadata (author, word count, etc.)

Chapter Structure

Each chapter is a child node with this structure:

{
  "id": "chapter-01-uuid",
  "type": "chapter",
  "name": "Book 1",
  "summary": "Athena inspires the prince - the gods debate Odysseus's fate",
  "content": "<p>Tell me, Muse, of that ingenious hero who travelled far and wide after he had sacked the famous town of Troy...</p>",
  "tags": ["opening", "divine-council", "invocation"],
  "attributes": [
    { "key": "node_position", "name": "Node Position", "value": 1, "dataType": "int" },
    { "key": "word_count", "name": "Word Count", "value": 2847, "dataType": "int" },
    { "key": "element_type", "name": "Element Type", "value": "h2" }
  ]
}
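Because metadata lives in an array of key/value entries rather than in named properties, reading a value means searching that array. A minimal sketch of a lookup helper follows; getAttribute is a hypothetical name used for illustration, not part of ChapterWise.

```javascript
// Hypothetical helper: return the value of the attribute with the given key,
// or the fallback when the node has no such attribute.
function getAttribute(node, key, fallback = undefined) {
  const entry = (node?.attributes ?? []).find(a => a.key === key);
  return entry === undefined ? fallback : entry.value;
}

// Trimmed copy of the chapter node shown above.
const chapter = {
  id: "chapter-01-uuid",
  type: "chapter",
  name: "Book 1",
  attributes: [
    { key: "node_position", name: "Node Position", value: 1, dataType: "int" },
    { key: "word_count", name: "Word Count", value: 2847, dataType: "int" }
  ]
};

console.log(getAttribute(chapter, "word_count"));     // 2847
console.log(getAttribute(chapter, "partial", false)); // false (fallback used)
```

Returning a fallback instead of throwing keeps the helper usable on nodes that omit the recommended attributes array.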

Node Types

Primary Types

manuscript - Root document container
chapter - Individual chapter content
preface - Introductory content before chapters
epilogue - Concluding content after chapters
appendix - Supplementary material
section - Sub-chapter divisions

Hierarchical Structure

manuscript
├── preface (optional)     # Array position 0
├── chapter (1..n)         # Array positions 1, 2, 3...
│   ├── section (optional)
│   └── footnote (optional)
├── epilogue (optional)    # After all chapters
└── appendix (optional)    # Final position

Order Preservation: The children[] array order determines reading sequence. Chapters also carry a node_position attribute (see Validation Rules), which should agree with that order.
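Since array position carries the order, a depth-first walk that visits each node before its children yields the reading sequence. The sketch below assumes nested nodes use the same children array as the root; inReadingOrder and the sample data are illustrative only.

```javascript
// Depth-first walk: each node is yielded before its children,
// and siblings are visited in array order.
function* inReadingOrder(node, depth = 0) {
  yield { node, depth };
  for (const child of node.children ?? []) {
    yield* inReadingOrder(child, depth + 1);
  }
}

// Made-up sample data for illustration.
const sample = {
  id: "manuscript-uuid", type: "manuscript", name: "The Odyssey",
  children: [
    { id: "preface-uuid", type: "preface", name: "Preface" },
    { id: "chapter-01-uuid", type: "chapter", name: "Book 1",
      children: [{ id: "section-01-uuid", type: "section", name: "Invocation" }] },
    { id: "chapter-02-uuid", type: "chapter", name: "Book 2" }
  ]
};

// Print an indented outline in reading order.
for (const { node, depth } of inReadingOrder(sample)) {
  console.log(`${"  ".repeat(depth)}${node.type}: ${node.name}`);
}
```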

Standard Attributes

Manuscript-Level Attributes

{
  "attributes": [
    { "key": "word_count", "name": "Word Count", "value": 187000, "dataType": "int" },
    { "key": "chapter_count", "name": "Chapter Count", "value": 24, "dataType": "int" },
    { "key": "language", "name": "Language", "value": "ancient-greek" },
    { "key": "genre", "name": "Genre", "value": "Epic Poetry" },
    { "key": "processing_status", "name": "Processing Status", "value": "completed" },
    { "key": "processing_engine", "name": "Processing Engine", "value": "chapterwise-v2.1" },
    { "key": "processed_at", "name": "Processed At", "value": "2025-07-17T10:30:00Z", "dataType": "date" },
    { "key": "has_toc", "name": "Has TOC", "value": true, "dataType": "boolean" },
    { "key": "original_format", "name": "Original Format", "value": "html" },
    { "key": "user_id", "name": "User ID", "value": "user-uuid", "tags": ["internal"] }
  ]
}

Chapter-Level Attributes

{
  "attributes": [
    { "key": "node_position", "name": "Node Position", "value": 1, "dataType": "int" },
    { "key": "word_count", "name": "Word Count", "value": 2847, "dataType": "int" },
    { "key": "element_type", "name": "Element Type", "value": "h2" },
    { "key": "partial", "name": "Partial", "value": false, "dataType": "boolean" },
    { "key": "estimated_reading_time", "name": "Estimated Reading Time", "value": 7, "dataType": "int" }
  ]
}

Processing Metadata

Track algorithmic processing status and statistics:

{
  "attributes": [
    { "key": "processing_stats", "name": "Processing Stats", "value": {
      "total_chunks": 12,
      "successful_chunks": 11,
      "failed_chunks": 1,
      "processing_time_seconds": 45.7,
      "detection_confidence": 0.94
    }},
    { "key": "detection_metadata", "name": "Detection Metadata", "value": {
      "toc_detected": true,
      "chapter_markers_found": ["h1", "h2", "h3"],
      "total_html_length": 450000
    }},
    { "key": "quality_metrics", "name": "Quality Metrics", "value": {
      "content_completeness": 0.99,
      "title_extraction_confidence": 0.95,
      "chapter_boundary_confidence": 0.92
    }}
  ]
}
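One way to use these statistics is to derive a chunk success rate and flag manuscripts that may need reprocessing. This is a sketch only: chunkSuccessRate is a hypothetical helper, and any threshold applied to its result is a choice for the caller, not something the format defines.

```javascript
// Hypothetical helper: fraction of chunks that processed successfully,
// or null when the node carries no usable processing_stats attribute.
function chunkSuccessRate(node) {
  const stats = (node.attributes ?? []).find(a => a.key === "processing_stats")?.value;
  if (!stats || !stats.total_chunks) return null;
  return stats.successful_chunks / stats.total_chunks;
}

// Values copied from the processing_stats example above.
const processed = {
  attributes: [
    { key: "processing_stats", name: "Processing Stats", value: {
      total_chunks: 12,
      successful_chunks: 11,
      failed_chunks: 1
    }}
  ]
};

const rate = chunkSuccessRate(processed);
console.log(rate === null ? "no stats" : `${(rate * 100).toFixed(1)}% of chunks succeeded`);
```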

Content Storage

Inline Content (Default)

{
  "content": "<p>Full chapter content here...</p>",
  "attributes": [
    { "key": "storage_strategy", "name": "Storage Strategy", "value": "inline" }
  ]
}

External Content (Large Chapters)

For very large chapters, content can be stored externally:

{
  "content": null,
  "attributes": [
    { "key": "content_url", "name": "Content URL", "value": "content/chapter-01-content.html" },
    { "key": "storage_strategy", "name": "Storage Strategy", "value": "external" },
    { "key": "content_size_bytes", "name": "Content Size Bytes", "value": 125000, "dataType": "int" }
  ]
}
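A reader of the format has to handle both strategies. The sketch below returns inline content directly and otherwise hands content_url to a loader supplied by the caller. resolveContent and loadExternal are hypothetical names; how the URL is resolved (file system, HTTP, object storage) is outside what this document specifies.

```javascript
// Return the chapter's content, delegating to loadExternal(url) when the
// node uses external storage. Nodes without a storage_strategy attribute
// are treated as inline, which is the default.
function resolveContent(node, loadExternal) {
  const attr = key => (node.attributes ?? []).find(a => a.key === key)?.value;
  if (attr("storage_strategy") !== "external") {
    return node.content ?? "";
  }
  const url = attr("content_url");
  if (!url) {
    throw new Error(`Node ${node.id} uses external storage but has no content_url`);
  }
  return loadExternal(url);
}

const inlineChapter = {
  id: "chapter-01-uuid",
  content: "<p>Full chapter content here...</p>",
  attributes: [{ key: "storage_strategy", name: "Storage Strategy", value: "inline" }]
};

const externalChapter = {
  id: "chapter-02-uuid",
  content: null,
  attributes: [
    { key: "content_url", name: "Content URL", value: "content/chapter-01-content.html" },
    { key: "storage_strategy", name: "Storage Strategy", value: "external" }
  ]
};

// Stand-in loader for the example; a real one would read the file or fetch the URL.
const fakeLoader = url => `<p>loaded from ${url}</p>`;

console.log(resolveContent(inlineChapter, fakeLoader));
console.log(resolveContent(externalChapter, fakeLoader));
```

If the loader returns a Promise, callers can simply await the result of resolveContent in both cases.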

Processing Results Integration

Processing results (chapter detection, TOC analysis, etc.) are stored in manuscript format as analysis documents:

{
  "id": "results-manuscript-id",
  "type": "analysis_document",
  "name": "Chapter Detection Results",
  "summary": "Processing results for manuscript Example Book",
  "tags": ["chapterwise-results", "chapter_detection"],
  "attributes": [
    { "key": "manuscript_id", "name": "Source Manuscript ID", "value": "manuscript-id" },
    { "key": "analysis_type", "name": "Analysis Type", "value": "chapter_detection" },
    { "key": "created_at", "name": "Created At", "value": "2025-07-17T15:00:00Z", "dataType": "date" }
  ],
  "children": [
    {
      "id": "analysis-chapter_detection-manuscript-id",
      "type": "analysis_results",
      "name": "Chapter Detection Results",
      "attributes": [
        { "key": "stats_total_chunks", "name": "Total Chunks", "value": 39, "dataType": "int" },
        { "key": "stats_successful_chunks", "name": "Successful Chunks", "value": 39, "dataType": "int" },
        { "key": "stats_failed_chunks", "name": "Failed Chunks", "value": 0, "dataType": "int" }
      ],
      "children": [
        {
          "id": "result-chunk-0",
          "type": "analysis_chunk",
          "name": "Chunk 0 Result",
          "attributes": [
            { "key": "chunk_idx", "name": "Chunk Index", "value": 0, "dataType": "int" },
            { "key": "success", "name": "Success", "value": true, "dataType": "boolean" },
            { "key": "raw_results", "name": "Raw Results", "value": { "chapters": [...] } }
          ]
        }
      ]
    }
  ]
}
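The per-chunk nodes make it possible to recompute the totals or to list exactly which chunks failed. A minimal sketch, assuming the nesting shown above (analysis_document, then analysis_results, then analysis_chunk); summarizeChunks is a hypothetical name and the sample document is made up.

```javascript
// Count chunk outcomes and collect the indexes of chunks that did not succeed.
function summarizeChunks(analysisDocument) {
  const summary = { total: 0, succeeded: 0, failedIndexes: [] };
  for (const results of analysisDocument.children ?? []) {
    if (results.type !== "analysis_results") continue;
    for (const chunk of results.children ?? []) {
      if (chunk.type !== "analysis_chunk") continue;
      const attr = key => (chunk.attributes ?? []).find(a => a.key === key)?.value;
      summary.total += 1;
      if (attr("success") === true) {
        summary.succeeded += 1;
      } else {
        summary.failedIndexes.push(attr("chunk_idx"));
      }
    }
  }
  return summary;
}

// Made-up results document with one successful and one failed chunk.
const resultsDoc = {
  id: "results-manuscript-id",
  type: "analysis_document",
  children: [{
    id: "analysis-chapter_detection-manuscript-id",
    type: "analysis_results",
    children: [
      { id: "result-chunk-0", type: "analysis_chunk", attributes: [
        { key: "chunk_idx", value: 0, dataType: "int" },
        { key: "success", value: true, dataType: "boolean" }
      ]},
      { id: "result-chunk-1", type: "analysis_chunk", attributes: [
        { key: "chunk_idx", value: 1, dataType: "int" },
        { key: "success", value: false, dataType: "boolean" }
      ]}
    ]
  }]
};

console.log(summarizeChunks(resultsDoc));
```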

Analysis Integration

Analysis results are also stored as child nodes:

{
  "children": [
    {
      "id": "analysis-summary-uuid",
      "type": "analysis",
      "name": "Manuscript Analysis Results",
      "tags": ["chapterwise-analysis", "automated"],
      "attributes": [
        { "key": "analysis_engine", "name": "Analysis Engine", "value": "chapterwise-v2.1" },
        { "key": "analysis_date", "name": "Analysis Date", "value": "2025-07-17T15:00:00Z", "dataType": "date" },
        { "key": "modules_used", "name": "Modules Used", "value": ["summary", "characters", "writing_style"] }
      ],
      "children": [
        {
          "id": "character-analysis-uuid",
          "type": "analysis_result",
          "name": "Character Analysis",
          "attributes": [
            { "key": "module", "name": "Module", "value": "characters" },
            { "key": "confidence", "name": "Confidence", "value": 0.92, "dataType": "float" },
            { "key": "results", "name": "Results", "value": { /* analysis data */ } }
          ]
        }
      ]
    }
  ]
}
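With results stored as children, fetching one module's output is a two-level search: find the analysis nodes, then the analysis_result whose module attribute matches. The sketch below assumes that layout; findModuleResult is a hypothetical name.

```javascript
// Return the first analysis_result node produced by the given module,
// or undefined when the manuscript has no such result.
function findModuleResult(manuscript, moduleName) {
  for (const child of manuscript.children ?? []) {
    if (child.type !== "analysis") continue;
    const hit = (child.children ?? []).find(result =>
      result.type === "analysis_result" &&
      (result.attributes ?? []).some(a => a.key === "module" && a.value === moduleName)
    );
    if (hit) return hit;
  }
  return undefined;
}

// Trimmed copy of the structure shown above.
const analyzed = {
  children: [{
    id: "analysis-summary-uuid",
    type: "analysis",
    children: [{
      id: "character-analysis-uuid",
      type: "analysis_result",
      name: "Character Analysis",
      attributes: [
        { key: "module", name: "Module", value: "characters" },
        { key: "confidence", name: "Confidence", value: 0.92, dataType: "float" }
      ]
    }]
  }]
};

console.log(findModuleResult(analyzed, "characters")?.name); // Character Analysis
console.log(findModuleResult(analyzed, "summary"));          // undefined
```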

File Naming Conventions

Primary Format: {manuscript_id}_manuscript.json
Results Format: {manuscript_id}_results.json (processing results in manuscript format)
Legacy Backup: {manuscript_id}_chapters.json (optional, legacy format)
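The three patterns can be generated from a manuscript id with a small helper. manuscriptFileNames is a hypothetical name; the sketch only builds file names and makes no assumption about the directory they live in.

```javascript
// Build the file names described above for a given manuscript id.
function manuscriptFileNames(manuscriptId) {
  return {
    primary: `${manuscriptId}_manuscript.json`,
    results: `${manuscriptId}_results.json`,
    legacy: `${manuscriptId}_chapters.json` // optional legacy backup
  };
}

console.log(manuscriptFileNames("manuscript-uuid").primary); // manuscript-uuid_manuscript.json
```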

Example Queries

Find Manuscripts by User

// `nodes` is the array of loaded root nodes; `user_uuid` is the id to match
const manuscripts = nodes.filter(n =>
  n.type === "manuscript" &&
  n.attributes?.find(a => a.key === "user_id" && a.value === user_uuid)
);

Get Chapter Content

function getBookContent(manuscriptId, bookNumber) {
  const manuscript = nodes.find(n => n.id === manuscriptId);
  const book = manuscript?.children?.find(ch =>
    ch.type === "chapter" &&
    ch.attributes?.find(a => a.key === "node_position" && a.value === bookNumber)
  );
  return book?.content;
}

Calculate Statistics

function getManuscriptStats(manuscriptId) {
  const manuscript = nodes.find(n => n.id === manuscriptId);
  const chapters = manuscript?.children?.filter(ch => ch.type === "chapter") || [];

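  // Compute the total first so the average can reuse it
  const totalWords = chapters.reduce((sum, ch) =>
    sum + (ch.attributes?.find(a => a.key === "word_count")?.value || 0), 0);

  return {
    totalChapters: chapters.length,
    totalWords,
    // Avoid dividing by zero when the manuscript has no chapters
    avgWordsPerChapter: chapters.length ? Math.round(totalWords / chapters.length) : 0
  };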
}

Migration from Legacy Format

ChapterWise automatically converts from the legacy _chapters.json format:

// Legacy format
{
  "manuscript_id": "uuid",
  "chapters": [
    { "id": 1, "title": "Book 1", "content": "..." }
  ]
}

// Converts to Manuscript format
{
  "id": "uuid",
  "type": "manuscript",
  "name": "Manuscript Title",
  "children": [
    { "type": "chapter", "name": "Book 1", "content": "..." }
  ]
}
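The mapping above can be sketched as a small function. This is an illustration of the field mapping, not the converter ChapterWise actually runs: convertLegacy is a hypothetical name, the title is passed in because the legacy example does not show one, and the chapter id scheme is a placeholder. The sketch also adds the node_position attribute listed under Validation Rules, which the abbreviated example omits.

```javascript
// Convert a legacy { manuscript_id, chapters } object into a manuscript node.
function convertLegacy(legacy, title) {
  return {
    id: legacy.manuscript_id,
    type: "manuscript",
    name: title,
    children: (legacy.chapters ?? []).map((ch, index) => ({
      // Placeholder id for this sketch; a real converter would choose its own scheme.
      id: `${legacy.manuscript_id}-chapter-${index + 1}`,
      type: "chapter",
      name: ch.title,
      content: ch.content,
      attributes: [
        // Array order becomes the 1-based node position.
        { key: "node_position", name: "Node Position", value: index + 1, dataType: "int" }
      ]
    }))
  };
}

const legacyFile = {
  manuscript_id: "uuid",
  chapters: [
    { id: 1, title: "Book 1", content: "..." },
    { id: 2, title: "Book 2", content: "..." }
  ]
};

console.log(JSON.stringify(convertLegacy(legacyFile, "Manuscript Title"), null, 2));
```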

Validation Rules

Required Fields

Manuscript Root: id, type (must be "manuscript"), name, children
Chapters: type (must be "chapter"), name, content (null when the chapter is stored externally), node_position attribute
Sequential: Node positions must be sequential starting from 1
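These rules can be checked mechanically. The sketch below is one possible validator, not the one ChapterWise uses. It assumes "sequential" means that the chapter nodes, taken in array order, carry node_position values 1, 2, 3 and so on, and it skips the content check because externally stored chapters hold no inline content. validateManuscript is a hypothetical name.

```javascript
// Return a list of human-readable problems; an empty list means the checks passed.
function validateManuscript(manuscript) {
  const errors = [];
  for (const field of ["id", "type", "name", "children"]) {
    if (manuscript[field] == null) errors.push(`root: missing "${field}"`);
  }
  if (manuscript.type != null && manuscript.type !== "manuscript") {
    errors.push('root: type must be "manuscript"');
  }
  const chapters = Array.isArray(manuscript.children)
    ? manuscript.children.filter(ch => ch.type === "chapter")
    : [];
  chapters.forEach((ch, index) => {
    const label = `chapter[${index}]`;
    if (!ch.name) errors.push(`${label}: missing "name"`);
    const position = (ch.attributes ?? []).find(a => a.key === "node_position")?.value;
    if (position === undefined) {
      errors.push(`${label}: missing node_position attribute`);
    } else if (position !== index + 1) {
      errors.push(`${label}: node_position is ${position}, expected ${index + 1}`);
    }
  });
  return errors;
}

// Made-up manuscript whose second chapter skips a position.
const draft = {
  id: "manuscript-uuid", type: "manuscript", name: "The Odyssey",
  children: [
    { type: "chapter", name: "Book 1", attributes: [{ key: "node_position", value: 1 }] },
    { type: "chapter", name: "Book 2", attributes: [{ key: "node_position", value: 3 }] }
  ]
};

console.log(validateManuscript(draft));
```

Collecting every problem instead of stopping at the first lets a caller report all issues in one pass.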

Content Validation

Maximum chapter size: 1MB
Allowed HTML tags: p, h1-h6, em, strong, br
UTF-8 encoding required
Word count tolerance: ±5%
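The size and tag rules can be approximated with a quick pre-check before upload. The sketch below assumes "1MB" means 1,048,576 bytes of UTF-8 and uses a regular expression to spot tag names, which is a rough screen rather than a real HTML parser. checkChapterContent is a hypothetical name.

```javascript
const ALLOWED_TAGS = new Set(["p", "h1", "h2", "h3", "h4", "h5", "h6", "em", "strong", "br"]);
const MAX_CHAPTER_BYTES = 1024 * 1024; // assumption: "1MB" read as 1 MiB

// Report the UTF-8 size of the content and any tag names outside the allowed list.
function checkChapterContent(html) {
  const sizeBytes = new TextEncoder().encode(html).length;
  const disallowedTags = new Set();
  // Matches opening, closing, and self-closing tags and captures the tag name.
  for (const match of html.matchAll(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g)) {
    const tag = match[1].toLowerCase();
    if (!ALLOWED_TAGS.has(tag)) disallowedTags.add(tag);
  }
  return {
    sizeBytes,
    withinSizeLimit: sizeBytes <= MAX_CHAPTER_BYTES,
    disallowedTags: [...disallowedTags]
  };
}

console.log(checkChapterContent("<p>Tell me, <em>Muse</em>...</p>"));
console.log(checkChapterContent("<div><script>x</script></div>").disallowedTags);
```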

Best Practices

Always include word counts at both manuscript and chapter levels
Preserve chapter order in the children array
Use consistent naming - "Book 1", "Book 2", etc.
Include processing metadata for debugging and optimization
Store analysis results as child nodes for easy access

Next Steps

Learn about Codex Format for complex world-building
Upload your first manuscript to see this format in action