📜 Chapterwise Manuscript Format
The Chapterwise Manuscript Format is a specialized JSON structure for manuscript processing, chapter detection, and literary analysis workflows. It provides a clean, linear structure suited to books, articles, and other text-based documents.
Design Philosophy
- Streamlined Structure: Direct manuscript entity without envelope overhead for faster processing
- Analysis-First: Structured for seamless integration with ChapterWise analysis modules
- Processing-Aware: Includes metadata for detection algorithms, word counts, and processing status
- Natural Ordering: Array position determines reading order - no complex position attributes needed
- Content as First-Class: Content elevated to top-level alongside name and summary
- Legacy Compatible: Can coexist with existing manuscript storage systems
File Structure
Manuscript files start directly with the manuscript entity as the root object:
{
"id": "manuscript-uuid",
"type": "manuscript",
"name": "The Odyssey",
"summary": "An ancient Greek epic poem attributed to Homer, telling the story of Odysseus's journey home after the Trojan War",
"tags": ["epic", "poetry", "ancient-greek", "completed"],
"attributes": [
{ "key": "author", "name": "Author", "value": "Homer" },
{ "key": "word_count", "name": "Word Count", "value": 187000, "dataType": "int" },
{ "key": "chapter_count", "name": "Chapter Count", "value": 24, "dataType": "int" },
{ "key": "language", "name": "Language", "value": "ancient-greek" },
{ "key": "processing_status", "name": "Processing Status", "value": "completed" }
],
"children": [ /* Books and other content */ ]
}
Note: This streamlined format eliminates the overhead of envelope structures while maintaining full compatibility with ChapterWise processing systems.
Required Fields
Manuscript Root Object
All manuscript files must include these required fields at the root level:
- id - Unique identifier for the manuscript (UUID v4 format)
- type - Must be "manuscript" to identify this as a manuscript entity
- name - Display title of the manuscript
- children - Array containing chapters and other content entities
Recommended Fields
- summary - Brief description of the manuscript's content
- tags - Array of descriptive tags for categorization
- attributes - Array of key-value metadata (author, word count, etc.)
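Because attributes are stored as an array of key/name/value objects and not as a plain object, reading one means scanning the array for its key. The sketch below shows one way to do that; `getAttribute` and its `fallback` parameter are hypothetical names used for illustration, not part of the format.

```javascript
// Hypothetical helper: return the value of the attribute with the given key,
// or the fallback when the entity has no such attribute.
function getAttribute(entity, key, fallback = undefined) {
  const attr = (entity.attributes || []).find(a => a.key === key);
  return attr ? attr.value : fallback;
}

// Trimmed-down copy of the manuscript root shown earlier
const manuscript = {
  id: "manuscript-uuid",
  type: "manuscript",
  name: "The Odyssey",
  tags: ["epic", "poetry"],
  attributes: [
    { key: "author", name: "Author", value: "Homer" },
    { key: "word_count", name: "Word Count", value: 187000, dataType: "int" }
  ],
  children: []
};

console.log(getAttribute(manuscript, "author"));
console.log(getAttribute(manuscript, "genre", "unknown"));
```

The fallback keeps callers from having to special-case manuscripts that omit the recommended fields.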
Chapter Structure
Each chapter is a child entity with this structure:
{
"id": "chapter-01-uuid",
"type": "chapter",
"name": "Book 1",
"summary": "Athena inspires the prince - the gods debate Odysseus's fate",
"content": "<p>Tell me, Muse, of that ingenious hero who travelled far and wide after he had sacked the famous town of Troy...</p>",
"tags": ["opening", "divine-council", "invocation"],
"attributes": [
{ "key": "chapter_number", "name": "Chapter Number", "value": 1, "dataType": "int" },
{ "key": "word_count", "name": "Word Count", "value": 2847, "dataType": "int" },
{ "key": "element_type", "name": "Element Type", "value": "h2" }
]
}
Entity Types
Primary Types
- manuscript - Root document container
- chapter - Individual chapter content
- preface - Introductory content before chapters
- epilogue - Concluding content after chapters
- appendix - Supplementary material
- section - Sub-chapter divisions
Hierarchical Structure
manuscript
├── preface (optional) # Array position 0
├── chapter (1..n) # Array positions 1, 2, 3...
│ ├── section (optional)
│ └── footnote (optional)
├── epilogue (optional) # After all chapters
└── appendix (optional) # Final position
Order Preservation: The children[] array order determines reading sequence. No explicit position attributes needed.
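Since array position is the only ordering signal, a reader can produce the reading sequence by walking `children` in order. The following is a minimal sketch; `readingOrder` is a hypothetical name, and the sample entities are illustrative.

```javascript
// Hypothetical helper: list top-level entities in reading order.
// The position is simply the index in the children array.
function readingOrder(manuscript) {
  return (manuscript.children || []).map((child, index) => ({
    position: index,
    type: child.type,
    name: child.name
  }));
}

const sample = {
  id: "manuscript-uuid",
  type: "manuscript",
  name: "The Odyssey",
  children: [
    { type: "preface", name: "Translator's Note" },
    { type: "chapter", name: "Book 1" },
    { type: "chapter", name: "Book 2" },
    { type: "epilogue", name: "Afterword" }
  ]
};

for (const entry of readingOrder(sample)) {
  console.log(`${entry.position}: [${entry.type}] ${entry.name}`);
}
```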
Standard Attributes
Manuscript-Level Attributes
{
"attributes": [
{ "key": "word_count", "name": "Word Count", "value": 187000, "dataType": "int" },
{ "key": "chapter_count", "name": "Chapter Count", "value": 24, "dataType": "int" },
{ "key": "language", "name": "Language", "value": "ancient-greek" },
{ "key": "genre", "name": "Genre", "value": "Epic Poetry" },
{ "key": "processing_status", "name": "Processing Status", "value": "completed" },
{ "key": "processing_engine", "name": "Processing Engine", "value": "chapterwise-v2.1" },
{ "key": "processed_at", "name": "Processed At", "value": "2025-07-17T10:30:00Z", "dataType": "date" },
{ "key": "has_toc", "name": "Has TOC", "value": true, "dataType": "boolean" },
{ "key": "original_format", "name": "Original Format", "value": "html" },
{ "key": "user_id", "name": "User ID", "value": "user-uuid", "tags": ["internal"] }
]
}
Chapter-Level Attributes
{
"attributes": [
{ "key": "chapter_number", "name": "Chapter Number", "value": 1, "dataType": "int" },
{ "key": "word_count", "name": "Word Count", "value": 2847, "dataType": "int" },
{ "key": "element_type", "name": "Element Type", "value": "h2" },
{ "key": "partial", "name": "Partial", "value": false, "dataType": "boolean" },
{ "key": "estimated_reading_time", "name": "Estimated Reading Time", "value": 7, "dataType": "int" }
]
}
Processing Metadata
Track algorithmic processing status and statistics:
{
"attributes": [
{ "key": "processing_stats", "name": "Processing Stats", "value": {
"total_chunks": 12,
"successful_chunks": 11,
"failed_chunks": 1,
"processing_time_seconds": 45.7,
"detection_confidence": 0.94
}},
{ "key": "detection_metadata", "name": "Detection Metadata", "value": {
"toc_detected": true,
"chapter_markers_found": ["h1", "h2", "h3"],
"total_html_length": 450000
}},
{ "key": "quality_metrics", "name": "Quality Metrics", "value": {
"content_completeness": 0.99,
"title_extraction_confidence": 0.95,
"chapter_boundary_confidence": 0.92
}}
]
}
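One use of these statistics is to derive a chunk success rate for monitoring. The sketch below assumes the `processing_stats` attribute has the shape shown above; `chunkSuccessRate` is a hypothetical helper name.

```javascript
// Hypothetical helper: fraction of chunks that processed successfully,
// or null when the entity carries no usable processing_stats attribute.
function chunkSuccessRate(entity) {
  const stats = (entity.attributes || [])
    .find(a => a.key === "processing_stats")?.value;
  if (!stats || !stats.total_chunks) return null;
  return stats.successful_chunks / stats.total_chunks;
}

const processed = {
  attributes: [
    { key: "processing_stats", name: "Processing Stats", value: {
      total_chunks: 12,
      successful_chunks: 11,
      failed_chunks: 1,
      processing_time_seconds: 45.7,
      detection_confidence: 0.94
    }}
  ]
};

const rate = chunkSuccessRate(processed);
console.log(rate === null ? "no stats" : `${(rate * 100).toFixed(1)}% of chunks succeeded`);
```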
Content Storage
Inline Content (Default)
{
"content": "<p>Full chapter content here...</p>",
"attributes": [
{ "key": "storage_strategy", "name": "Storage Strategy", "value": "inline" }
]
}
External Content (Large Chapters)
For very large chapters, content can be stored externally:
{
"content": null,
"attributes": [
{ "key": "content_url", "name": "Content URL", "value": "content/chapter-01-content.html" },
{ "key": "storage_strategy", "name": "Storage Strategy", "value": "external" },
{ "key": "content_size_bytes", "name": "Content Size Bytes", "value": 125000, "dataType": "int" }
]
}
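A reader has to handle both strategies. The sketch below shows one possible way to do so: `resolveContent` is a hypothetical name, and `loadExternal` is a caller-supplied function (for example a file read or an HTTP fetch) that this format does not define. A missing `storage_strategy` is treated as inline, on the assumption that inline is the default as the heading above suggests.

```javascript
// Hypothetical helper: return a chapter's content regardless of where it is stored.
// loadExternal(url) is supplied by the caller; its return value (which may be a
// Promise) is passed straight through.
function resolveContent(chapter, loadExternal) {
  const attr = key => (chapter.attributes || []).find(a => a.key === key)?.value;
  const strategy = attr("storage_strategy") || "inline"; // assumption: inline when absent
  if (strategy === "external") {
    const url = attr("content_url");
    if (!url) {
      throw new Error(`Chapter ${chapter.id} is external but has no content_url`);
    }
    return loadExternal(url);
  }
  return chapter.content;
}

const inlineChapter = {
  id: "chapter-01-uuid",
  content: "<p>Full chapter content here...</p>",
  attributes: [{ key: "storage_strategy", name: "Storage Strategy", value: "inline" }]
};

const externalChapter = {
  id: "chapter-02-uuid",
  content: null,
  attributes: [
    { key: "content_url", name: "Content URL", value: "content/chapter-01-content.html" },
    { key: "storage_strategy", name: "Storage Strategy", value: "external" }
  ]
};

// Stand-in loader for demonstration only
const fakeLoader = url => `<p>loaded from ${url}</p>`;

console.log(resolveContent(inlineChapter, fakeLoader));
console.log(resolveContent(externalChapter, fakeLoader));
```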
Processing Results Integration
Processing results (chapter detection, TOC analysis, etc.) are stored in manuscript format as analysis documents:
{
"id": "results-manuscript-id",
"type": "analysis_document",
"name": "Chapter Detection Results",
"summary": "Processing results for manuscript Example Book",
"tags": ["chapterwise-results", "chapter_detection"],
"attributes": [
{ "key": "manuscript_id", "name": "Source Manuscript ID", "value": "manuscript-id" },
{ "key": "analysis_type", "name": "Analysis Type", "value": "chapter_detection" },
{ "key": "created_at", "name": "Created At", "value": "2025-07-17T15:00:00Z", "dataType": "date" }
],
"children": [
{
"id": "analysis-chapter_detection-manuscript-id",
"type": "analysis_results",
"name": "Chapter Detection Results",
"attributes": [
{ "key": "stats_total_chunks", "name": "Total Chunks", "value": 39, "dataType": "int" },
{ "key": "stats_successful_chunks", "name": "Successful Chunks", "value": 39, "dataType": "int" },
{ "key": "stats_failed_chunks", "name": "Failed Chunks", "value": 0, "dataType": "int" }
],
"children": [
{
"id": "result-chunk-0",
"type": "analysis_chunk",
"name": "Chunk 0 Result",
"attributes": [
{ "key": "chunk_idx", "name": "Chunk Index", "value": 0, "dataType": "int" },
{ "key": "success", "name": "Success", "value": true, "dataType": "boolean" },
{ "key": "raw_results", "name": "Raw Results", "value": { "chapters": [...] } }
]
}
]
}
]
}
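The nesting above (analysis document, then results, then chunks) can be walked to find chunks that need reprocessing. This is a sketch under the assumption that every chunk carries `chunk_idx` and `success` attributes as in the example; `failedChunkIndexes` is a hypothetical name.

```javascript
// Hypothetical helper: collect the chunk_idx of every chunk whose
// success attribute is false in a results document.
function failedChunkIndexes(resultsDoc) {
  const attr = (entity, key) =>
    (entity.attributes || []).find(a => a.key === key)?.value;
  const failed = [];
  for (const results of resultsDoc.children || []) {
    if (results.type !== "analysis_results") continue;
    for (const chunk of results.children || []) {
      if (chunk.type === "analysis_chunk" && attr(chunk, "success") === false) {
        failed.push(attr(chunk, "chunk_idx"));
      }
    }
  }
  return failed;
}

// Illustrative results document with one failed chunk
const resultsDoc = {
  id: "results-manuscript-id",
  type: "analysis_document",
  children: [{
    id: "analysis-chapter_detection-manuscript-id",
    type: "analysis_results",
    children: [0, 1, 2].map(i => ({
      id: `result-chunk-${i}`,
      type: "analysis_chunk",
      attributes: [
        { key: "chunk_idx", name: "Chunk Index", value: i, dataType: "int" },
        { key: "success", name: "Success", value: i !== 1, dataType: "boolean" }
      ]
    }))
  }]
};

console.log(failedChunkIndexes(resultsDoc));
```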
Analysis Integration
Analysis results are also stored as child entities:
{
"children": [
{
"id": "analysis-summary-uuid",
"type": "analysis",
"name": "Manuscript Analysis Results",
"tags": ["chapterwise-analysis", "automated"],
"attributes": [
{ "key": "analysis_engine", "name": "Analysis Engine", "value": "chapterwise-v2.1" },
{ "key": "analysis_date", "name": "Analysis Date", "value": "2025-07-17T15:00:00Z" },
{ "key": "modules_used", "name": "Modules Used", "value": ["summary", "characters", "writing_style"] }
],
"children": [
{
"id": "character-analysis-uuid",
"type": "analysis_result",
"name": "Character Analysis",
"attributes": [
{ "key": "module", "name": "Module", "value": "characters" },
{ "key": "confidence", "name": "Confidence", "value": 0.92, "dataType": "float" },
{ "key": "results", "name": "Results", "value": { /* analysis data */ } }
]
}
]
}
]
}
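To read the output of a single module, a consumer can look inside each `analysis` child for an `analysis_result` whose `module` attribute matches. A minimal sketch, with `findModuleResult` as a hypothetical name:

```javascript
// Hypothetical helper: return the first analysis_result entity produced by
// the named module, or undefined when none exists.
function findModuleResult(manuscript, moduleName) {
  const attr = (entity, key) =>
    (entity.attributes || []).find(a => a.key === key)?.value;
  for (const child of manuscript.children || []) {
    if (child.type !== "analysis") continue;
    const hit = (child.children || []).find(r =>
      r.type === "analysis_result" && attr(r, "module") === moduleName
    );
    if (hit) return hit;
  }
  return undefined;
}

const analyzed = {
  id: "manuscript-uuid",
  type: "manuscript",
  name: "The Odyssey",
  children: [
    { type: "chapter", name: "Book 1", content: "<p>...</p>" },
    {
      id: "analysis-summary-uuid",
      type: "analysis",
      name: "Manuscript Analysis Results",
      children: [{
        id: "character-analysis-uuid",
        type: "analysis_result",
        name: "Character Analysis",
        attributes: [
          { key: "module", name: "Module", value: "characters" },
          { key: "confidence", name: "Confidence", value: 0.92, dataType: "float" }
        ]
      }]
    }
  ]
};

console.log(findModuleResult(analyzed, "characters")?.name);
```

Because analysis entities sit in the same `children` array as chapters, code that counts or iterates chapters should filter on `type`, as the helper does here.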
File Naming Conventions
- Primary Format: {manuscript_id}_manuscript.json
- Results Format: {manuscript_id}_results.json (processing results in manuscript format)
- Legacy Backup: {manuscript_id}_chapters.json (optional, legacy format)
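The three patterns can be generated from a manuscript ID with template strings. In this sketch, `manuscriptFileNames` and the property names on the returned object are hypothetical; only the file name patterns come from the conventions above.

```javascript
// Hypothetical helper: build the conventional file names for a manuscript ID.
function manuscriptFileNames(manuscriptId) {
  return {
    primary: `${manuscriptId}_manuscript.json`,
    results: `${manuscriptId}_results.json`,
    legacyBackup: `${manuscriptId}_chapters.json` // optional, legacy format
  };
}

console.log(manuscriptFileNames("manuscript-uuid"));
```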
Example Queries
Find Manuscripts by User
// user_uuid holds the ID of the user whose manuscripts you want
const manuscripts = entities.filter(e =>
  e.type === "manuscript" &&
  e.attributes?.some(a => a.key === "user_id" && a.value === user_uuid)
);
Get Chapter Content
function getBookContent(manuscriptId, bookNumber) {
  const manuscript = entities.find(e => e.id === manuscriptId);
  // Match on the chapter_number attribute, not on array position
  const book = manuscript?.children?.find(ch =>
    ch.type === "chapter" &&
    ch.attributes?.some(a => a.key === "chapter_number" && a.value === bookNumber)
  );
  // undefined when no chapter matches; null when content is stored externally
  return book?.content;
}
Calculate Statistics
function getManuscriptStats(manuscriptId) {
  const manuscript = entities.find(e => e.id === manuscriptId);
  const chapters = manuscript?.children?.filter(ch => ch.type === "chapter") || [];
  // Compute the total first: an object literal cannot reference its own properties
  const totalWords = chapters.reduce((sum, ch) =>
    sum + (ch.attributes?.find(a => a.key === "word_count")?.value || 0), 0);
  return {
    totalChapters: chapters.length,
    totalWords,
    // Guard against division by zero when there are no chapters
    avgWordsPerChapter: chapters.length ? Math.round(totalWords / chapters.length) : 0
  };
}
Migration from Legacy Format
ChapterWise automatically converts from the legacy _chapters.json format:
// Legacy format
{
"manuscript_id": "uuid",
"chapters": [
{ "id": 1, "title": "Book 1", "content": "..." }
]
}
// Converts to Manuscript format
{
"id": "uuid",
"type": "manuscript",
"name": "Manuscript Title",
"children": [
{ "type": "chapter", "name": "Book 1", "content": "..." }
]
}
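The mapping above can be sketched as a small function. This is not ChapterWise's actual converter: `convertLegacy` is a hypothetical name, the title is passed in because the legacy example carries none, and deriving `chapter_number` from array position is an assumption made so the output satisfies the validation rules.

```javascript
// Hypothetical converter from the legacy _chapters.json shape to the
// manuscript format. Not the actual ChapterWise implementation.
function convertLegacy(legacy, title) {
  return {
    id: legacy.manuscript_id,
    type: "manuscript",
    name: title, // assumption: supplied by the caller
    children: (legacy.chapters || []).map((ch, index) => ({
      type: "chapter",
      name: ch.title,
      content: ch.content,
      attributes: [
        // assumption: chapter numbers follow legacy array order, starting at 1
        { key: "chapter_number", name: "Chapter Number", value: index + 1, dataType: "int" }
      ]
    }))
  };
}

const legacy = {
  manuscript_id: "uuid",
  chapters: [
    { id: 1, title: "Book 1", content: "..." },
    { id: 2, title: "Book 2", content: "..." }
  ]
};

console.log(JSON.stringify(convertLegacy(legacy, "Manuscript Title"), null, 2));
```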
Validation Rules
Required Fields
- Manuscript Root: id, type (must be "manuscript"), name, children
- Chapters: type (must be "chapter"), name, content, chapter_number attribute
- Sequential: Chapter numbers must be sequential starting from 1
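These structural rules can be checked with a short function that returns a list of problems. The sketch below is illustrative and `validateManuscript` is a hypothetical name. It treats a `content` key set to null as present, on the assumption that externally stored chapters are valid, and it does not check that `id` is a well-formed UUID.

```javascript
// Hypothetical validator for the required-field rules. Returns an array of
// error messages; an empty array means the structure passed these checks.
function validateManuscript(m) {
  const errors = [];
  if (typeof m.id !== "string" || m.id === "") errors.push("root: missing id");
  if (m.type !== "manuscript") errors.push('root: type must be "manuscript"');
  if (typeof m.name !== "string" || m.name === "") errors.push("root: missing name");
  if (!Array.isArray(m.children)) {
    errors.push("root: children must be an array");
    return errors;
  }
  // Chapters are identified by type, so the type rule holds by construction
  const chapters = m.children.filter(c => c.type === "chapter");
  chapters.forEach((ch, i) => {
    const label = `chapter[${i}]`;
    if (!ch.name) errors.push(`${label}: missing name`);
    // assumption: content may be null for externally stored chapters
    if (!("content" in ch)) errors.push(`${label}: missing content`);
    const num = (ch.attributes || []).find(a => a.key === "chapter_number")?.value;
    if (num === undefined) {
      errors.push(`${label}: missing chapter_number attribute`);
    } else if (num !== i + 1) {
      errors.push(`${label}: expected chapter_number ${i + 1}, got ${num}`);
    }
  });
  return errors;
}

const chapter = n => ({
  type: "chapter",
  name: `Book ${n}`,
  content: "<p>...</p>",
  attributes: [{ key: "chapter_number", name: "Chapter Number", value: n, dataType: "int" }]
});

const good = { id: "manuscript-uuid", type: "manuscript", name: "The Odyssey", children: [chapter(1), chapter(2)] };
const gapped = { ...good, children: [chapter(1), chapter(3)] };

console.log(validateManuscript(good));
console.log(validateManuscript(gapped));
```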
Content Validation
- Maximum chapter size: 1MB
- Allowed HTML tags: p, h1-h6, em, strong, br
- UTF-8 encoding required
- Word count tolerance: ±5%
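The content rules can be approximated in code as follows. This sketch rests on several assumptions not stated above: "1MB" is read as 1,048,576 bytes of UTF-8, tags are detected with a simple regular expression and not a full HTML parser, words are counted by splitting tag-stripped text on whitespace, and the ±5% tolerance is measured relative to the declared `word_count`. `validateChapterContent` and `countWords` are hypothetical names.

```javascript
const ALLOWED_TAGS = new Set(["p", "h1", "h2", "h3", "h4", "h5", "h6", "em", "strong", "br"]);
const MAX_CHAPTER_BYTES = 1024 * 1024; // assumption: "1MB" means 1 MiB

// Count whitespace-separated words after replacing tags with spaces
function countWords(html) {
  const text = html.replace(/<[^>]*>/g, " ").trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

// Hypothetical validator: returns an array of error messages (empty when OK)
function validateChapterContent(content, declaredWordCount) {
  const errors = [];

  // Size is measured on the UTF-8 encoding, not on string length
  const bytes = new TextEncoder().encode(content).length;
  if (bytes > MAX_CHAPTER_BYTES) {
    errors.push(`content is ${bytes} bytes, over the 1MB limit`);
  }

  // Collect each disallowed tag name once
  const disallowed = new Set();
  for (const match of content.matchAll(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g)) {
    const tag = match[1].toLowerCase();
    if (!ALLOWED_TAGS.has(tag)) disallowed.add(tag);
  }
  for (const tag of disallowed) errors.push(`disallowed tag <${tag}>`);

  // assumption: tolerance is relative to the declared count
  if (declaredWordCount > 0) {
    const actual = countWords(content);
    const drift = Math.abs(actual - declaredWordCount) / declaredWordCount;
    if (drift > 0.05) {
      errors.push(`word count ${actual} differs from declared ${declaredWordCount} by more than 5%`);
    }
  }
  return errors;
}

console.log(validateChapterContent("<p>Tell me, <em>Muse</em></p>", 3));
console.log(validateChapterContent("<p>hi</p><script>x</script>", 2));
```

A production validator would likely use a real HTML parser, since a regular expression can be fooled by comments or attribute values that contain angle brackets.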
Best Practices
- Always include word counts at both manuscript and chapter levels
- Preserve chapter order in the children array
- Use consistent naming - "Book 1", "Book 2", etc.
- Include processing metadata for debugging and optimization
- Store analysis results as child entities for easy access
Next Steps
- Learn about Codex Format for complex world-building
- Upload your first manuscript to see this format in action