Advanced AST-based document processing with the remark ecosystem for improved accuracy, performance, and extensibility in Legal Markdown JS.
- Overview
- AST-based Processing
- Field Highlighting Enhancements
- Migration Guide
- Configuration Options
- Plugin Architecture
- Troubleshooting
- Best Practices
Legal Markdown JS includes a modern remark-based processing engine that leverages Abstract Syntax Tree (AST) processing for superior document handling. This system provides:
- Enhanced accuracy - AST-based field tracking prevents false positives
- Better performance - Unified processing reduces computation overhead
- Extensibility - Plugin-based architecture for future enhancements
- Standards compliance - Built on the robust remark ecosystem
| Feature | Legacy Processing | Remark Processing |
|---|---|---|
| Field Accuracy | Text-based matching | AST-aware targeting |
| Performance | Multiple passes | Single AST traversal |
| Extensibility | Limited | Plugin architecture |
| Markdown Support | Basic | Full specification |
| Error Detection | Limited context | Rich AST context |
The remark processor converts markdown into an Abstract Syntax Tree, enabling precise manipulation of document structure:
import { processLegalMarkdownWithRemark } from 'legal-markdown-js';
const content = `
---
client_name: "Acme Corp"
amount: 50000
---
# Contract for {{client_name}}
Total amount: {{formatCurrency(amount, "EUR")}}
`;
const result = await processLegalMarkdownWithRemark(content, {
enableFieldTracking: true,
basePath: './documents',
});
The processor understands markdown structure at the AST level:
// AST representation of: "Amount: {{client_name}}"
{
type: 'paragraph',
children: [
{ type: 'text', value: 'Amount: ' },
{
type: 'template_field', // Custom AST node
value: 'client_name',
position: { start: 8, end: 23 } // simplified character offsets spanning {{client_name}}
}
]
}
AST processing ensures template fields are correctly identified:
<!-- Example document -->
# Contract for {{client_name}}
The client {{client_name}} agrees to pay.
```javascript
// In code comments: {{client_name}} - ignored
console.log('{{client_name}}'); // In code blocks - ignored
```
Email: user@{{domain}}.com - only {{domain}} is highlighted
Results:
- ✅ `{{client_name}}` in headings - highlighted
- ✅ `{{client_name}}` in paragraphs - highlighted
- ✅ `{{domain}}` in email - highlighted
- ❌ `{{client_name}}` in code blocks - ignored
- ❌ `{{client_name}}` in comments - ignored
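The detection rules listed above can be reproduced in miniature on a hand-built tree. This is only a sketch: the node types (`heading`, `paragraph`, `text`, `code`, `inlineCode`, `html`) follow mdast conventions, while `collectFields`, `FIELD_PATTERN`, and `SKIPPED_TYPES` are hypothetical names for illustration and are not part of the Legal Markdown JS API.

```javascript
// Hypothetical helper: collect {{field}} names from text nodes only.
// Code, inline code, and raw HTML (including comments) are skipped by assumption.
const FIELD_PATTERN = /\{\{\s*([A-Za-z_][\w.]*)\s*\}\}/g;
const SKIPPED_TYPES = new Set(['code', 'inlineCode', 'html']);

function collectFields(node, found = []) {
  if (SKIPPED_TYPES.has(node.type)) return found;
  if (node.type === 'text') {
    for (const match of node.value.matchAll(FIELD_PATTERN)) {
      found.push(match[1]);
    }
  }
  for (const child of node.children ?? []) {
    collectFields(child, found);
  }
  return found;
}

// Hand-built tree loosely mirroring the example document above.
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Contract for {{client_name}}' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'The client {{client_name}} agrees to pay.' }] },
    { type: 'code', lang: 'javascript', value: "console.log('{{client_name}}');" },
    { type: 'paragraph', children: [{ type: 'text', value: 'Email: user@{{domain}}.com' }] },
  ],
};

const fields = collectFields(tree);
console.log(fields);
```

Because the walk decides by node type rather than by scanning raw text, the occurrence inside the code block never reaches the pattern match at all.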
The remark processor combines multiple operations in a single AST traversal:
// Single pass processes:
// 1. Template field resolution
// 2. Field tracking markup
// 3. Cross-reference resolution
// 4. Import processing
// 5. Helper function execution
const options = {
enableFieldTracking: true,
processCrossReferences: true,
processImports: true,
enableHelpers: true,
};
const result = await processLegalMarkdownWithRemark(content, options);
Comparison of processing times for typical documents:
| Document Size | Legacy (ms) | Remark (ms) | Improvement |
|---|---|---|---|
| Small (1KB) | 45 | 28 | 38% faster |
| Medium (10KB) | 280 | 165 | 41% faster |
| Large (100KB) | 2100 | 980 | 53% faster |
| XL (1MB) | 18500 | 7200 | 61% faster |
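The Improvement column is consistent with the relative reduction in processing time, (legacy - remark) / legacy, rounded to the nearest percent. The sketch below recomputes it from the figures in the table; `improvement` is an illustrative helper name, and no new measurements are involved.

```javascript
// Timings copied from the table above (milliseconds).
const timings = [
  { size: 'Small (1KB)', legacy: 45, remark: 28 },
  { size: 'Medium (10KB)', legacy: 280, remark: 165 },
  { size: 'Large (100KB)', legacy: 2100, remark: 980 },
  { size: 'XL (1MB)', legacy: 18500, remark: 7200 },
];

// Relative reduction in processing time, as a whole-number percentage.
function improvement(legacyMs, remarkMs) {
  return Math.round(((legacyMs - remarkMs) / legacyMs) * 100);
}

const rows = timings.map(t => ({
  size: t.size,
  faster: improvement(t.legacy, t.remark),
}));

for (const row of rows) {
  console.log(`${row.size}: ${row.faster}% faster`);
}
```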
Optimized memory usage through:
const efficientOptions = {
// Stream processing for large documents
streamProcessing: true,
// Reuse AST nodes where possible
reuseASTNodes: true,
// Garbage collect intermediate results
enableGarbageCollection: true,
// Limit concurrent operations
maxConcurrency: 4,
};
Intelligent caching at multiple levels:
const cachingOptions = {
// Cache parsed AST trees
enableASTCache: true,
astCacheSize: 100,
astCacheTTL: 3600000, // 1 hour
// Cache template compilation
enableTemplateCache: true,
templateCacheSize: 50,
// Cache helper function results
enableHelperCache: true,
helperCacheSize: 1000,
};
AST-based field highlighting prevents common issues:
<!-- Input document -->
The client {{client_name}} is located at {{address}}. Contact {{client_name}}
for more information.
<!-- Legacy result (text-based) -->
The client <span class="field-tracking" data-field="client_name">Acme
Corp</span> is located at <span class="field-tracking" data-field="address">123
Main St</span>. Contact
<span class="field-tracking" data-field="client_name"><span class="field-tracking" data-field="client_name">Acme
Corp</span></span> for more information.
<!-- Remark result (AST-based) -->
The client <span class="field-tracking" data-field="client_name">Acme
Corp</span> is located at <span class="field-tracking" data-field="address">123
Main St</span>. Contact
<span class="field-tracking" data-field="client_name">Acme Corp</span> for more
information.
The processor understands markdown context:
<!-- Headers -->
# Contract for {{client_name}}
<!-- Result: <h1>Contract for <span class="field-tracking" data-field="client_name">Acme Corp</span></h1> -->
<!-- Lists -->
- Client: {{client_name}}
- Amount: {{formatCurrency(amount, "EUR")}}
<!-- Result: Proper list structure maintained with field tracking -->
<!-- Tables -->
| Field | Value |
| ------ | --------------- |
| Client | {{client_name}} |
| Amount | {{amount}} |
<!-- Result: Table structure preserved with highlighted fields -->
AST processing prevents template syntax from appearing in output:
// Legacy processing might leave artifacts
const legacy_result = 'Client: {{client_name}} (processed: Acme Corp)';
// Remark processing is clean
const remark_result = 'Client: Acme Corp';
Automatic detection prevents nested field tracking spans:
const remarkProcessor = {
preventDoubleWrapping: true,
detectExistingSpans: true,
validateNesting: true,
};
// Automatically handles cases like:
// {{upper(client_name)}} where client_name is already tracked
Step 1: Update imports
// Canonical API
import { processLegalMarkdown } from 'legal-markdown-js';
Step 2: Update function calls
// processLegalMarkdown is async in v4
const result = await processLegalMarkdown(content, options);
`processLegalMarkdown()` now returns a `Promise` and runs the remark pipeline as the only supported pipeline.
Step 3: Update option handling
// Options are largely compatible
const options = {
enableFieldTracking: true, // Same
basePath: './documents', // Same
exportFormat: 'json', // Same
// New remark-specific options
useAST: true, // New
enablePlugins: true, // New
optimizePerformance: true, // New
};
The async API remains options-compatible for most existing call sites:
import { processLegalMarkdown } from 'legal-markdown-js';
const result = await processLegalMarkdown(content, {
...options,
// remark pipeline is now the default and only pipeline
});
- Update import statements
- Change to async/await pattern
- Test field tracking accuracy
- Verify performance improvements
- Update error handling for async operations
- Review output for any changes
- Update documentation and examples
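For the "Review output for any changes" item, one lightweight approach is to compare post-migration output against a baseline saved before migrating. The sketch below is a minimal illustration: `diffLines` is a hypothetical helper, and the two HTML strings are made-up sample data rather than real Legal Markdown JS output.

```javascript
// Hypothetical helper: report line-level differences between a saved
// baseline and the output produced after migrating.
function diffLines(baseline, current) {
  const before = baseline.split('\n');
  const after = current.split('\n');
  const changes = [];
  const length = Math.max(before.length, after.length);
  for (let i = 0; i < length; i++) {
    if (before[i] !== after[i]) {
      changes.push({
        line: i + 1, // 1-based line number
        before: before[i] ?? null,
        after: after[i] ?? null,
      });
    }
  }
  return changes;
}

// Made-up sample outputs for illustration only.
const baselineHtml = '<h1>Contract for Acme Corp</h1>\n<p>Total: 50,000 EUR</p>';
const migratedHtml = '<h1>Contract for Acme Corp</h1>\n<p>Total: 50000 EUR</p>';

const changes = diffLines(baselineHtml, migratedHtml);
console.log(changes);
```

An empty array means the two outputs match line for line; any entries point to the lines worth reviewing by hand.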
Minimal breaking changes:
- Async Processing: `processLegalMarkdown()` is asynchronous and returns a Promise
- Plugin API: Custom plugins need remark-compatible interface
- AST Access: Direct AST manipulation requires remark knowledge
Non-breaking changes:
- All existing options remain supported
- Output format is identical
- Field tracking behavior is enhanced but compatible
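The async change is the one most likely to surface at existing call sites, usually as a property that is unexpectedly `undefined`. The sketch below shows the symptom using `processStub`, a stand-in written for this example; it is not the real API, and the `content` property on its result is an assumption made for illustration.

```javascript
// Stand-in for an async processor; NOT the real Legal Markdown JS API.
async function processStub(content) {
  return { content: content.replace('{{client_name}}', 'Acme Corp') };
}

async function main() {
  // Without await, the call site holds a pending Promise, not the result.
  const pending = processStub('Client: {{client_name}}');
  const isPromise = pending instanceof Promise;
  const missingContent = pending.content; // undefined: a Promise has no such property

  // With await, the resolved result object is available.
  const result = await pending;
  return { isPromise, missingContent, content: result.content };
}

const outcome = main();
outcome.then(o => console.log(o));
```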
const basicOptions = {
// Core processing
enableFieldTracking: true,
basePath: './documents',
// Remark-specific options
useAST: true,
enablePlugins: true,
strictMode: false,
};
const advancedOptions = {
// Performance tuning
optimizePerformance: true,
maxConcurrency: 4,
streamProcessing: true,
// Caching
enableASTCache: true,
astCacheSize: 100,
astCacheTTL: 3600000,
// Error handling
continueOnError: true,
collectErrors: true,
errorReporting: 'detailed',
// Debugging
debugAST: false,
logPerformance: false,
tracePlugins: false,
};
const pluginOptions = {
plugins: [
// Built-in plugins
'remark-legal-headers',
'remark-field-tracking',
'remark-cross-references',
// Custom plugins
'./plugins/custom-legal-formatting.js',
{
plugin: 'remark-custom-plugin',
options: { customOption: true },
},
],
// Plugin settings
pluginTimeout: 5000,
allowCustomPlugins: true,
validatePlugins: true,
};
Legal Markdown JS includes several remark plugins:
// Available built-in plugins
const builtinPlugins = [
'remark-legal-headers', // Header numbering and formatting
'remark-field-tracking', // Field highlighting and tracking
'remark-cross-references', // Internal reference resolution
'remark-template-fields', // Template variable processing
'remark-clauses', // Conditional clause handling
'remark-imports', // Document import processing
];
Create custom plugins for specialized processing:
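// plugins/custom-legal-plugin.js
// visit() comes from unist-util-visit, the standard tree walker for remark plugins
import { visit } from 'unist-util-visit';
export default function customLegalPlugin(options = {}) {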
return function transformer(tree, file) {
visit(tree, 'template_field', node => {
// Custom processing logic
if (node.value.startsWith('legal_')) {
node.data = {
...node.data,
legalField: true,
className: 'legal-field',
};
}
});
};
}
// Usage
const result = await processLegalMarkdown(content, {
plugins: ['./plugins/custom-legal-plugin.js'],
});
Header numbering plugin:
const headerPlugin = {
plugin: 'remark-legal-headers',
options: {
startLevel: 1,
maxLevel: 6,
numberingStyle: 'hierarchical',
prefix: 'Article',
},
};
Field tracking plugin:
const fieldTrackingPlugin = {
plugin: 'remark-field-tracking',
options: {
highlightClass: 'field-tracking',
includeDataAttributes: true,
trackHelpers: true,
preventDoubleWrapping: true,
},
};
AST parsing errors:
try {
const result = await processLegalMarkdown(content, options);
} catch (error) {
if (error.name === 'ASTParseError') {
console.error('Markdown parsing failed:', error.message);
console.error('Line:', error.line, 'Column:', error.column);
}
}
Plugin loading errors:
const result = await processLegalMarkdown(content, {
plugins: ['invalid-plugin'],
onPluginError: (error, pluginName) => {
console.warn(`Plugin ${pluginName} failed to load:`, error.message);
return 'continue'; // or 'abort'
},
});
Performance issues:
# Debug performance
legal-md --debug --log-performance --remark document.md
# Profile AST processing
legal-md --profile-ast --remark document.md
AST inspection:
const result = await processLegalMarkdown(content, {
debugAST: true,
astOutputPath: './debug-ast.json',
});
// Inspect the generated AST
console.log(JSON.stringify(result.ast, null, 2));
Performance profiling:
const result = await processLegalMarkdown(content, {
logPerformance: true,
performanceCallback: metrics => {
console.log('Processing time:', metrics.processingTime);
console.log('AST size:', metrics.astNodeCount);
console.log('Plugin execution:', metrics.pluginTimes);
},
});
Plugin tracing:
const result = await processLegalMarkdown(content, {
tracePlugins: true,
pluginCallback: (pluginName, operation, duration) => {
console.log(`Plugin ${pluginName}: ${operation} took ${duration}ms`);
},
});
// ✅ Good - Proper async handling
async function processDocument(content: string) {
try {
const result = await processLegalMarkdown(content, options);
return result;
} catch (error) {
console.error('Processing failed:', error);
throw error;
}
}
// ❌ Bad - Missing await
function processDocument(content: string) {
return processLegalMarkdown(content, options); // Returns Promise
}
// Production configuration
const productionOptions = {
enableASTCache: true,
astCacheSize: 200,
astCacheTTL: 7200000, // 2 hours
enableTemplateCache: true,
templateCacheSize: 100,
optimizePerformance: true,
};
const robustOptions = {
continueOnError: true,
collectErrors: true,
errorReporting: 'detailed',
onError: (error, context) => {
logger.error('Processing error', { error, context });
return 'continue';
},
};
const monitoredOptions = {
logPerformance: true,
performanceThreshold: 1000, // ms
performanceCallback: metrics => {
if (metrics.processingTime > 1000) {
console.warn('Slow processing detected:', metrics);
}
},
};
// Minimal plugin set for performance
const minimalPlugins = ['remark-field-tracking', 'remark-template-fields'];
// Full plugin set for rich features
const fullPlugins = [
'remark-legal-headers',
'remark-field-tracking',
'remark-cross-references',
'remark-template-fields',
'remark-clauses',
'remark-imports',
];
// Migration testing
async function testMigration(content: string) {
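// Process with the remark pipeline (now the only supported pipeline)
const remarkResult = await processLegalMarkdown(content, options);
// Compare remarkResult against a baseline saved before migrating, if one exists
// Verify field tracking (extractFields is assumed to be a project-specific helper)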
const remarkFields = extractFields(remarkResult.html);
if (remarkFields.length === 0) {
console.warn('Field tracking did not detect expected fields');
}
}
// Process multiple documents efficiently
import fs from 'node:fs';
const documents = ['doc1.md', 'doc2.md', 'doc3.md'];
const results = await Promise.all(
documents.map(doc =>
processLegalMarkdown(fs.readFileSync(doc, 'utf8'), options)
)
);
// For very large documents
const streamOptions = {
streamProcessing: true,
chunkSize: 1024 * 1024, // 1MB chunks
maxMemoryUsage: '256MB',
};
// Load only needed plugins
const options = {
plugins: document.needsHeaders
? ['remark-legal-headers', 'remark-field-tracking']
: ['remark-field-tracking'],
};
- Field Tracking - Detailed field tracking documentation
- Performance Guide - Comprehensive performance optimization
- Best Practices - General best practices
- Configuration - Configuration options