- Layout Analysis Model: A model based on RT-DETR and trained on DocLayNet (a human-annotated data set for document layout analysis) that classifies page elements like paragraphs, section titles, lists, and tables.
- TableFormer: A vision-transformer model for table structure recovery that can handle complex tables with partial or no borderlines, empty cells, cell spans, and hierarchical headers.
The Docling processing pipeline works by feeding page images to the Layout Analysis Model, which identifies document elements. For tables, TableFormer processes the detected table regions to recover their structure. When needed, OCR capabilities are available through integration with EasyOCR.
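The flow described above can be pictured with a minimal, self-contained sketch. It does not use Docling's internals: `Element`, `process_page`, and the stub functions are hypothetical names invented for illustration, and the rule "run OCR only when an element has no text" is an assumption about what "when needed" means.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one detected page element."""
    label: str                   # e.g. "paragraph", "section_title", "list", "table"
    text: Optional[str] = None   # None when no text is available for the region
    rows: Optional[list] = None  # filled in for tables only


def process_page(
    page_image: bytes,
    detect_layout: Callable,
    recover_table: Callable,
    run_ocr: Optional[Callable] = None,
) -> List[Element]:
    """Layout analysis first, then table structure recovery, then optional OCR."""
    elements = detect_layout(page_image)
    for element in elements:
        if element.label == "table":
            # Only table regions are handed to the table-structure stage.
            element.rows = recover_table(page_image, element)
        elif element.text is None and run_ocr is not None:
            # Assumption: OCR is a fallback for non-table regions without text.
            element.text = run_ocr(page_image, element)
    return elements


# Stub stages standing in for the real models (placeholder values only).
def fake_layout(page_image):
    return [Element("section_title", text="Title"), Element("paragraph"), Element("table")]


def fake_table(page_image, element):
    return [["a", "b"], ["1", "2"]]


def fake_ocr(page_image, element):
    return "text from OCR"


elements = process_page(b"", fake_layout, fake_table, run_ocr=fake_ocr)
for element in elements:
    print(element.label, element.text, element.rows)
```

Passing the stages in as callables keeps the ordering explicit: layout detection decides which regions exist, and the later stages only ever see the regions routed to them.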
Using Docling is straightforward:
```python
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
```
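When several documents need converting, the same two calls can be wrapped in a small helper. The sketch below is one possible approach rather than a Docling feature: `save_as_markdown` and the `document_<n>.md` file naming are hypothetical, and the converter is passed in as an argument so the helper relies only on the `convert(...)` and `export_to_markdown()` calls shown above.

```python
from pathlib import Path
from typing import Iterable, List


def save_as_markdown(sources: Iterable[str], converter, out_dir: str) -> List[Path]:
    """Convert each source and write its Markdown export into out_dir.

    `converter` is any object exposing convert(source).document.export_to_markdown(),
    such as a DocumentConverter instance.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, source in enumerate(sources):
        result = converter.convert(source)
        markdown = result.document.export_to_markdown()
        path = target / f"document_{index}.md"  # hypothetical naming scheme
        path.write_text(markdown, encoding="utf-8")
        written.append(path)
    return written
```

With Docling installed, a call such as `save_as_markdown(["https://arxiv.org/pdf/2408.09869"], DocumentConverter(), "converted")` would write one Markdown file per source. Reusing a single converter instance across sources avoids constructing it again for every document.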
Docling also provides a convenient command-line interface for quick conversions:
```shell
docling https://arxiv.org/pdf/2206.01062
```
Key use cases for Docling
Docling’s capabilities make it well suited to several use cases, including retrieval-augmented generation, knowledge base creation, LLM fine-tuning, and enterprise data integration.