[P] DocStrange - Open Source Document Data Extractor with free cloud processing for 10k docs/month

Sharing DocStrange, an open-source Python library that makes document data extraction easy.

Universal Input: PDFs, images, Word docs, PowerPoint, Excel
Multiple Outputs: Clean Markdown, structured JSON, CSV tables, formatted HTML
Smart Extraction: Specify the exact fields you want (e.g., "invoice_number", "total_amount")
Schema Support: Define JSON schemas for consistent structured output
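To make the schema idea concrete, here is a minimal sketch of what a JSON schema for the two example fields above might look like, written as a Python dict. The schema layout and the `conforms` helper are illustrative assumptions, not part of DocStrange; the helper only checks required keys and top-level types and is not a full JSON Schema validator.

```python
# Hypothetical schema for the example fields named above.
# This is an illustration, not DocStrange's own schema format.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Map the JSON Schema type names used here to Python types.
_JSON_TYPES = {"string": str, "number": (int, float), "object": dict}


def conforms(record, schema):
    """Check required keys and top-level property types only."""
    if not isinstance(record, dict):
        return False
    for key in schema.get("required", []):
        if key not in record:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key not in record:
            continue
        value = record[key]
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for "number".
        if spec.get("type") == "number" and isinstance(value, bool):
            return False
        if expected is not None and not isinstance(value, expected):
            return False
    return True
```

A check like this can be run on whatever structured output comes back, so that records missing a required field are caught before they reach downstream code.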

Quick start:

pip install docstrange
docstrange invoice.jpeg --output json --extract-fields invoice_amount buyer seller
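If you want to drive the quick-start command from a script, a small sketch like the following assembles the same invocation as an argument list. It assumes the flags are the double-hyphen forms `--output` and `--extract-fields`; `build_docstrange_command` is a hypothetical helper name, not something the library provides.

```python
import shlex


def build_docstrange_command(path, fields, output="json"):
    """Assemble the quick-start CLI invocation as an argv list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return ["docstrange", path, "--output", output, "--extract-fields", *fields]


cmd = build_docstrange_command("invoice.jpeg", ["invoice_amount", "buyer", "seller"])
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

With the package installed, the list could be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with file names that contain spaces.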

Data Processing Options:

Cloud Mode: Fast and free processing with minimal setup, free for 10k docs per month
Local Mode: Complete privacy - all processing happens on your machine, no data is sent anywhere, and it works on both CPU and GPU

GitHub: https://github.com/NanoNets/docstrange

Discussion: r/MachineLearning (30 points, 2 comments)
