Extract PDF text in your browser with LiteParse for the web ↗
LlamaIndex’s LiteParse, traditionally a Node.js CLI tool for PDF text extraction, has been implemented to run entirely in the browser. The browser version supports OCR as an option and uses PDF.js and Tesseract.js, employing spatial text parsing to preserve logical reading order, including multi-column layouts. The project homepage demo and a plan-driven development process, including a noted plan.md artifact and GitHub Actions deployment to GitHub Pages, are described in the article.