Hacker News new | past | comments | ask | show | jobs | submit login

> You know, I’ve really looked hard at what’s out there and haven’t been able to find anything else that’s totally free/open, that runs well on CPU, and which has better quality output than Tesseract. I found a couple Chinese projects but had trouble getting them to work and the documentation wasn’t great. If you have any leads on others to try I’d love to hear about them.

I did more or less the same, trying to solve the same problem. I ended up biting the bullet and using Amazon Textract. The OCR is much better than Tesseract, and the layout tool is quite reliable to get linear text out of 2-columns documents (which is critical for my use case).

I would be very happy to find something as reliable that would work on a workstation without relying on anyone’s cloud.




Was this by any chance Paddle OCR https://github.com/PaddlePaddle/PaddleOCR




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: