> You know, I’ve really looked hard at what’s out there and haven’t been able to find anything else that’s totally free/open, that runs well on CPU, and which has better quality output than Tesseract. I found a couple Chinese projects but had trouble getting them to work and the documentation wasn’t great. If you have any leads on others to try I’d love to hear about them.
I did more or less the same, trying to solve the same problem. I ended up biting the bullet and using Amazon Textract. The OCR is much better than Tesseract, and the layout tool is quite reliable to get linear text out of 2-columns documents (which is critical for my use case).
I would be very happy to find something as reliable that would work on a workstation without relying on anyone’s cloud.
I did more or less the same, trying to solve the same problem. I ended up biting the bullet and using Amazon Textract. The OCR is much better than Tesseract, and the layout tool is quite reliable to get linear text out of 2-columns documents (which is critical for my use case).
I would be very happy to find something as reliable that would work on a workstation without relying on anyone’s cloud.