Good free OCR?
Does anybody have a recommendation for good, free OCR software?
- No online services; must run entirely locally
- Only really needs to handle English text, but not completely botching code would be a big plus
- Should be able to output PDF with embedded images at least, but preferably also Word
- PDF output should (optionally?) contain the entire scanned page as a background image
- Command line mode would be a huge plus
Or download Tesseract and roll your own; that's what I did.
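A minimal sketch of what "rolling your own" can look like, assuming the `tesseract` binary is on PATH and is a reasonably recent release (the `pdf` output config, which writes the page image with an invisible text layer on top, is not present in very old versions). `build_tesseract_cmd` and `ocr_directory` are hypothetical helper names, not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, out_base, lang="eng", formats=("pdf", "txt")):
    """Build a Tesseract CLI invocation as an argument list.

    Tesseract's usage is `tesseract imagename outputbase [-l lang] [configfile...]`;
    each output config (e.g. `pdf`, `txt`) writes `<outputbase>.<ext>`.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang, *formats]


def ocr_directory(src_dir, dst_dir, pattern="*.png", lang="eng"):
    """OCR every image matching `pattern` in src_dir, writing results to dst_dir."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(src_dir).glob(pattern)):
        # Output base name reuses the image's stem: scan01.png -> scan01.pdf/.txt
        cmd = build_tesseract_cmd(image, dst / image.stem, lang=lang)
        subprocess.run(cmd, check=True)  # raise if Tesseract fails on a page
```

Calling `ocr_directory("scans", "ocr_out")` would produce one searchable PDF and one plain-text file per scanned page. Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces. Word output is not covered here.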
Doesn't support PDF output, and its export to Word action just dumps a bunch of plain text into Word.
Anyway, I added a few more criteria/clarifications.
I tried something called "PDF OCR".
- doesn't install to Program Files by default
- not free, and the trial version is unusable
- mode 1/2, "PDF -> OCR", can't produce a PDF
- mode 2/2, "image -> PDF", doesn't OCR
- stupid skinned interface overrides the default window chrome
- doesn't register uninstaller for Add/Remove Programs
Verdict: -5 stars out of 5 and uninstall with extreme prejudice.
Do you need an application, or a library? There are several OCR programs around, though few are free. Libraries are a different story; when I was looking for one that fit similar requirements (as well as 'should work on both iOS and Android'), what I found came down to a) Tesseract, and b) bupkiss. Since Tesseract started out mostly in K&R C (though most of that has since been updated), with some later extensions in C++, using it on Android was a problem, though not an insurmountable one.
There are several Java OCR libraries, none of which were both free and working (there were several free ones which didn't do shit, and even some of the commercial ones were marginal), and most of which were cloud-hosted. Since the goal was to allow scanning of business card information into a PIM, relying on a cloud app would have been less than ideal, but probably not impossible: it would have been possible to save the scanned card until a stable connection was available, so it would just be an annoyance rather than a showstopper.
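The "save the scanned card until a stable connection is available" workaround is a store-and-forward queue. A minimal sketch, assuming a hypothetical `upload` callback that returns `True` on success and a hypothetical `is_online` check; `PendingScanQueue` is an illustrative name, not any real library's API.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


class PendingScanQueue:
    """Hypothetical store-and-forward queue: each scan is written to disk first
    and deleted only after the (assumed) upload callback reports success."""

    def __init__(self, spool_dir):
        self.spool = Path(spool_dir)
        self.spool.mkdir(parents=True, exist_ok=True)

    def enqueue(self, image_bytes: bytes, meta: dict) -> str:
        scan_id = uuid.uuid4().hex
        # Image first, metadata second: a scan only counts as pending once its
        # .json exists, so an interrupted write is never picked up half-done.
        (self.spool / f"{scan_id}.img").write_bytes(image_bytes)
        (self.spool / f"{scan_id}.json").write_text(json.dumps(meta))
        return scan_id

    def pending(self) -> list[str]:
        return sorted(p.stem for p in self.spool.glob("*.json"))

    def flush(self, is_online: Callable[[], bool],
              upload: Callable[[bytes, dict], bool]) -> int:
        """Try to send every pending scan; returns how many were sent."""
        if not is_online():
            return 0  # offline: keep everything for the next attempt
        sent = 0
        for scan_id in self.pending():
            img = self.spool / f"{scan_id}.img"
            meta = self.spool / f"{scan_id}.json"
            if upload(img.read_bytes(), json.loads(meta.read_text())):
                img.unlink()
                meta.unlink()
                sent += 1
        return sent
```

Because failed uploads simply stay in the spool directory, a flaky connection costs a retry rather than a lost card, which is what makes the cloud dependency an annoyance instead of a showstopper.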
Did I mention that the main Java OCR library would have also been the main competition of that app? Yeah, it's usually not a good idea to rely on someone who wants to compete with you.
Most of the working libraries I did find, free or otherwise, regardless of language, were just wrappers around Tesseract anyway.
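Much of what such wrappers add is post-processing of Tesseract's output. A sketch of one piece of that, assuming Tesseract's `tsv` output config (present in recent releases) with its usual 12 tab-separated columns, where word-level rows have `level` 5 and carry a confidence score. `parse_tesseract_tsv`, `words_to_lines`, and the 60-point confidence cutoff are illustrative choices, not anything from a specific library.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    conf: float
    line: tuple  # (page_num, block_num, par_num, line_num)
    left: int
    top: int
    width: int
    height: int


def parse_tesseract_tsv(tsv_text, min_conf=60.0):
    """Return word-level entries whose confidence is at least min_conf."""
    # QUOTE_NONE: OCR'd code is full of quote characters that must stay literal.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if row["level"] != "5":  # only word-level rows carry recognized text
            continue
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if not text or conf < min_conf:
            continue
        words.append(Word(
            text=text,
            conf=conf,
            line=(int(row["page_num"]), int(row["block_num"]),
                  int(row["par_num"]), int(row["line_num"])),
            left=int(row["left"]), top=int(row["top"]),
            width=int(row["width"]), height=int(row["height"]),
        ))
    return words


def words_to_lines(words):
    """Regroup words into text lines; TSV rows arrive in reading order."""
    lines = {}
    for w in words:
        lines.setdefault(w.line, []).append(w.text)
    return [" ".join(ws) for ws in lines.values()]
```

Dropping low-confidence words is a blunt instrument; for code-heavy pages, keeping them and flagging them for review may be the better trade-off.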