OCR libraries


  • Garbage Person

    Are there any that don't suck shit?

    We're thinking about Skunkworks integration of one into the WtfFramework toolbox.

    Particularly looking at Tesseract for three reasons:

    1. It has a ton of really cool potential project codenames
    2. Google maintains it.
    3. It has a sane and reasonable license.


  • @weng said in OCR libraries:

    Are there any that don't suck shit?

    That depends on your requirements.

    Particularly looking at Tesseract for three reasons:

    1. It has a ton of really cool potential project codenames
    2. Google maintains it.
    3. It has a sane and reasonable license.

    I looked at Tesseract a couple of years ago, as well as a couple of expensive commercial solutions like ABBYY. They could all handle printed text reasonably well, but my job was to create something that could digitize handwritten log books. As it turns out, OCR sucks at doing that, especially when it doesn't have the "inking" motion data a tablet does. We quickly gave up on that.

    The source to Tesseract isn't that great, and I believe it runs as a command-line process. I seem to remember some French guy criticizing the state of the codebase and suggesting some refactoring. Thus it comes with the usual FOSS warnings.


  • Garbage Person

    @groaner Yeah. Our main use case is all well-defined, typewritten documents.

    And I don't mind wrapping CLIs. At this point, virtually all our new integrations work that way because Unix brain worms have destroyed the industry and nobody publishes public APIs.
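
    For anyone curious what wrapping the CLI might look like, here's a rough sketch in Python. It assumes the `tesseract` binary is installed and on PATH, and it relies on the CLI's convention of writing to standard output when the output base is given as `stdout`. The function names `build_tesseract_command` and `ocr_image` are made up for illustration and are not part of any toolbox mentioned in this thread.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str, lang: str = "eng", psm: Optional[int] = None
) -> List[str]:
    """Build the argument list for one tesseract CLI call (hypothetical helper)."""
    # "stdout" as the output base makes tesseract print the recognized text
    # instead of writing a .txt file next to the image.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # --psm picks the page segmentation mode; left at the default if omitted.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_image(
    image_path: str,
    lang: str = "eng",
    psm: Optional[int] = None,
    timeout: float = 60.0,
) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,  # collect stdout/stderr rather than inheriting them
        text=True,            # decode output to str
        timeout=timeout,      # don't let a stuck process hang the caller
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

    Building the command separately from running it keeps the argument handling testable on machines that don't have Tesseract installed.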




  • Garbage Person

    @rhywden The likelihood of getting permission to pass anything we do here to a third party cloud vendor, no matter what security promises they make, is basically zero.



  • @weng Too bad. Would likely have been the easiest solution ;)



  • I used to use ABBYY. For personal stuff it was fantastic. About the only thing I remember it having problems with was tables without headers.


  • Garbage Person

    @rhywden said in OCR libraries:

    @weng Too bad. Would likely have been the easiest solution ;)

    I mean, I've got these Azure credits burning a hole in my pocket so I'll probably prototype it and pitch it with the in-house solution, but unless performance is like, orders of magnitude out, nobody will go for it.

