I'm a web developer with a brief background in machine learning. An application I'm working on could use an intelligent word recognition (IWR) algorithm to save a client a bunch of time. I don't know enough about CV/ML to fully understand the problem domain and the potential solutions, so I thought I'd come here for advice!
Background:
I have 30k images containing handwritten text. They are all in English and share similar features: the background and the text contrast heavily, and there is very little noise in general. Almost every image was written by a different person with unique handwriting. Each photo is 600x600px and contains anything from a single word up to a short paragraph. 10k of the images have been manually transcribed, and I have the extracted text for these 10k images available to use as a training set.
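Since the 10k transcriptions are the only ground truth, one way to judge any candidate tool before running it on the other 20k is to hold some of them back and never train on them. A minimal Python sketch, assuming the labeled data is a list of (image path, text) pairs; the file names and the 10% holdout size are illustrative choices, not anything from an existing dataset:

```python
import random


def split_labeled(pairs, holdout_fraction=0.1, seed=0):
    """Split (image_path, transcription) pairs into (train, holdout).

    The holdout set is never used for training, so accuracy measured on it
    estimates how a system would do on the untranscribed images.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical file names standing in for the 10k transcribed images.
labeled = [(f"img_{i:05d}.png", f"text {i}") for i in range(10_000)]
train, holdout = split_labeled(labeled)
print(len(train), len(holdout))  # 9000 1000
```

Because almost every image comes from a different writer, a random split by image roughly approximates testing on handwriting the system has never seen.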
Problem:
I'd like to be able to script the extraction of text for the remaining 20k images.
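Whatever recognizer ends up being used, the scripting side can be kept separate from it. A minimal sketch, assuming the 20k images sit in one directory as PNG/JPEG files; `recognize` is a placeholder for whichever OCR engine or API is chosen, and `fake_recognize` is a hypothetical stand-in used only for the demo:

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def transcribe_directory(
    image_dir: Path, out_csv: Path, recognize: Callable[[Path], str]
) -> int:
    """Run `recognize` on every image in `image_dir`, writing one CSV row per image.

    A failure on one image is recorded in the `error` column instead of
    aborting the whole batch. Returns the number of images processed.
    """
    paths = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "text", "error"])
        for path in paths:
            try:
                writer.writerow([path.name, recognize(path), ""])
            except Exception as exc:  # keep going; flag the image for manual review
                writer.writerow([path.name, "", repr(exc)])
    return len(paths)


def fake_recognize(path: Path) -> str:
    """Hypothetical stand-in for a real OCR call."""
    if path.stem == "broken":
        raise ValueError("unreadable image")
    return f"text from {path.name}"


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ["a.png", "b.jpg", "broken.png", "notes.txt"]:
        (tmp_dir / name).write_bytes(b"")
    out_path = tmp_dir / "results.csv"
    count = transcribe_directory(tmp_dir, out_path, fake_recognize)
    with out_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(count)  # 3
```

Rows with a non-empty `error` column are natural candidates for manual transcription.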
Questions:
- Is it possible to achieve 95% accuracy on this problem with the current state of OCR/IWR/ICR?
- I've looked at some open-source OCR solutions, but most don't seem to have good handwriting recognition. Are there any FOSS projects that would fit this? I looked at Tesseract, but I read somewhere that on handwriting most academics were able to get at most 90%.
- Are there any proprietary APIs that could help with this? I don't want to buy a big enterprise product; I just need to be able to POST an image to an endpoint and get back the text.
- Any other ideas on how to tackle this problem? I thought about taking the output of the OCR/IWR program and using some sort of NLP technique to further improve accuracy, but this will only be possible in some cases.
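On the 95% question: "accuracy" can mean character-level or word-level, and the two can differ a lot on the same output, which also matters when comparing against reported figures like the 90% for Tesseract. A minimal sketch of both metrics using plain Levenshtein edit distance; the sample strings are made up:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_rates(reference, hypothesis):
    """Return (character error rate, word error rate) of hypothesis vs. reference."""
    cer = edit_distance(reference, hypothesis) / max(len(reference), 1)
    ref_words = reference.split()
    wer = edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
    return cer, wer


cer, wer = error_rates("the quick brown fox", "tha quick brwn fox")
print(f"CER {cer:.3f}, WER {wer:.2f}")  # CER 0.105, WER 0.50
```

Two character mistakes give roughly 89% character accuracy but only 50% word accuracy here. Averaging these rates over some of the 10k manually transcribed images gives a concrete number to hold up against the 95% target.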
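For the "POST an image and get back the text" workflow, the client side is small with only the standard library. A sketch under loud assumptions: `OCR_ENDPOINT`, the bearer-token header and the `{"text": ...}` JSON reply are all hypothetical, since every real OCR service defines its own URL, authentication and response format:

```python
import json
import urllib.request

# Hypothetical endpoint; a real service documents its own URL, auth and reply format.
OCR_ENDPOINT = "https://ocr.example.com/v1/recognize"


def build_ocr_request(
    image_bytes: bytes, api_key: str, url: str = OCR_ENDPOINT
) -> urllib.request.Request:
    """Build a POST request whose body is the raw image (assumed to be PNG)."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/png",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def recognize_via_api(image_bytes: bytes, api_key: str, timeout: float = 30.0) -> str:
    """Send the image and return the text from an assumed JSON {"text": ...} reply."""
    request = build_ocr_request(image_bytes, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["text"]
```

Usage would look like `recognize_via_api(Path("img_00001.png").read_bytes(), "YOUR_KEY")`; the request-building step is kept separate so it can be adapted to a specific provider's documented format without touching the rest.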
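On post-processing the OCR output: one simple form of this builds a vocabulary from the 10k existing transcriptions and snaps unknown words to their closest known word. This sketch swaps in string similarity from Python's `difflib` rather than a full NLP or language-model approach; the `cutoff` value and sample sentences are illustrative assumptions:

```python
import difflib
import re
from collections import Counter


def build_vocabulary(transcriptions):
    """Count lowercase words in the existing manual transcriptions."""
    counts = Counter()
    for text in transcriptions:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def correct_text(ocr_text, vocabulary, cutoff=0.75):
    """Replace each unknown word with its closest vocabulary word, if close enough.

    Words with no match above `cutoff` (names, short words) are left as-is.
    """
    known = list(vocabulary)

    def fix(match):
        word = match.group(0)
        lower = word.lower()
        if lower in vocabulary:
            return word
        candidates = difflib.get_close_matches(lower, known, n=1, cutoff=cutoff)
        if not candidates:
            return word
        fixed = candidates[0]
        return fixed.capitalize() if word[:1].isupper() else fixed

    return re.sub(r"[A-Za-z']+", fix, ocr_text)


vocab = build_vocabulary(["The quick brown fox", "jumps over the lazy dog"])
print(correct_text("Tha quick brwn fox jumpz", vocab))  # Tha quick brown fox jumps
```

"brwn" and "jumpz" are fixed, but "Tha" is too short to clear the cutoff against "the", which illustrates why this kind of correction only helps in some cases.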
Thanks in advance!