I am working on a problem where I want to automatically read the numbers in images like the following:
As can be seen, the images are quite challenging! Not only are the strokes not always connected, but the contrast also varies a lot. My first attempt was to use pytesseract after some preprocessing. I also created a StackOverflow post here.
While this approach works fine on an individual image, it does not generalize, as it requires too much manual tuning of the preprocessing. The best solution I have so far is to iterate over some hyperparameters such as the threshold value, the filter size for erosion/dilation, etc. However, this is computationally expensive!
Therefore, I have come to believe that the solution I am looking for must be deep-learning-based. I have two ideas here:
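To make the cost concrete, here is a minimal sketch of such a sweep in plain Python. It is only an illustration under assumptions: the image is a nested list of grayscale values, `binarize`, `dilate` and `sweep` are hypothetical helper names, and `recognize` is a placeholder for whatever OCR call is used (for example a wrapper around `pytesseract.image_to_string`). The number of OCR calls grows with the product of the grid sizes, which is why this gets expensive quickly.

```python
from collections import Counter
from itertools import product


def binarize(image, threshold):
    """Return a 0/1 image: 1 where the pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def dilate(binary, size):
    """Square dilation: a pixel becomes 1 if any pixel in its window is 1.

    `size` is assumed to be odd (1 leaves the image unchanged).
    """
    h, w = len(binary), len(binary[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                binary[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out


def sweep(image, thresholds, kernel_sizes, recognize):
    """Run recognize() on every preprocessing combination and vote on the result.

    `recognize` is a hypothetical callable (binary image -> str) standing in
    for the real OCR step. Returns (most common non-empty text, number of runs).
    """
    votes = Counter()
    for threshold, size in product(thresholds, kernel_sizes):
        text = recognize(dilate(binarize(image, threshold), size))
        if text:  # ignore combinations where nothing was read
            votes[text] += 1
    best = votes.most_common(1)
    return (best[0][0] if best else ""), len(thresholds) * len(kernel_sizes)
```

With, say, 20 threshold values and 5 kernel sizes this already means 100 OCR runs per image, before any other parameter is added to the grid.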
- Using a pre-trained network on a similar task
- Splitting the input images into separate digits and training / fine-tuning a network myself in an MNIST fashion
Regarding the first approach, I have not found anything good yet. Does anyone have an idea for that?
Regarding the second approach, I would need a method first to automatically generate images of the separate digits. I guess this should also be deep-learning-based. Afterward, I could maybe achieve some good results with some data augmentation.
Does anyone have ideas? 🙂
Regarding your first approach:
There are two synthetically generated datasets available:
I have used the above datasets for text recognition on slab images. The images were quite challenging, but I have now achieved more than 90% accuracy on them. I implemented the following models to solve this task:
For the transformation stage, you can choose TPS or None. The TPS option, for which they implemented Spatial Transformer Networks, showed higher performance.
For the feature extraction stage, the options are ResNet or VGG.
For the sequential stage, BiLSTM.
For the prediction stage, Attn or CTC.
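To give a feel for what the CTC option in the prediction stage does at inference time, here is a minimal sketch of greedy CTC decoding in plain Python. It is my own illustration, not code from the models above, and it rests on assumptions: `ctc_greedy_decode` is a hypothetical name, class index 0 is taken to be the CTC blank, and class `i` maps to `alphabet[i - 1]`.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Decode per-frame class scores into a string with the greedy CTC rule.

    frame_scores: one list of class scores per time step (frame).
    alphabet: characters for the non-blank classes (assumed index layout:
              class 0 is the blank, class i is alphabet[i - 1]).
    """
    # Step 1: pick the highest-scoring class in each frame.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]

    decoded = []
    previous = None
    for index in best:
        # Step 2: collapse consecutive repeats. Step 3: drop blanks.
        if index != blank and index != previous:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)
```

The blank class is what lets the network output a doubled digit: the frame sequence `1, 1, blank, 1` decodes to `11`, whereas `1, 1, 1` collapses to a single `1`.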
They achieved the best accuracy with the TPS-ResNet-BiLSTM-Attn version. You can easily fine-tune this network, and I hope it can solve your task. The model was trained with the above-mentioned datasets.