Print to Speech Reader

Started by kalpana, Nov 14, 2023, 04:54 AM

kalpana

I have been investigating print-to-speech readers for someone who has lost their sight. Having balked at the price and performance of the commercial devices I have seen, I wondered whether it would be possible to put together a system based on a flatbed scanner and Linux.

It sounds simple: scan image --> clean image --> OCR image --> speak OCR text. If I can get all these components working well, then I should be able to make a system that can read and speak the front of envelopes, letters, flyers, and perhaps magazines. So far, I have managed to get a clear text-to-speech voice using a patched version of the festival speech engine together with Nitech HTS voices, and reasonable OCR performance using the latest tesseract (3.01) engine. Scanning is simple using the SANE scanimage command. I have found the best results come from scanning at 600 dpi in greyscale and then using scantailor to clean up and deskew the image prior to OCR.
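For anyone who wants to try the same chain, here is a rough Python sketch of how the four steps could be glued together. It is only an outline under some assumptions: the `--resolution` and `--mode` options of scanimage are backend-specific, so the values may need adjusting for a given scanner; `clean_step` is a hypothetical hook standing in for the scantailor clean-up, which I have not tried to script here; and the function names and the `page` file names are made up for illustration.

```python
import subprocess
from pathlib import Path


def scan_command(resolution_dpi=600):
    """Build a scanimage call that writes a greyscale TIFF to stdout.

    --resolution and --mode depend on the SANE backend in use;
    "Gray" is an assumption that holds for many backends, not all.
    """
    return ["scanimage", "--resolution", str(resolution_dpi),
            "--mode", "Gray", "--format=tiff"]


def ocr_command(image_path, output_base, language="eng"):
    """Build a tesseract call; tesseract appends ".txt" to output_base."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def speak_command(text_path):
    """Build a festival call that reads a text file aloud."""
    return ["festival", "--tts", str(text_path)]


def read_page(work_dir, clean_step=None):
    """Scan one page, optionally clean it, OCR it, and speak the result.

    clean_step is a hypothetical callable that takes the raw image path
    and returns the path of a cleaned image (for example, a wrapper
    around whatever clean-up tool is preferred).
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: scan. scanimage writes the image data to stdout.
    raw_image = work_dir / "page.tif"
    with open(raw_image, "wb") as out:
        subprocess.run(scan_command(), stdout=out, check=True)

    # Step 2: clean (skipped when no clean_step is supplied).
    cleaned_image = clean_step(raw_image) if clean_step else raw_image

    # Step 3: OCR. The text ends up in work_dir/page.txt.
    output_base = work_dir / "page"
    subprocess.run(ocr_command(cleaned_image, output_base), check=True)

    # Step 4: speak the recognised text.
    subprocess.run(speak_command(output_base.with_suffix(".txt")), check=True)
```

Calling `read_page("/tmp/reader")` would run the whole chain once. Keeping each command in its own small function makes it easy to swap one stage (a different OCR engine or voice, say) without touching the rest.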

What I have not found yet is a decent layout analysis application that can be run from the command line. The latest tesseract is supposed to perform some kind of layout analysis, but it still gets confused by much of the material that I would like to OCR. For example, local newsletters often include a variety of bizarre fonts and blocks of text enclosed by various solid and dotted borders, often with embedded images. Something that could strip the text areas out of such an image for OCR would be ideal. I have tried OCRFeeder, which sometimes helps, although the CLI version does not yet output plain text for text-to-speech.

The other problem I have is with scanning time. I get the best results scanning at 600 dpi, but this takes 40 seconds just to scan the page on an Epson V330. I haven't managed to find anything faster that has Linux drivers.
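To put the resolution trade-off in numbers, the short Python sketch below works out how many pixels a page produces at a few resolutions. It assumes an A4 page (210 x 297 mm) and 8-bit greyscale (one byte per pixel); both are assumptions for illustration. It says nothing about how scan time on the V330 actually scales, or whether OCR accuracy holds up at lower settings, which would have to be measured.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return the total pixel count for a page scanned at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px


def raw_megabytes(width_mm, height_mm, dpi, bytes_per_pixel=1):
    """Return the uncompressed image size in megabytes (10**6 bytes)."""
    return scan_pixels(width_mm, height_mm, dpi) * bytes_per_pixel / 1e6


A4_MM = (210, 297)  # assumed page size

for dpi in (300, 400, 600):
    pixels = scan_pixels(*A4_MM, dpi)
    size_mb = raw_megabytes(*A4_MM, dpi)
    print(f"{dpi} dpi: {pixels} pixels, about {size_mb:.1f} MB raw")
```

Because the pixel count grows with the square of the resolution, a 600 dpi scan carries roughly four times the data of a 300 dpi scan of the same page, so it may be worth timing a couple of intermediate settings against OCR quality.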

It would be good to hear from anyone who has also looked at this problem, or anyone with any suggestions to improve the performance. I am sure there are many people who could benefit from a reliable Ubuntu-based print-to-speech system.

Russ