So what is optical character recognition (OCR)? In the simplest possible terms, OCR is software that converts scanned printed text into digital text. Rather than creating a static image copy of a document, you create a digital document file that can be edited and searched as required.
To understand the way that optical character recognition works, we should first consider what usually happens when we scan a document. Once you place your document on the scanner and hit scan, it is transformed into a digital image, which is more or less an exact copy. You can read it, store it and reprint it, but because your computer does not ‘recognise’ any of the lines or squiggles on the page as text, that’s about all you can do.
By using OCR software we make it possible for the computer to recognise the text as individual characters, whether letters or numbers. That information can then be used to automatically create a copy of the document which can be edited or searched, just as with a document that was digitally created.
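As a rough illustration of what recognising individual characters involves, the toy Python sketch below compares small "scanned" bitmaps against stored letter templates and picks the closest match. The templates, function names and sample data are all invented for this example, and it assumes the page has already been split into one bitmap per character; professional OCR software uses far more sophisticated techniques.

```python
# Toy illustration only: real OCR engines use far more robust methods.
# Each glyph is a 3x5 bitmap written as rows of '#' (ink) and '.' (blank).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognise_glyph(bitmap):
    """Return the template character whose bitmap is closest to the input."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(bitmap, TEMPLATES[ch]))


def recognise_line(glyphs):
    """Turn a sequence of glyph bitmaps into a string of characters."""
    return "".join(recognise_glyph(g) for g in glyphs)


# A hypothetical "scanned" word; the first glyph has one stray pixel of noise.
scanned = [
    ["###", ".#.", ".#.", "##.", ".#."],  # noisy T
    ["###", "#.#", "#.#", "#.#", "###"],  # O
    ["###", ".#.", ".#.", ".#.", "###"],  # I
    ["#..", "#..", "#..", "#..", "###"],  # L
]

text = recognise_line(scanned)
print(text)
```

Once the image has been reduced to a string like this, the computer can treat it as ordinary text that can be edited or searched.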
What are the benefits of OCR?
For any organisation that deals with a large number of printed or handwritten documents on a daily basis, there are many good reasons to use optical character recognition software.
Firstly, it gives you the ability to search through a document or collection of documents for specific words in moments, rather than painstakingly doing a visual search, page by page and document by document. This not only saves your organisation a substantial amount of time, but also opens up new possibilities for extracting information and data from documents.
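To give a sense of how searching a collection might work once the text has been extracted, here is a minimal Python sketch. It assumes the OCR step has already produced plain text for each document; the file names, sample text and function names are hypothetical, and a real document management system would do considerably more.

```python
import re
from collections import defaultdict

WORD_PATTERN = r"[a-z0-9']+"


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(WORD_PATTERN, text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = re.findall(WORD_PATTERN, query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical OCR output, keyed by file name.
ocr_text = {
    "invoice_001.txt": "Invoice for printer paper, due 30 June",
    "contract_a.txt": "This contract covers printer maintenance",
    "memo.txt": "Staff meeting moved to Friday",
}

index = build_index(ocr_text)
print(search(index, "printer"))
print(search(index, "printer paper"))
```

The index is built once, so each later search is a quick lookup rather than a fresh read through every page.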
Then there’s the ability to actually edit and add to the text, just as you would with any other digitally created document. All you need to do is scan the original document using optical character recognition software, and then open the resulting file in your preferred word processing software.
Certain types of organisations and people may also gain other advantages. For instance, it is possible to digitise documents and, in conjunction with text-to-speech software, turn them into audio files for vision-impaired people. You can also quickly recreate a lost digital file from your hard copy, without the need for retyping.
Of course, not all OCR software is created equal, and some packages produce frequent errors in the text, limiting their usability. That’s why it’s important to use professional software packages with high-quality OCR capabilities, such as Nuance’s Power PDF and OmniPage.