What is optical character recognition (OCR)?

Most businesses are aware of document scanning and have integrated it into their working processes. Yet although this shift has moved paper recordkeeping and file transmission into a digital medium, many individuals and businesses remain unaware of one of the most powerful features of document scanning: optical character recognition.

So what is optical character recognition (OCR)? In the simplest possible terms, OCR is software that converts scanned printed text into digital text. Rather than creating an exact static image copy of a document, you create a digital document file that can be edited and searched as required.

To understand the way that optical character recognition works, we should first consider what usually happens when we scan a document. Once you place your document on the scanner and hit scan, it is transformed into a digital image, which is more or less an exact copy. You can read it, store it and reprint it, but because your computer does not ‘recognise’ any of the lines or squiggles on the page as text, that’s about all you can do.

By using OCR software we make it possible for the computer to recognise the text as individual characters, whether letters or numbers. That information can then be used to automatically create a copy of the document which can be edited or searched, just as with a document that was digitally created.
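
To make the idea of recognising individual characters concrete, here is a deliberately tiny sketch in Python. It is a toy illustration only, not how Power PDF, OmniPage or any other commercial OCR engine works: it assumes each character arrives as a clean 3x5 grid of pixels, and the template set, the function names and the sample scan are all invented for this example. Each glyph in the scan is compared with the known templates, and the closest match wins.

```python
# Toy illustration only: real OCR engines use far more robust methods.
# Each glyph is a 3x5 bitmap; '#' is ink, '.' is blank paper.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def difference(a, b):
    """Count the pixels at which two equally sized bitmaps disagree."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognise_glyph(bitmap):
    """Return the template character whose bitmap is closest to the input."""
    return min(TEMPLATES, key=lambda ch: difference(TEMPLATES[ch], bitmap))


def recognise_line(rows, width=3):
    """Split a scanned line into fixed-width glyphs and recognise each one."""
    text = []
    # Assumption: glyphs are separated by exactly one blank column.
    for start in range(0, len(rows[0]), width + 1):
        glyph = [row[start:start + width] for row in rows]
        text.append(recognise_glyph(glyph))
    return "".join(text)


# A hypothetical "scan" of three characters, with one noisy pixel in the middle glyph.
scanned = [
    ".#. ### ###",
    "##. #.# ..#",
    ".#. #.# .#.",
    ".#. #.. .#.",
    "### ### .#.",
]

print(recognise_line(scanned))
```

Once the characters have been recognised, the result is an ordinary text string that can be edited or searched, which is exactly what the image alone does not offer.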

What are the benefits of OCR?

For any organisation that deals with a large number of printed or handwritten documents on a daily basis, there are many good reasons to use optical character recognition software.

Firstly, it gives you the ability to search through a document or collection of documents for specific words instantly, rather than painstakingly doing a visual search, page by page and document by document. This not only saves your organisation a substantial amount of time, but also opens up possibilities that did not previously exist for extracting information and data from documents.
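
As a rough sketch of what that kind of search can look like once the text has been recognised, the Python snippet below builds a simple word index over a small collection of documents. The file names and page contents are hypothetical stand-ins for OCR output, and the function names are invented for this example; a real document management system would do considerably more.

```python
import re
from collections import defaultdict

# Hypothetical OCR output: file name -> list of page texts.
documents = {
    "invoice_0412.pdf": ["Invoice for office chairs", "Payment due within 30 days"],
    "contract_acme.pdf": ["Supply agreement", "Payment terms: net 60 days"],
}


def build_index(docs):
    """Map each lowercased word to the (file, page number) pairs that contain it."""
    index = defaultdict(set)
    for name, pages in docs.items():
        for page_no, text in enumerate(pages, start=1):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((name, page_no))
    return index


def search(index, word):
    """Return a sorted list of (file, page number) pairs for a single word."""
    return sorted(index.get(word.lower(), set()))


index = build_index(documents)
print(search(index, "payment"))
```

Here a single lookup finds every page that mentions "payment" across both files, which replaces the page-by-page visual search described above.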

Then there’s the ability to actually edit and add to the text, just as you would with any other digitally created document. All you need to do is scan the original document using optical character recognition software, and then open the resulting file in your preferred word processing software.

There are also other advantages for certain types of organisations and people. For instance, it is possible to digitise documents and, in conjunction with text-to-speech software, turn them into audio files for people with vision impairment. There’s also the added benefit of being able to quickly recreate a lost digital file from your hard copy, without the need for retyping.

Of course, not all OCR software is created equal, and some packages produce frequent errors in the recognised text, which limits their usefulness. That’s why it’s important to use professional software packages with high-quality OCR capabilities such as Nuance’s Power PDF and OmniPage.
