CDMR.com

Optical Character Recognition (OCR)

CDMR incorporates only the latest state-of-the-art OCR engines into the DocControl system. Depending on individual customer requirements, the OCR engine integrated with DocControl provides high accuracy levels, artificial intelligence and other specific OCR features such as zoning, text verification and visual references. Although the OCR feature is extremely useful for some organizations, it may be unnecessary for others, which is why CDMR has made its OCR module an optional extension. Our consultants can advise you on whether your DMS requires the OCR feature and how best to take advantage of it.

These are some of the common OCR Module features:

High recognition accuracy
Artificial intelligence
Built-in and user-defined lexicons
Text verification and correction during the recognition process
Zone designation
Different fonts and font sizes
Pre-processing for fax, dot matrix and degraded documents to improve recognition results
Recognized text can be exported in many different formats such as MS Word & MS Excel
Process documents in two-page mode for open-faced books and magazines
Many more OCR properties and functions …

What is OCR?

OCR is the process of converting scanned images of machine-printed or handwritten text (numerals, letters and symbols) into a computer-processable format (such as ASCII). The character recognition process for a document image can be divided into three operational steps: document analysis, character recognition and contextual processing.
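As an illustration of the third step, contextual processing, the sketch below corrects raw recognizer output against a lexicon. It is a minimal example, not the DocControl implementation: the function names, the sample lexicon and the similarity cutoff are all assumptions made for this illustration, and it relies only on Python's standard difflib module.

```python
import difflib

# Hypothetical user-defined lexicon; a real one would be far larger.
LEXICON = ["invoice", "total", "amount"]


def correct_word(word, lexicon, cutoff=0.75):
    """Return the closest lexicon entry to `word`, or `word` unchanged."""
    if word.lower() in lexicon:
        return word
    # get_close_matches ranks candidates by similarity ratio and drops
    # anything scoring below `cutoff`.
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def contextual_pass(raw_text, lexicon):
    """Apply lexicon correction to every whitespace-separated word."""
    return " ".join(correct_word(w, lexicon) for w in raw_text.split())


# Typical recognizer confusions: "l" for "i" and "1" for "l".
print(contextual_pass("lnvoice tota1 amount", LEXICON))  # invoice total amount
```

A word with no sufficiently close lexicon entry is passed through unchanged, so names and numbers that the lexicon does not cover are left as recognized.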

What is OCR used for?

Commercial OCR systems can largely be grouped into two categories: general purpose page readers and task-specific readers.

General purpose page readers are designed to handle a broad range of documents such as business letters, technical reports and newspapers. These systems capture an image of a document page and separate the page into text regions and non-text regions. Non-text regions such as graphics and line drawings are often saved separately from the text and its associated recognition results. Text regions are segmented into lines, words and characters, and the characters are passed to the recognizer. Recognition results are output in a format that can be post-processed by application software.
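The segmentation of a text region into lines and characters can be sketched with projection profiles, one simple and widely known approach. The text above does not say which method any particular engine uses, so the technique, the function names and the tiny sample bitmap here are assumptions chosen for illustration. The page is modelled as a binary bitmap in which 1 is ink and 0 is background.

```python
def split_runs(profile):
    """Return (start, end) index pairs for each run of non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended before index i
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(bitmap):
    """Find text lines as runs of rows that contain any ink."""
    row_profile = [sum(row) for row in bitmap]
    return split_runs(row_profile)


def segment_characters(bitmap, line):
    """Find characters in one line as runs of columns that contain ink."""
    top, bottom = line
    width = len(bitmap[0])
    col_profile = [sum(bitmap[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return split_runs(col_profile)


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
]
lines = segment_lines(page)
print(lines)                                 # [(1, 3), (4, 5)]
print(segment_characters(page, lines[0]))    # [(0, 2), (3, 5)]
```

Each character span would then be cropped and passed to the recognizer. Real pages need more than this: skewed scans, touching characters and multi-column layouts all defeat a plain projection profile.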

A task-specific reader handles only specific document types such as pre-specified forms, cheques or survey letters. These readers usually capture only a few predefined document regions. Such systems emphasize high throughput rates and low error rates for those regions.
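The idea of reading only a few predefined regions can be sketched as follows. This is a toy illustration under stated assumptions: the Zone type, the zone coordinates and the stand-in recognizer are hypothetical, and the "page" is a grid of characters rather than a scanned image so that the example stays self-contained.

```python
from typing import NamedTuple


class Zone(NamedTuple):
    """A named rectangular region of the page (hypothetical layout)."""
    name: str
    top: int
    left: int
    bottom: int
    right: int


def crop(page, zone):
    """Cut the zone's rectangle out of the page."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]


def read_form(page, zones, recognize):
    """Run the recognizer on each predefined zone and nothing else."""
    return {zone.name: recognize(crop(page, zone)) for zone in zones}


def fake_recognize(region):
    """Stand-in for a real OCR engine: the toy page already holds text."""
    return "".join(region).strip()


# Toy form: each string is one row of the page.
form = [
    "DATE: 2024-01-05    ",
    "AMOUNT: 120.50      ",
]
zones = [
    Zone("date", top=0, left=6, bottom=1, right=16),
    Zone("amount", top=1, left=8, bottom=2, right=14),
]
print(read_form(form, zones, fake_recognize))
# {'date': '2024-01-05', 'amount': '120.50'}
```

Because everything outside the zones is ignored, the recognizer does far less work per page, which is one reason such readers can favour throughput.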