Optical Character Recognition

  • Optical character recognition is the conversion of a scanned document into searchable text that can be copied and pasted into a new file. Following the scanning of a given document, OCR software evaluates the scanned data for shapes it recognizes as letters or numerals. The accuracy of OCR depends on the quality of the printed copy and the conversion accuracy of the software. It is generally acknowledged to be only 80-85 percent accurate.  1
  • A method of translating printed text and images into a form that a computer can manipulate (into ASCII codes, for example). An OCR system enables you to scan a printed document directly into a computer file.  2  3  4
  • A method of scanning printed material and converting it into an electronic file, such as a word-processing file, which can then be searched for specific words or phrases. OCR is distinguishable from “imaging” in that it recognizes only alphanumeric characters and not handwritten or other graphic material.  5
  • Software that, in conjunction with a scanner, is able to “recognize” written text and convert it to an ASCII file or import it into a word processor so that the user may perform full-text searches.  6
  • The computer conversion of scanned input images (bar codes or patterns of bits) to computer-recognizable codes (ASCII letters, numbers and characters).  7
  • Optical character recognition is a technology that takes data from a paper document and turns it into editable text data. The document is first scanned. Then OCR software searches the document for letters, numbers, and other characters.  8
  • When a paper document is scanned into a computer, an image is created. The computer does not recognize the characters of the document as text until OCR software converts the image into text. OCR systems vary widely in the accuracy of their conversion. Even seemingly high accuracy rates can, however, still result in significant numbers of words being misrepresented. A 99% accuracy measured per character, for example, would still result in roughly one word out of 20 containing an error, assuming words average about five characters.
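
The one-word-in-20 figure in the last definition can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that the definitions do not state: that the accuracy rate is measured per character, and that errors on different characters are independent. The function name `word_error_rate` and the default of five characters per word are hypothetical choices made for this example.

```python
def word_error_rate(char_accuracy: float, chars_per_word: int = 5) -> float:
    """Chance that a word contains at least one misrecognized character.

    Assumes each character is recognized independently with probability
    char_accuracy (an assumption made for this illustration).
    """
    # A word is correct only if every one of its characters is correct.
    prob_word_correct = char_accuracy ** chars_per_word
    return 1.0 - prob_word_correct


if __name__ == "__main__":
    rate = word_error_rate(0.99)
    print(f"Share of words with an error: {rate:.3f}")
    print(f"Roughly one word in {round(1 / rate)}")
```

Under the same assumptions, the 80-85 percent figure quoted in the first definition would leave most five-character words with at least one error, which is why small differences in character accuracy matter so much for searchable text.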


