Prime Recognition - High Accuracy Optical Character Recognition

PDF Conversion Details

Overview

Prime Recognition software can convert scanned images into PDF files. Several Prime Recognition products support PDF output: PrimeOCR, an award-winning, high-accuracy "Voting" OCR engine; PrimeZone (image to PDF only); and PrimePost (PRO to PDF).
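PrimeOCR is described as a "Voting" OCR engine, and its higher levels combine several engines. This page does not describe how PrimeOCR's voting works, so the Python sketch below only illustrates the general idea of majority voting. It assumes the engines' outputs have already been aligned character by character (real systems must solve that alignment first). The function name `vote` and the sample readings are hypothetical.

```python
from collections import Counter


def vote(readings: list[str]) -> str:
    """Character-level majority vote over pre-aligned OCR readings.

    Assumes every engine returned a string of the same length, aligned
    position by position. On a tie, the character from the earliest-listed
    engine among the top-scoring candidates wins.
    """
    if not readings:
        raise ValueError("need at least one reading")
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("readings must be pre-aligned to equal length")

    result = []
    for column in zip(*readings):  # one tuple of candidate characters per position
        counts = Counter(column)
        best = max(counts.values())
        # Pick the first candidate (in engine order) that has the top count.
        result.append(next(c for c in column if counts[c] == best))
    return "".join(result)


# Three hypothetical engines, each making a different single-character error.
engines = ["Invoice 1O23", "lnvoice 1023", "Invoice 1023"]
print(vote(engines))  # Invoice 1023
```

Each engine here misreads a different character, so the majority recovers the correct string even though no single engine's errors are corrected by that engine alone.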

Supports Adobe Acrobat

PrimeOCR's PDF output delivers the most accurate OCR results available to the production imaging market while keeping PDF files small through full compression and retaining the original image and text layout.

Three styles of PDF documents can be produced:

  • PDF Image Only documents contain a bitmap image of the original scanned document. No text is included in this type of document.

  • PDF Normal documents contain the formatted text output from the PrimeOCR engine plus any image zones. These files are significantly smaller than the original compressed bitmap image files.

  • PDF Image with Hidden Text documents combine the PDF Image Only and PDF Normal types: the original bitmap image is kept, and the OCR results are hidden behind it. This type is useful when the original image must be retained but the OCR results still need to be indexed, searched, or copied into another application.
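Here is a minimal Python sketch of one common way to build an image-with-hidden-text page. The scanned image is painted across the whole page, and the OCR words are drawn with PDF text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs, so the text is searchable and selectable but invisible. This is not necessarily how PrimeOCR does it; the description above places the text behind the image, which is another option. The function names, the `/F1` font and `/Im1` image resource names, and the word-box format are assumptions. The returned stream still has to be wrapped in a PDF file whose page `/Resources` declare those resources.

```python
def pdf_escape(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_page_stream(image_name, page_w, page_h, words):
    """Build a page content stream: full-page image plus invisible OCR text.

    words: list of (text, x, y, font_size) in PDF points, origin bottom-left.
    Assumes (hypothetically) that /F1 and the image XObject are declared in
    the page's /Resources dictionary.
    """
    ops = [
        "q",                                # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",    # scale the image's unit square to the page
        f"/{image_name} Do",                # paint the scanned image
        "Q",                                # restore graphics state
        "BT",                               # begin text object
        "3 Tr",                             # render mode 3: invisible text
    ]
    for text, x, y, size in words:
        ops.append(f"/F1 {size} Tf")              # font and size
        ops.append(f"1 0 0 1 {x} {y} Tm")         # position the word
        ops.append(f"({pdf_escape(text)}) Tj")    # show the (invisible) word
    ops.append("ET")                        # end text object
    # Latin-1 keeps this sketch simple; non-Latin text needs a different encoding.
    return "\n".join(ops).encode("latin-1")


# US Letter is 612 x 792 points; one hypothetical OCR word near the top-left.
stream = hidden_text_page_stream("Im1", 612, 792, [("Total (USD)", 72, 700, 12)])
print(stream.decode("latin-1"))
```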

Advantages of using PrimeOCR for PDF Creation

OCR Accuracy

  • PrimeOCR generates 50-80% fewer character recognition errors than other OCR engines.

Designed for high-volume, unattended production environments

  • Memory management for robust operation. Many PDF-producing products have trouble processing large numbers of documents in batch mode or handling multi-page TIFFs. Prime Recognition products manage memory effectively, so thousands of images and multi-page TIFFs can be processed quickly and without problems.

  • Batch processing of images in directories and subdirectories, enabling hands-off operation on large imaging jobs.

  • Fault tolerance and process logging. Image and OCR errors are captured and recorded in log files, and processing continues automatically. The software is designed for robust, continuous operation.

  • Support for long filenames and NTFS-compressed drives. Prime Recognition offers the latest in Windows NT/2000/XP compatibility.

  • Three zoning modes are supported: automatic zoning within OCR, automatic zoning with manual QA, and manual zoning before OCR.

  • User-controlled image enhancement, including deskew, auto-rotation, and despeckle, can be performed either as a separate step or within the OCR process.
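A minimal Python sketch of the batch behaviour described in the list above: walk a directory and all its subdirectories, convert every TIFF, log each failure, and keep going. `convert_tree` and the `convert` callback are hypothetical stand-ins, not a PrimeOCR API; the callback is where a real TIFF-to-PDF conversion would go.

```python
import logging
from pathlib import Path
from typing import Callable

TIFF_SUFFIXES = {".tif", ".tiff"}  # compared case-insensitively below


def convert_tree(root, convert: Callable[[Path], None]):
    """Convert every TIFF under `root` (recursively), tolerating failures.

    Any exception raised by the hypothetical `convert` callback is logged
    with its traceback and the batch moves on to the next file.
    Returns (succeeded, failed) lists of paths.
    """
    log = logging.getLogger("batch")
    ok, failed = [], []
    files = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    )
    for path in files:
        try:
            convert(path)
            ok.append(path)
        except Exception:
            log.exception("conversion failed: %s", path)
            failed.append(path)
    return ok, failed
```

Matching on `suffix.lower()` rather than a glob pattern also picks up upper-case extensions such as `.TIF`, which glob patterns would miss on case-sensitive file systems. Sending the `batch` logger to a file (for example with `logging.basicConfig(filename=...)`) gives a persistent error log for unattended runs.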

Speed

  • The single-engine (Level 1) version of PrimeOCR is over 85% faster than other production imaging solutions.

  • The very-high-accuracy, five-engine (Level 6) version of PrimeOCR is at least 15% faster than alternatives.

Process Time: OCR Conversion of 21 TIFF Images to PDF Image with Hidden Text

OCR Process          Time (min:sec)   % faster with PrimeOCR
Other product        7:30             n/a
PrimeOCR Level 1     1:05             85%
PrimeOCR Level 3     3:00             55%
PrimeOCR Level 5     5:50             15%

  • Conversion of TIFF images to the PDF Image Only document format is 92% faster than alternatives.

Process Time: Conversion of 21 TIFF Images to PDF Image Only

Process              Time (sec)   % faster with PrimeOCR
Other product        80           n/a
PrimeOCR             6            92%

File Size

  • Depending on the PDF file type, Prime Recognition's PDF output can use up to 80% less disk space than alternatives.

File Size: Conversion of 21 TIFF Images (876.6 KB total size)

PDF File Type            PrimeOCR (KB)   Other Product (KB)   % saving with PrimeOCR
Normal                   117.0           620.1                80%
Image Only               926.0           1263.0               25%
Image with Hidden Text   988.0           1560.5               35%
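The percentage columns in these tables are consistent with measuring the reduction relative to the other product, (other - PrimeOCR) / other. A short Python sketch, assuming that definition, reproduces the Level 1, Image Only, and file-size rows; the published percentages sit at or slightly below the computed values. The function names are hypothetical.

```python
def percent_reduction(other: float, prime: float) -> float:
    """Percentage by which `prime` (time or size) is smaller than `other`."""
    return (other - prime) / other * 100


def mmss_to_seconds(t: str) -> int:
    """Convert a 'min:sec' string such as '7:30' to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures copied from the tables above.
print(round(percent_reduction(mmss_to_seconds("7:30"), mmss_to_seconds("1:05")), 1))  # 85.6
print(round(percent_reduction(80, 6), 1))           # 92.5
print(round(percent_reduction(620.1, 117.0), 1))    # 81.1
print(round(percent_reduction(1263.0, 926.0), 1))   # 26.7
print(round(percent_reduction(1560.5, 988.0), 1))   # 36.7
```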

  • All fonts are mapped to the base fonts built into the PDF reader, which reduces file size. However, the look and feel of a PDF Normal document may suffer when the base fonts do not closely match the fonts in the original document.

  • Both text and images are compressed within the PDF file to minimize file size.

  • To reduce file size further, PrimeOCR can downsample the images within its PDF output. The target resolution is fully configurable by the user, from 50 dpi to 600 dpi.
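Pixel count grows with the square of resolution, so halving the dpi roughly quarters the raster data. The Python sketch below works this through for a US Letter page (8.5 x 11 in), assuming uncompressed rows padded to whole bytes. Compressed image sizes inside a PDF depend on the page content and the codec, so they will not scale exactly like this. `raster_size` is a hypothetical helper.

```python
def raster_size(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 1):
    """Pixel dimensions and uncompressed size in bytes of a page at `dpi`.

    bits_per_pixel=1 models a bitonal scan; 8 models grayscale.
    Each row is padded up to a whole number of bytes.
    """
    w_px = round(width_in * dpi)
    h_px = round(height_in * dpi)
    row_bytes = (w_px * bits_per_pixel + 7) // 8  # round bits up to whole bytes
    return w_px, h_px, row_bytes * h_px


for dpi in (300, 150):
    print(dpi, raster_size(8.5, 11, dpi))
# 300 (2550, 3300, 1052700)
# 150 (1275, 1650, 264000)
```

Going from 300 dpi to 150 dpi cuts the uncompressed bitonal page from about 1.05 MB to 264 KB, a factor of almost exactly four.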

PrimeOCR PDF I/O Specifications

Input File Formats:

  • TIFF, including large multi-page files (thousands of pages)

  • PCX

  • Bitonal, color, and grayscale images

  • JPEG

  • PDF

Output File Formats:

  • PDF Image Only

  • PDF Normal

  • PDF Image with Hidden Text

  • Color and grayscale output supported

  • Optimized PDF output is available when Adobe Acrobat is installed.
