Creating Accessible PDF from Scanned Documents

WAC Workshop. March 2005. Written and Presented by: Lori Bailey.

Table of Contents:

Scanned PDF and Accessibility

  1. Options for Scanning to PDF
  2. How to Scan to PDF
  3. Scanning Basics
  4. Creating PDF: TIFF to PDF using Acrobat 6 Professional
  5. Creating PDF: Directly using Acrobat 6 Professional

Editing Your PDF for Accessibility

  1. Steps for An Accessible Scanned PDF Document
  2. Performing OCR Using Acrobat 6 Professional.
  3. Cleaning Up Acrobat OCR.
  4. Verifying Your Document Text
  5. Adding Tags to Your Document
  6. Checking Your Document for Accessibility

Resources

  1. Guides and Tutorials
  2. Software

Required Software.

  • Acrobat 6.0 Professional or higher with Accessibility Checker.
  • Scanner with highest resolution (DPI) available.
  • Alternative: use one of the commercially available software converters (see Software under Resources).

Scanned PDF and Accessibility

Scanned PDF documents are among the most problematic types of document for users of assistive technology. Why? Because, in most cases, a document scanned directly to PDF, or scanned and then converted to PDF, is stored as a large image file. Each page contains one large image with all text, tables, images, and graphics grouped into that image. Text on the page is not searchable or selectable. To the assistive technology user, the document appears completely blank.

To make a scanned document accessible, you must convert the image of the document into "real" text. That is, the text must be selectable and scalable. This is usually done through OCR (Optical Character Recognition). If the PDF version is also to be your accessible version, you'll need to add further accessibility mark-up: adding "tags" to your PDF, adding alternative text for images, graphs, and charts, and adding header information to data tables. In addition, text created from a scanned image is often broken into unexpected segments, and these segments may be out of order relative to the expected read-order of the document. Once your document is converted, you'll need to perform several checks to ensure that the correct read-order is established.

Options for Scanning to PDF

Basically, you have two choices for creating your PDF document from paper. You can scan your document into an image file (typically a TIFF) and then convert the image file into a PDF. Or you can scan directly into PDF, using the "Create PDF from Scanner" option in Acrobat, your scanner's PDF conversion option, or commercially available conversion software.

PDF experts tend to suggest the first option: scanning to a TIFF and then importing into Acrobat or your PDF creation software. By separating the steps, you can focus first on creating clean, high-quality scans of the document and then worry about converting to an accessible PDF. If you scan your documents directly to PDF, you may need to rescan several times at different DPI and other settings before you have a PDF that can be successfully manipulated. However, once your settings have been established, we found little difference between creating a TIFF and scanning directly to PDF.

Regardless of how you scan your document, you will need to do some follow-up after the PDF version has been created to add accessible features. This can be a very simple process for simple documents or a lengthy and complex process for complex documents. Much depends on what software you have available.

How to Scan to PDF

Each scanner is different and uses different software, different defaults, and different preset configurations. You'll probably need to experiment to find out which settings work best for the types of documents you are scanning and converting. In the examples below, we used an Epson Perfection 1660 Photo scanner and customized the settings to 400 DPI Black and White Photo output.

Scanning Basics

  1. Place your document on the scanner bed. Be sure it is as straight as possible.
  2. Press the scan button on the front of the scanner, or open your scanner software and choose "Acquire Image".
  3. Adjust the settings of your scan software to optimize your scan. We suggest scanning in black and white, unless you need to maintain color. Black and white documents typically have more success in OCR, because color can shade text and cause read errors. Adobe Acrobat can only perform OCR on documents scanned at 200-600 DPI. We found 400 DPI gave us the best OCR conversion with our sample documents.
  4. Save your scanned document as an image, preferably a TIFF file (very large, but maintains the highest quality graphics).
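
The resolution guidance in step 3 can be turned into a quick check before you scan a large batch. The sketch below is an illustration only: the function names are hypothetical, the 200-600 DPI limits are the Acrobat OCR range stated in this handout, and the US Letter page size (8.5 x 11 inches) is an assumed example.

```python
# Acrobat's OCR range as stated in this handout (200-600 DPI).
OCR_MIN_DPI = 200
OCR_MAX_DPI = 600


def dpi_in_ocr_range(dpi):
    """Return True if a scan resolution falls inside the stated OCR range."""
    return OCR_MIN_DPI <= dpi <= OCR_MAX_DPI


def scan_pixel_size(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # Assumed example: a US Letter page scanned at 400 DPI.
    print(dpi_in_ocr_range(400))          # True
    print(scan_pixel_size(8.5, 11, 400))  # (3400, 4400)
```

The pixel dimensions show why TIFF scans at OCR-friendly resolutions produce large files: a single Letter page at 400 DPI is roughly 15 million pixels.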

Creating PDF: TIFF to PDF using Acrobat 6 Professional

  1. From the FILE menu, choose CREATE PDF and FROM FILE.
  2. Navigate to your TIFF file created from the scanner and choose OPEN.
  3. Save your new PDF document and edit for accessibility.

Creating PDF: Directly using Acrobat 6 Professional

  1. Place your document on the scanner bed. Be sure it is as straight as possible.
  2. Open Adobe Acrobat.
  3. From the FILE menu, choose CREATE PDF and FROM SCANNER. The "Create PDF from Scanner" dialog box appears.
    screen shot: "Create PDF from Scanner" dialog box
  4. Make sure your scanner is selected and slide the quality selector toward the "Higher Quality" end.
  5. Click SCAN.
  6. Your scanning software may open and allow you to change the scan settings, or this may be automated.
  7. Save your PDF document and edit for accessibility.

Editing Your PDF for Accessibility

Steps for An Accessible Scanned PDF Document

To ensure your document is accessible to users of assistive technology, you'll need to edit the PDF document:

  1. Perform OCR (Optical Character Recognition) on the document to make text selectable and searchable. Repair any problems found during OCR conversion.
  2. Add descriptive tags for non-text elements: graphs, charts, images.
  3. Add accessible mark-up for tables.
  4. Verify the read-order of the document.

Performing OCR Using Acrobat 6 Professional.

  1. Open your scanned PDF document.
  2. From the DOCUMENT menu, choose PAPER CAPTURE and START CAPTURE.
  3. Acrobat analyzes the page images and tries to produce readable text.
  4. Save your document.

OCR Tip: After performing OCR, switch to Select Text mode and try to select text in your document. The text that is highlighted has been interpreted by Acrobat. Any text that cannot be highlighted failed to be converted. Also, notice if text is highlighted in an odd order or if some blocks of text are skipped. This indicates problems with read order.

Cleaning Up Acrobat OCR.

As Acrobat performs its OCR process, it creates a list of "suspect" words and characters that could not be clearly identified. You can see all the suspect items at once: from the DOCUMENT menu, choose PAPER CAPTURE and FIND ALL OCR SUSPECTS. Acrobat highlights all the suspect items in the document.

You must address each OCR suspect. Any OCR suspect that you ignore will not be converted into readable text and will be ignored by screen readers.

You can walk through the OCR suspects one by one:

  1. From the DOCUMENT menu, choose PAPER CAPTURE and FIND FIRST OCR SUSPECT.
  2. The FIND ELEMENT dialog box appears showing the first "suspect" set of characters.
    screen shot showing suspect item
    If the suspect characters are text, you'll be able to edit them in the dialog box. Otherwise, retype the correct text characters directly in the document using advanced editing techniques in Acrobat.
  3. Once you have corrected the suspect, choose "Accept and Find" to go to the next suspect item.
  4. When you have corrected all suspect items, save your document.

Verifying Your Document Text

After you have performed OCR and addressed all the suspect characters, you can do a quick check to ensure that the text of your document is available to screen readers: save it as accessible text.

  1. From the FILE menu, choose "SAVE AS"
  2. In the "SAVE AS" dialog box, change the "SAVE AS TYPE" to "Text(Accessible)(*.txt)"
    Screen shot: "Save as" dialog box.
  3. Click SAVE. Adobe converts your document to a plain text file using the same text that would be accessible to assistive technology, including alternative text for images and graphics.
  4. Open your newly saved text version in Adobe. Compare the text in the plain text version to the text in the PDF version: are they the same? If not, edit the text and/or the tags in the PDF version and re-save as "Text(Accessible)" to check again.
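
For longer documents, the comparison in step 4 can be partly automated. The following is a minimal sketch, assuming you have the expected text available as a string (for example, retyped or taken from the original source file) alongside the contents of the "Text(Accessible)" export. The function names are hypothetical, and the comparison uses Python's standard `difflib` module rather than anything provided by Acrobat.

```python
import difflib


def normalize(line):
    """Collapse runs of whitespace so layout differences are not reported."""
    return " ".join(line.split())


def changed_lines(expected, exported):
    """Return lines that differ between the expected and the exported text.

    Lines starting with "- " appear only in the expected text;
    lines starting with "+ " appear only in the exported text.
    """
    a = [normalize(l) for l in expected.splitlines() if l.strip()]
    b = [normalize(l) for l in exported.splitlines() if l.strip()]
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


if __name__ == "__main__":
    expected = "Scanning Basics\nPlace your document on the scanner bed."
    # Hypothetical OCR error: "m" misread as "rn".
    exported = "Scanning Basics\nPlace your docurnent on the scanner bed."
    for line in changed_lines(expected, exported):
        print(line)
```

An empty result means the two versions match after whitespace is normalized. Any reported line is a place to recheck the OCR text or the tags in the PDF.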

Adding Tags to Your Document

Once you are certain that the necessary text is available in the document, you can add tags to it. Adding tags creates a duplicate of your document that is marked up for accessibility. Only the very latest assistive technology can read an untagged PDF. In addition, an untagged PDF cannot be reflowed to fit the available screen size and cannot contain additional information, such as alternative text for images. Thus, only a tagged PDF can be considered accessible.

You can use Acrobat's automated feature to add tags to your document:

  1. From the ADVANCED menu, choose ACCESSIBILITY and "ADD TAGS TO DOCUMENT".
  2. Acrobat generates a tagged version of your document that can only be viewed in the tags window. To open the tags window:
    • From the VIEW menu, choose NAVIGATION TABS and TAGS.
    • Use the asterisk (*) key on the Number Key Pad to open all tag levels.
    • Use the minus (-) key on the Number Key Pad to close all tag levels.
  3. Check tags for accuracy, completeness, and read-order.
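
If you need to triage many files, a rough script can flag PDFs that appear to have no tags at all. The sketch below is a heuristic, not a substitute for the tags window or the Accessibility Checker: it only searches the raw bytes for the `/StructTreeRoot` and `/MarkInfo ... /Marked true` entries that a tagged PDF carries in its document catalog. The function names are hypothetical. A negative result is inconclusive, because some PDFs store the catalog in compressed object streams where these keys are not visible as plain bytes.

```python
import re
from pathlib import Path


def tag_markers(pdf_bytes):
    """Report which tagged-PDF markers are visible in the raw file bytes."""
    return {
        "struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
        "marked_true": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
    }


def looks_tagged(pdf_bytes):
    """Heuristic: True if the structure-tree key is visible in the bytes."""
    return tag_markers(pdf_bytes)["struct_tree_root"]


def check_file(path):
    """Run the heuristic on a PDF file on disk."""
    return tag_markers(Path(path).read_bytes())


if __name__ == "__main__":
    # Hypothetical catalog fragments standing in for real files.
    tagged = b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
    untagged = b"<< /Type /Catalog /Pages 2 0 R >>"
    print(looks_tagged(tagged), looks_tagged(untagged))
```

Even when both markers are present, the tags may still be inaccurate or in the wrong order, so the manual check in step 3 remains necessary.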

Checking Your Document for Accessibility

After adding tags, you can do a few quick checks to ensure your document will work well with assistive technology. You can also use these techniques at any point in your conversion process to check the accessibility of your document.

Highlight content

Highlighting content is a simple method to confirm:

  1. Text is readable by a screen reader. Text that cannot be highlighted/selected is likely to be skipped or ignored by screen readers. If text cannot be selected, perform OCR again and confirm that it is not an unresolved "suspect" item.
  2. Read-order of the document. The order that text is highlighted/selected is also the order the text will be read by a screen reader. Pay particular attention to text in tables or columns. Does the text in one cell bleed into the text in another? Can you select all of one column and then all of the next? Read-order can be changed by rearranging the tags.
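
The read-order check can also be run against the plain text saved with the "Text(Accessible)" option described earlier. The sketch below is one possible approach, not an Acrobat feature: you supply a few landmark phrases in the order a reader should encounter them, and the hypothetical function `read_order_problems` reports any that are missing or that appear earlier than a preceding landmark.

```python
def read_order_problems(exported_text, landmarks):
    """Return (phrase, problem) pairs for landmarks that are missing
    from the text or that appear before an earlier landmark."""
    problems = []
    last_pos = -1
    for phrase in landmarks:
        pos = exported_text.find(phrase)
        if pos == -1:
            problems.append((phrase, "missing"))
        elif pos < last_pos:
            problems.append((phrase, "out of order"))
        else:
            last_pos = pos
    return problems


if __name__ == "__main__":
    # Hypothetical export in which the second column was read first.
    exported = "Column two text. Title. Column one text."
    expected_order = ["Title", "Column one", "Column two"]
    print(read_order_problems(exported, expected_order))
    # [('Column two', 'out of order')]
```

Choosing one landmark per column, table row, or text box keeps the list short while still covering the layouts most likely to be read out of order.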

Reflow

Document reflow assists users who enlarge the text or who are using small screens or low resolutions by reformatting the document to fit the available screen. Without reflow, users may be forced to scroll horizontally as well as vertically.

To check for reflow:

  1. Increase the text size to 300% or greater.
  2. From the VIEW menu, choose REFLOW
  3. Note that how a document reflows also depends on read-order.

Read Aloud

The best way to check a document's accessibility is to use the same assistive technology your users will use to access the document. However, if you don't have access to a screen reader or screen enlarger, you can still get a sense of how those technologies will interpret your document by listening to it being read by Acrobat's "Read Out Loud" feature. Although not practical for lengthy documents, such as dissertation chapters or articles, this is a good strategy for shorter documents that will receive high circulation on your web site or will be required reading for your users.

To read out loud:

  1. From the VIEW menu, choose READ OUT LOUD.
  2. Press SHIFT + CTRL + V to quickly read the current page.
  3. Press SHIFT + CTRL + B to read the entire document.
  4. To stop reading: go to the VIEW menu, READ OUT LOUD, and choose STOP, or press SHIFT + CTRL + E.

For longer documents, you may want to narrow your reading to only a few key pages: in particular, those pages that contain graphics, tables, columns, or text boxes.

Editing Tags

Any problems you find during your checks will most likely need to be addressed by editing the tagged version of your PDF. For detailed guidance on how to edit tags and mark up images, tables, and links for accessibility, see the WAC Handout: "Checking Your PDF for Accessibility". It is available online at: www.wac.ohio-state.edu/pdf/checking.

Resources

Guides and Tutorials

The WAC has put together an extensive collection of guides and resources on various production methods for accessible PDF: see more in the WAC PDF Tutorial section.

Adobe offers a number of excellent resources as well. One we recommend is Acrobat for Educators, which includes a selection of FREE online video tutorials that guide you through how to use Acrobat, from simple Bookmarks and Articles to advanced Document Collections.

Want more? Check out the discussions, tips, and tools offered by Planet PDF, a community of advanced developers ready to help you with quick solutions to your PDF problems. Includes a very useful collection of software titles for all types of PDF creation and conversion.

Software

A number of companies offer software that specializes in converting PDF to either accessible (selectable & searchable) PDF or to other, more accessible, formats (Word, Excel, etc.). Here are a few:

ABBYY PDF Transformer ($49.99): Quickly and accurately convert any PDF file into Microsoft® Word, Excel or HTML files without retyping and reformatting. PDF Transformer is an ideal utility for business and home users that need to edit and repurpose a wide variety of PDF files. [http://www.abbyyusa.com/pdftransformer.htm]

Able2Extract Professional ($120): Convert your PDF data into fully formatted Excel spreadsheets and editable Word documents with Able2Extract Professional. Supports scanned documents, offering 10 different conversion options in total. [http://www.investintech.com/prod_a2e_pro.htm]

Adobe Acrobat Capture ($195): Adobe Acrobat® Capture® 3.0 software is the perfect addition to Adobe Acrobat 7.0 for people who want to process high volumes of scanned paper and turn them into searchable tagged Adobe PDF files. [http://store.adobe.com/enterprise/accessibility/acrobatcapture30.html]

ISICopy ($99): ISICopy works with Adobe® Acrobat® software to extract text from image-based PDF files, converting it into valuable editable text. There is no need to OCR an entire page; if you have a paper-based PDF file, you can select the precise amount of text you want to copy and then paste it into any application.

ScanSoft OmniPage Pro ($120): Quickly turn paper and PDF files into editable electronic documents that look just like the original, complete with text, tables and graphics. Robust new tools enable you to turn text documents into audio books and add digital signatures to your electronic documents. [http://www.scansoft.com]

SolidConverter ($50): You do NOT need Adobe® Acrobat® or Reader® to use our converter! Solid Converter PDF can be used as a standalone converter tool or as a plugin for Microsoft Word® and Adobe® Acrobat® (not Reader). Solid Converter PDF is also available through Explorer's right click local menu. A command line interface is available for batch processing. [http://www.solidpdf.com/]

For large jobs:

PrimeOCR ($1500, limited # of pages): includes an "Accessible PDF" module that meets Section 508 guidelines for an accessible document. [http://primerecognition.com/augprime/prime_ocr.htm]

Note: prices are offered for reference only (subject to change) and do not include any educator's or volume-licensing discounts, if applicable. Before ordering Adobe products, find out if they are available through our OSU volume-license agreement with SHI. See "Adobe Ordering Procedures" on the OIT Site Licensed Software page (available to OSU faculty and staff only): [https://cweb1.net.ohiostate.edu/software/lookup.cgi?adobeclp&1.0&win&Adobeorder.pdf]

OSU Web Accessibility Center (WAC)
1760 Neil Ave 150 Pomerene Hall Columbus, Ohio 43210
Phone: (614) 292-1760 Fax: (614) 292-4190 E-mail: webaccess@osu.edu