 
OCR - optical character recognition
Optical character recognition software can convert a scanned image of a document
into digital text, which can then be edited and reformatted. Most flatbed scanners
come bundled with OCR software; this software may or may not be adequate for your
needs. If you will be doing a lot of OCR, it is worth investing in good-quality
software (such as Nuance OmniPage), as it will save many hours of correcting
scanned documents.
There are advantages and disadvantages to using OCR for document conversion:
Advantages:
- A document that was inaccessible to screen-readers can be made accessible.
- The text in the document can now be searched, copied and pasted, etc.
- The file will be considerably smaller to download than an image file.
Disadvantages:
- Page formatting is lost.
- Graphics or images will need to be re-inserted.
- The document may need considerable reformatting to be suitable for web use.
- You will need to review the document for typographical errors generated by
the OCR software.
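Proofreading by eye is still necessary, but a short script can point to likely trouble spots first. The following is a minimal Python sketch, assuming the OCR output has been saved as plain text. The function `flag_suspect_words` is a hypothetical helper, not part of any OCR package: it lists words that mix letters and digits, a common symptom of misreads such as 0/O and 1/l confusion.

```python
import re

# Matches a token made only of letters and digits that contains at least
# one of each (e.g. "c0py", "1ist"); a frequent sign of OCR misreads.
MIXED = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]+$")


def flag_suspect_words(text):
    """Return (line_number, word) pairs worth checking by eye.

    A heuristic only: it misses real-word errors (e.g. "modem" for "modern")
    and also flags legitimate codes such as "A4" or "MP3".
    """
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation before testing the word.
            word = token.strip(".,;:!?()\"'")
            if MIXED.match(word):
                suspects.append((line_no, word))
    return suspects


if __name__ == "__main__":
    sample = "The c0py was scanned.\nPlease 1ist every page in A4 size."
    for line_no, word in flag_suspect_words(sample):
        print(f"line {line_no}: {word}")
```

In the sample, "c0py" and "1ist" are genuine misreads, while "A4" is a false positive; the script narrows down where to look but does not replace a full read-through.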
Keep in mind that a document scanned and converted with OCR will probably
require considerable editing and restructuring to be suitable for web
delivery. It may be appropriate to provide several versions of the document:
- Edited and simplified for reading from screen
- Full-text OCR for accessibility and research
- PDF with original formatting for printing and annotating
Pages of a document can be batch-processed with OCR using an automatic
document feeder. UNSW Publishing & Printing Services has this facility:
http://www.publications.unsw.edu.au/ (go to Services > Scanning/OCR).
Note that copyright legislation applies to supplying digital material on the
web. See Copyright issues.