Many libraries have found they can save space and make information more available by scanning their vast holdings of journals and conference proceedings and converting the results into PDF files. That’s good news for people doing research online. But here is a problem:

This particular journal article contained a photograph, but when journals are quickly scanned and compressed, the image data is almost completely destroyed. And this is actually one of the better examples. It’s not unusual for photos in a paper to be reduced to a black rectangle with some white blobs.

Here are a couple more examples. This is bad news because, as Adobe’s advertising campaign says, libraries can “Pitch the Paper!” Let us hope we do not discover one day that some important historical images have been completely lost in this process.

Of course, PDF is capable of storing images at higher resolution. But it takes time and care to scan a paper well enough to preserve a good record of a photograph. In practice, it just doesn’t happen — thousands of journals have to be processed, using default settings that compress the data well but ruin the images.
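
To make the damage concrete, here is a minimal sketch in Python. It assumes the scanner's default mode reduces every pixel to pure black or white, which is my guess based on the "black rectangle with white blobs" results and not something I have confirmed about any particular scanning software. The `threshold` function, the 128 cutoff, and the tiny synthetic "photo" are all made up for illustration.

```python
def threshold(pixels, cutoff=128):
    """Reduce 8-bit grayscale pixels to pure black (0) or white (255)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in pixels]

# A tiny stand-in for a photograph: a smooth ramp from dark to light.
# (Hypothetical data; a real scan would have millions of pixels.)
photo = [[x * 16 + y for x in range(16)] for y in range(4)]

# Assumed behaviour of a text-oriented scan: every pixel becomes 0 or 255.
scanned = threshold(photo)

# Count how many distinct shades survive the conversion.
levels_before = len({p for row in photo for p in row})
levels_after = len({p for row in scanned for p in row})
print(levels_before, "gray levels before,", levels_after, "after")
# prints: 64 gray levels before, 2 after
```

Text survives this kind of conversion because it is already black on white. A photograph depends on the shades in between, and once those are thrown away, no later processing of the PDF can bring them back.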

Personally, I wish scientists would use a rich text format instead of PDF. I’d like to be able to read a science paper, click on a graph, and see a spreadsheet of the original data pop up, as can happen in MS Word. But the ability to embed original data and images in a paper is rarely used.
