Our goal in digitizing material from the Simon
Fraser University Library Editorial Cartoons Collection is to
provide access to it, not to preserve it in digital form. The
following information explains some of the decisions we made during
the planning and implementation of the project.
The records in the database contain the following fields:
- cartoon key (a string composed of a
cartoonist code and the date on the cartoon and, if necessary,
a letter to indicate that there are multiple cartoons by the
same cartoonist with the same date on them; see the section on
Filenames below for an example)
- subjects (see below for details)
- date on cartoon (not necessarily the
date published, but the date as indicated on the cartoon itself)
- publication information (where the cartoon
was originally published)
- text (the caption, speech bubbles, signs,
and any other significant text within the cartoon, transcribed)
- physical description (the dimensions
of the cartoon itself, not including the gutter around the cartoon;
we can add additional information to this field if necessary)
- cited in (a field that indicates which
of the reference books in the SFU Library's collection the cartoon
is cited in)
- processing notes (notes added by the
cataloguers, identifying problems with the scanning, or identifying
questions about subject headings, etc.)
- display notes (notes intended to be
displayed to the user)
- SFU MsC Code (the "call number" of the
cartoon in the SFU Special Collections)
- colour (whether the scanned image is
black and white, grayscale, or colour)
- several fields about the metadata record itself,
such as whether this cartoon is to be displayed on public
pages, time last updated, who last updated the record, etc.
Of these fields, subjects and caption are searchable
alone; the user may choose to search "all fields", which
includes cartoonist name, caption, subjects, publication information,
and display notes. Cartoonist and date can be used to limit searches.
In general we followed AACR2 (Anglo-American Cataloguing Rules) guidelines where applicable,
except for the format of the date, which is stored in YYYY-MM-DD format
to facilitate efficient use by the database and is converted on
the fly into other formats for display.
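As a minimal sketch of the store-one-format, display-another approach described above (not the project's own code, which was written in Perl), the following converts a stored YYYY-MM-DD string into a long display form. The function name `display_date` and the chosen display format are assumptions made for illustration.

```python
from datetime import date

# English month names are spelled out here so the result does not
# depend on the locale of the machine running the code.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def display_date(stored: str) -> str:
    """Convert a stored YYYY-MM-DD string into a 'Month D, YYYY' string.

    Hypothetical helper: the display format is an assumption.
    """
    d = date.fromisoformat(stored)  # raises ValueError on malformed input
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


# YYYY-MM-DD strings also sort chronologically as plain text, which is
# one reason the format is convenient for a database.
stored_dates = ["2001-11-25", "1952-01-01", "1963-01-10"]
oldest_first = sorted(stored_dates)
```

Calling `display_date("1963-01-10")` gives `January 10, 1963`, while the stored value stays in the sortable form.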
For subject headings we used the Library of Congress
Thesaurus for Graphic
Materials 1 (TGM 1), which provided a thorough controlled
vocabulary and also provided helpful information on the process
of cataloguing images. Particularly useful was its discussion
of the "of"
and "about" aspects of applying subject vocabularies to images.
For example, a single cartoon could represent a married couple
sitting in a living room (so the cartoon is a picture "of" this
scene) while it is "about" something entirely different, such
as the Canadian constitutional debate of the early 1980s:
"Debate on the constitution has broken out in Ottawa ...
what's new and exciting with you?"
Where necessary, we supplemented TGM 1 with Canadian
Subject Headings and some headings from the Library
of Congress Subject Headings.
We did not represent TGM 1's syndetic structure.
The syndetic structure refers to the thesaurus's representation
of related terms, such as narrower terms, broader terms, and terms
that are not part of its controlled vocabulary but are included
because they are "used for" authorized terms. Also,
we did not include MARC
subfield codes in the subject headings, even though TGM 1 documents
how the parts of subdivided subject headings should be coded so
that they are consistent with MARC conventions. At a later date,
we will explore developing the subject headings in the database
to incorporate these two features.
Since we were not digitizing for preservation
purposes, we did not include any metadata about the images (apart
from whether they were black and white, grayscale, or colour) or
scanning procedures, although this information is documented elsewhere.
The goal of the first phase of this project was to
digitize and catalogue all of the cartoons in the Library's collection.
We simply started at the latest cartoon and worked our way back
in time. As of August 2002, we were able to complete 2068 cartoons
dating from January 1, 1952 to November 25, 2001. We will continue
to add the remaining cartoons as resources become available.
The material in the Cartoons Collection does not
include multipage items, which allowed us to use filenaming conventions
that are much simpler than the ones we used during the Doukhobor
Collection project. The filenames identify each cartoon uniquely,
and are based on the "cartoon key" described above,
in the format cartoonistID-YYYY-MM-DD. For example, the cartoon
key for a work by Len Norris (cartoonist 1 in our database) that
had the date January 10, 1963 on it would be 1-1963-01-10. If
Norris had labelled more than one cartoon with that date, then
the second cartoon would have the key 1-1963-01-10-b. The name
for any file pertaining to that cartoon would end in the appropriate
file extension, such as tif, gif, or dat (see below for details
on the file types).
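The key format described above can be sketched in a few lines. This is an illustration only, not the project's code: the names `make_key`, `parse_key`, and `filename` are hypothetical, the cartoonist code is assumed to be numeric (as in the Norris example), and the extension is assumed to be joined to the key with a dot.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# cartoonistID-YYYY-MM-DD, optionally followed by -<letter>.
# Assumption: the cartoonist code is an integer and the letter is lowercase.
KEY_PATTERN = re.compile(r"^(\d+)-(\d{4}-\d{2}-\d{2})(?:-([a-z]))?$")


class CartoonKey(NamedTuple):
    cartoonist_id: int
    cartoon_date: date
    suffix: Optional[str]  # e.g. "b" for a second cartoon with the same date


def make_key(cartoonist_id: int, cartoon_date: date,
             suffix: Optional[str] = None) -> str:
    """Build a cartoon key from its parts."""
    key = f"{cartoonist_id}-{cartoon_date.isoformat()}"
    return f"{key}-{suffix}" if suffix else key


def parse_key(key: str) -> CartoonKey:
    """Split a cartoon key back into its parts, rejecting malformed keys."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a cartoon key: {key!r}")
    cartoonist, iso_date, suffix = match.groups()
    return CartoonKey(int(cartoonist), date.fromisoformat(iso_date), suffix)


def filename(key: str, extension: str) -> str:
    """Name a file for a cartoon (assumes key and extension joined by a dot)."""
    return f"{key}.{extension}"
```

With these helpers, `make_key(1, date(1963, 1, 10), "b")` yields `1-1963-01-10-b`, and `filename("1-1963-01-10", "tif")` yields `1-1963-01-10.tif`.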
Several of Bob Krieger's cartoons include preliminary
sketches that were rejected by his editors. For these cartoons,
we catalogued the published versions and linked the rejected sketches
from those records. The filenames for the sketches are identical
to those of the published versions, with "-x" appended:
(People interested in the fascinating topic of
filenames for digitization projects should consult the Technical
Advisory Service for Images' guide.
There are also periodic discussions of the topic on the ImageLib
Scanning and file management
The general workflow involved in completing a
- Determine the cartoon key
- Complete a printed worksheet (indicate the
current date and cataloguer's name, assign the SFU MsC Code,
transcribe the caption, do the subject cataloguing, make any
- Write the SFU MsC Code on the back of the cartoon
- Scan the cartoon (see below)
- Complete the record for that cartoon in the
database's cataloguing module
The scanners used for this project were an Epson
Expression 836XL and an Epson GT-10000+ (two cataloguers were
working together during the project). Both of these flatbed scanners
have 11x17" platens, which allowed us to scan the cartoon
itself but in some cases would not have been sufficient to scan
the entire piece of artist's board including a border. However,
since our goal was to scan for access and not preservation, we
decided that we did not want the gutter around the cartoon anyway.
A sample of a Norris cartoon that was scanned
with a gutter is available. This sample includes a ruler to
indicate the actual size of the artist's board. This image was
scanned in two halves and then spliced together in Adobe Photoshop
Most of the cartoons were scanned into black and
white TIFF images at 600 dpi (dots per inch). The cataloguers
scanned the images into Photoshop Elements and, after saving the
original 600 dpi TIFF version using a filename based on the cartoon
key, converted the image to grayscale and reduced its pixel density
to 72 dpi for display over the Web. This new file was saved as
a GIF. In cases where scanning into black and white at 600 dpi
did not yield sufficient detail for viewing on a computer monitor,
we scanned the cartoon into a grayscale image at 72 dpi. Colour
cartoons were scanned in 24-bit colour at 72 dpi and converted
for display on the web using Photoshop Elements' "Save for web"
function, which produced the most consistent results with the
least amount of operator intervention.
At this point, there were two images on the cataloguer's
workstation, a TIFF master and a GIF display version. To complete
the scanning process, the cataloguer ran a Perl script called
that used parts of the ImageMagick
toolkit to create a thumbnail version of the GIF file and also
to get some basic information about the image such as its physical
dimensions (which could be calculated from the scanning density).
The information generated by ImageMagick was written to a small
data file named after the cartoon key. Thumbnails were derived
at 20% of the full-sized image instead of at an absolute number
of pixels, so that they would give the user an idea of the relative
size of the cartoons.
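The arithmetic behind these steps is simple enough to sketch. The functions below are plain Python and are not the project's Perl script or ImageMagick calls; their names are hypothetical, and the sample pixel counts are made up for illustration.

```python
from typing import Tuple


def physical_size_inches(width_px: int, height_px: int,
                         dpi: int) -> Tuple[float, float]:
    """Physical dimensions follow from pixel dimensions and scanning density."""
    return (width_px / dpi, height_px / dpi)


def resampled_size(width_px: int, height_px: int,
                   from_dpi: int, to_dpi: int) -> Tuple[int, int]:
    """Pixel dimensions after reducing density, e.g. from 600 dpi to 72 dpi."""
    factor = to_dpi / from_dpi
    return (round(width_px * factor), round(height_px * factor))


def thumbnail_size(width_px: int, height_px: int,
                   scale: float = 0.20) -> Tuple[int, int]:
    """Thumbnail dimensions as a fixed fraction of the display image.

    Scaling by a percentage instead of to a fixed pixel width keeps
    thumbnails proportional to the size of the original cartoons.
    """
    return (max(1, round(width_px * scale)), max(1, round(height_px * scale)))


# Hypothetical example: a 10 x 7 inch cartoon scanned at 600 dpi.
master = (6000, 4200)
size_in = physical_size_inches(*master, dpi=600)
display = resampled_size(*master, from_dpi=600, to_dpi=72)
thumb = thumbnail_size(*display)
```

Because the thumbnail is a percentage of the display image, a cartoon that is twice as wide as another also gets a thumbnail that is twice as wide, which is the relative-size cue described above.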
Uploading the four files onto the database/web server
was made easier by having the cataloguing module (also written
in Perl) get the files from the cataloguer's workstation instead
of having the cataloguer upload the files manually. To achieve
this, the workstations had FTP servers running on them. When
the cataloguer chose to create a record for a newly scanned cartoon,
the cataloguing module connected to the workstation, transferred
all the necessary files to the database/web server, and then, using
information in the data file, created a record in the database
and populated the cartoonist, date on cartoon, physical description,
and colour fields. The record also included the thumbnail image
with a link to the full size display version, so the cataloguer
could verify that these two files were uploaded into the expected
Search and retrieval software
We decided to create our own database for this project,
using MySQL as the backend.
The search interface and the cataloguing module are based on the
Perl CGI scripts developed for the
Doukhobor Collection (the original scripts are
freely available). The
Cartoons Collection database provides field searching, subject
browsing, and limiting by date and cartoonist. It also allows the
user to choose to show thumbnail images in the search results and
to right-truncate search terms.
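Right truncation is commonly implemented by turning the user's term into a prefix pattern for a SQL LIKE query. The sketch below shows that general technique and is not the project's Perl code. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the `subjects` table, the sample headings, the function names, and the use of `*` as the truncation character are all assumptions.

```python
import sqlite3


def like_pattern(term: str, truncation_char: str = "*") -> str:
    """Turn a search term into a LIKE pattern.

    A trailing truncation character becomes the SQL wildcard %.
    Any literal %, _ or backslash in the term is escaped first.
    """
    truncated = term.endswith(truncation_char)
    stem = term[: -len(truncation_char)] if truncated else term
    for ch in ("\\", "%", "_"):  # backslash must be escaped first
        stem = stem.replace(ch, "\\" + ch)
    return stem + "%" if truncated else stem


def search_subjects(conn: sqlite3.Connection, term: str) -> list:
    """Return matching headings, passing the pattern as a bound parameter."""
    rows = conn.execute(
        "SELECT heading FROM subjects "
        "WHERE heading LIKE ? ESCAPE '\\' ORDER BY heading",
        (like_pattern(term),),
    )
    return [row[0] for row in rows]


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (heading TEXT)")
conn.executemany(
    "INSERT INTO subjects VALUES (?)",
    [("Constitutions",), ("Constitutional amendments",), ("Couples",)],
)
```

Here `search_subjects(conn, "constitution*")` matches both headings that begin with "constitution", while the untruncated term `"constitution"` matches neither, because without a wildcard LIKE compares the whole heading.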