Our goal in digitizing material from the Simon Fraser University Library Editorial Cartoons Collection is to provide access to it, not to preserve it in digital form. The following information explains some of the decisions we made during the planning and implementation of the project.

Descriptive metadata

The records in the database contain the following fields:

  • cartoon key (a string made up of a cartoonist code and the date on the cartoon and, if necessary, a letter to indicate that there are multiple cartoons by the same cartoonist with the same date on them -- see section on Filenames below for an example)
  • cartoonist
  • subjects (see below for details)
  • date on cartoon (not necessarily the date published, but the date as indicated on the cartoon itself)
  • publication information (where the cartoon was originally published)
  • text (the caption, speech bubbles, signs, and any other significant text within the cartoon transcribed verbatim)
  • physical description (the dimensions of the cartoon itself, not including the gutter around the cartoon; we can add additional information to this field if necessary)
  • cited in (a field that indicates which of the reference books in the SFU Library's collection the cartoon is cited in)
  • processing notes (notes added by the cataloguers, identifying problems with the scanning, or identifying questions about subject headings, etc.)
  • display notes (notes intended to be displayed to the user)
  • SFU MsC Code (the "call number" of the cartoon in the SFU Special Collections)
  • colour (whether the scanned image is black and white, grayscale, or colour)
  • several fields about the metadata record itself, such as whether this cartoon is to be displayed on public pages, the time the record was last updated, who last updated it, etc.

Of these fields, subjects and caption can each be searched on their own; the user may also choose to search "all fields", which covers cartoonist name, caption, subjects, publication information, and display notes. Cartoonist and date can be used to limit searches. In general we followed AACR2 (Anglo-American Cataloguing Rules) guidelines where applicable, except for the format of the date: dates are stored in YYYY-MM-DD format to facilitate efficient use by the database and are converted on the fly into other formats for display.
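As an illustration of the on-the-fly date conversion, here is a minimal sketch in Python. The project's own scripts were written in Perl and are not reproduced here; the function name display_date and the "Month D, YYYY" output format are assumptions chosen for the example.

```python
from datetime import date


def display_date(stored: str) -> str:
    """Convert a stored YYYY-MM-DD string into a long display form.

    Hypothetical helper: the output format is an assumption, not the
    format the project's Perl scripts necessarily produced.
    """
    d = date.fromisoformat(stored)  # raises ValueError for malformed dates
    # %B gives the full month name (English in the default C locale).
    return f"{d:%B} {d.day}, {d.year}"


print(display_date("1963-01-10"))
```

Storing the date in YYYY-MM-DD form means that plain string comparison sorts records chronologically, which is one reason the format suits database use.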

For subject headings we used the Library of Congress Thesaurus for Graphic Materials 1 (TGM 1), which provided a thorough controlled vocabulary and also provided helpful information on the process of cataloguing images. Particularly useful was their discussion of the "of" and "about" aspects of applying subject vocabularies to images. For example, a single cartoon could represent a married couple sitting in a living room (so the cartoon is a picture "of" this scene) while it is "about" something entirely different, such as the Canadian constitutional debate of the early 1980s:

"Debate on the constitution has broken out in Ottawa ... what's new and exciting with you?"

Where necessary, we supplemented TGM 1 with Canadian Subject Headings and some headings from the Library of Congress Subject Headings.

We did not represent TGM 1's syndetic structure. The syndetic structure refers to the thesaurus's representation of related terms, such as narrower terms, broader terms, and terms that are not part of its controlled vocabulary but are included because they are "used for" authorized terms. Also, we did not include MARC subfield codes in the subject headings, even though TGM 1 documents how the parts of subdivided subject headings should be coded so that they are consistent with MARC conventions. At a later date, we will explore developing the subject headings in the database to incorporate these two features.

Since we were not digitizing for preservation purposes, we did not include any metadata about the images (apart from whether they were black and white, grayscale, or colour) or about the scanning procedures, although this information is documented elsewhere.

Digitization procedures


The goal of the first phase of the project was to digitize and catalogue all of the cartoons in the Library's collection. We simply started at the latest cartoon and worked our way back in time. As of August 2002, we had completed 2068 cartoons dating from January 1, 1952 to November 25, 2001. We will continue to add the remaining cartoons as resources become available.


The material in the Cartoons Collection does not include multipage items, which allowed us to use filenaming conventions that are much simpler than the ones we used during the Doukhobor Collection project. The filenames identify each cartoon uniquely, and are based on the "cartoon key" described above, in the format cartoonistID-YYYY-MM-DD. For example, the cartoon key for a work by Len Norris (cartoonist 1 in our database) that had the date January 10, 1963 on it would be 1-1963-01-10. If Norris had labelled more than one cartoon with that date, then the second cartoon would have the key 1-1963-01-10-b. The name of any file pertaining to that cartoon consists of the key followed by the appropriate file extension, such as tif, gif, or dat (see below for details on the file types).
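The key and filename convention can be sketched in Python as follows. The function names are hypothetical, and the handling of a third or later cartoon with the same date ("-c", "-d", and so on) is an assumption: the text above only states that the second cartoon receives "-b".

```python
import string


def cartoon_key(cartoonist_id: int, date_on_cartoon: str, position: int = 1) -> str:
    """Build a cartoon key in the form cartoonistID-YYYY-MM-DD[-letter].

    position is 1 for the first cartoon by that cartoonist with that date,
    2 for the second, and so on. Letters beyond "b" are an assumption.
    """
    if not 1 <= position <= 26:
        raise ValueError("position must be between 1 and 26")
    key = f"{cartoonist_id}-{date_on_cartoon}"
    if position > 1:
        # Position 2 maps to "b", matching the example in the text.
        key += "-" + string.ascii_lowercase[position - 1]
    return key


def cartoon_filename(key: str, extension: str) -> str:
    """Name a file belonging to a cartoon: the key plus a file extension."""
    return f"{key}.{extension}"


print(cartoon_key(1, "1963-01-10"))
print(cartoon_key(1, "1963-01-10", position=2))
print(cartoon_filename(cartoon_key(1, "1963-01-10"), "tif"))
```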

Several of Bob Krieger's cartoons include preliminary sketches that were rejected by his editors. For these cartoons, we catalogued the published versions and linked the rejected sketches from those records. The filenames for the sketches are identical to those of the published versions, with "-x" appended:

3-1998-07-02.gif    3-1998-07-02-x.gif
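Deriving the sketch filename from the published version's filename is a one-step transformation. The following Python sketch shows one way to do it; sketch_filename is a hypothetical name, not a function from the project's scripts.

```python
from pathlib import PurePath


def sketch_filename(published: str) -> str:
    """Return the filename of the rejected sketch for a published cartoon.

    Appends "-x" to the part of the name before the file extension.
    """
    path = PurePath(published)
    return f"{path.stem}-x{path.suffix}"


print(sketch_filename("3-1998-07-02.gif"))
```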

(People interested in the fascinating topic of filenames for digitization projects should consult the guide published by the Technical Advisory Service for Images. There are also periodic discussions of the topic on the ImageLib email list.)

Scanning and file management

The general workflow involved in completing a cartoon was:

  1. Determine the cartoon key
  2. Complete a printed worksheet (indicate the current date and cataloguer's name, assign the SFU MsC Code, transcribe the caption, do the subject cataloguing, make any notes required)
  3. Write the SFU MsC Code on the back of the cartoon in pencil
  4. Scan the cartoon (see below)
  5. Complete the record for that cartoon in the database's cataloguing module

The scanners used for this project were an Epson Expression 836XL and an Epson GT-10000+ (two cataloguers were working together during the project). Both of these flatbed scanners have 11x17" platens, which allowed us to scan the cartoon itself but in some cases would not have been large enough to scan the entire piece of artist's board, including its border. However, since our goal was to scan for access and not preservation, we decided that we did not want the gutter around the cartoon anyway. A sample of a Norris cartoon that was scanned with a gutter is available. This sample includes a ruler to indicate the actual size of the artist's board. This image was scanned in two halves and then spliced together in Adobe Photoshop Elements.

Most of the cartoons were scanned into black and white TIFF images at 600 dpi (dots per inch). The cataloguers scanned the images into Photoshop Elements and, after saving the original 600 dpi TIFF version using a filename based on the cartoon key, converted the image to grayscale and reduced its pixel density to 72 dpi for display over the Web. This new file was saved as a GIF. In cases where scanning into black and white at 600 dpi did not yield sufficient detail for viewing on a computer monitor, we scanned the cartoon into a grayscale image at 72 dpi. Colour cartoons were scanned in 24-bit colour at 72 dpi and converted for display on the web using Photoshop Elements' "Save for web" function, which produced the most consistent results with the least amount of operator intervention.
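To give a sense of what reducing the pixel density from 600 dpi to 72 dpi means for image size, here is a small arithmetic sketch in Python. It assumes the physical size of the image is held constant while it is resampled, and the 11x17 inch input is only an illustrative value taken from the platen size mentioned above, not a measurement of any particular cartoon.

```python
def resampled_pixels(width_px: int, height_px: int,
                     source_dpi: int, target_dpi: int) -> tuple[int, int]:
    """Pixel dimensions after resampling to a new density at the same physical size."""
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# An 11x17 inch scan at 600 dpi is 6600 x 10200 pixels (illustrative values).
master = (11 * 600, 17 * 600)
print(master)
print(resampled_pixels(*master, source_dpi=600, target_dpi=72))
```

Under these assumptions the display version has 12% of the master's pixels along each edge, which is why it is so much smaller to deliver over the Web.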

At this point, there were two images on the cataloguer's workstation: a TIFF master and a GIF display version. To complete the scanning process, the cataloguer ran a Perl script called "tm" that used parts of the ImageMagick toolkit to create a thumbnail version of the GIF file and also to get some basic information about the image, such as its physical dimensions (which could be calculated from the scanning density). The information generated by ImageMagick was written to a small data file named after the cartoon key. Thumbnails were derived at 20% of the full-sized image instead of at an absolute number of pixels, so that they would give the user an idea of the relative size of the cartoons.
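The two calculations described here, physical dimensions from the scanning density and thumbnails at 20% of the full-sized image, can be sketched in Python as follows. This is not the "tm" script and makes no ImageMagick calls; the function names and the 720x540 pixel example are assumptions for illustration.

```python
def physical_size_inches(width_px: int, height_px: int, dpi: int) -> tuple[float, float]:
    """Physical dimensions implied by pixel dimensions and scanning density."""
    return width_px / dpi, height_px / dpi


def thumbnail_size(width_px: int, height_px: int, scale: float = 0.20) -> tuple[int, int]:
    """Thumbnail dimensions as a fixed fraction of the full-sized image.

    Scaling by a percentage keeps thumbnails proportional to one another,
    so a larger cartoon gets a larger thumbnail.
    """
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Hypothetical 72 dpi display image of 720 x 540 pixels.
print(physical_size_inches(720, 540, dpi=72))
print(thumbnail_size(720, 540))
```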

Uploading the four files for each cartoon (the TIFF master, the GIF display version, the thumbnail, and the data file) onto the database/web server was made easier by having the cataloguing module (also written in Perl) get the files from the cataloguer's workstation instead of having the cataloguer upload them manually. To achieve this, the workstations had FTP servers running on them. When the cataloguer chose to create a record for a newly scanned cartoon, the cataloguing module connected to the workstation, transferred all the necessary files to the database/web server, and then, using information in the data file, created a record in the database and populated the cartoonist, date on cartoon, physical description, and colour fields. The record also included the thumbnail image with a link to the full-size display version, so the cataloguer could verify that these two files were uploaded into the expected locations.
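The format of the data file is not described here, but because it is named after the cartoon key, the cartoonist and date fields could in principle be recovered from the key alone. The following Python sketch shows that idea under that assumption; parse_cartoon_key and the dictionary field names are hypothetical and are not taken from the cataloguing module.

```python
import re
from datetime import date

# cartoonistID-YYYY-MM-DD with an optional single-letter suffix (assumed form).
KEY_PATTERN = re.compile(r"^(\d+)-(\d{4}-\d{2}-\d{2})(?:-([a-z]))?$")


def parse_cartoon_key(key: str) -> dict:
    """Split a cartoon key into the values needed to pre-populate a record."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a valid cartoon key: {key!r}")
    cartoonist_id, date_text, letter = match.groups()
    date.fromisoformat(date_text)  # rejects impossible dates such as 1963-02-30
    return {
        "cartoonist_id": int(cartoonist_id),
        "date_on_cartoon": date_text,
        "letter": letter,  # None when the key has no suffix
    }


print(parse_cartoon_key("1-1963-01-10-b"))
```

Validating the key before creating the record means that a mistyped filename is caught at upload time rather than surfacing later as a record with the wrong cartoonist or date.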

Search and retrieval software

We decided to create our own database for this project, using MySQL as the backend. The search interface and the cataloguing module are based on the Perl CGI scripts developed for the Doukhobor Collection (the original scripts are freely available). The Cartoons Collection database provides field searching, subject browsing, and limiting by date and cartoonist. It also allows the user to choose to show thumbnail images in the search results and to right-truncate search terms.
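Right truncation is commonly implemented as a prefix match. The sketch below illustrates the idea in Python with an SQL LIKE pattern. It uses the standard library's sqlite3 module in place of MySQL so that it runs without a server; the table, column names, and sample rows are invented for the example and do not reflect the project's actual schema or data.

```python
import sqlite3


def right_truncated_pattern(term: str) -> str:
    """Turn a search term into a LIKE pattern matching anything that starts with it."""
    # Escape LIKE wildcards so that user input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "%"


def search_subjects(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return the keys of cartoons whose subject begins with the term."""
    rows = conn.execute(
        "SELECT cartoon_key FROM cartoons "
        "WHERE subject LIKE ? ESCAPE '\\' ORDER BY cartoon_key",
        (right_truncated_pattern(term),),
    )
    return [row[0] for row in rows]


# Invented sample rows, one subject per row for simplicity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cartoons (cartoon_key TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO cartoons VALUES (?, ?)",
    [
        ("9-2000-01-01", "Constitutions"),
        ("9-2000-01-02", "Constitutional amendments"),
        ("9-2000-01-03", "Married persons"),
    ],
)

print(search_subjects(conn, "constitution"))
```

Passing the pattern as a bound parameter, rather than building the SQL string by hand, keeps user-supplied search terms from being interpreted as SQL.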