SLA 2015 part 1: Classifying and keywording image collections

Boston from Castle Island

This is the first in a series of posts about my experience at the 2015 Special Libraries Association Annual Conference in Boston. Though I was very busy with my duties as board secretary for the association, I found time to attend more sessions than I had anticipated and to talk with friends and colleagues. All of the sessions I attended were good, several were outstanding, and all in all it was one of the best conferences I’ve been to.

In this session, “Get the Picture: Use your Taxonomy to Classify Images,” the two speakers talked about the unique challenge of finding images in two very different kinds of collections. Because images themselves contain no searchable text, classification is especially important for retrieval in image collections. The speakers used specific examples to illustrate both the challenges of classifying images and the strategies for overcoming them, and I came away from the session with ideas I can apply to classification challenges in my own library, even beyond image collections.

First, Ann Pool from Corbis talked about a commercial photography collection. The items in the collection come from photographers with their contributed keywords. These keywords are then translated into a controlled vocabulary, which is in turn translated into searchable terms and then into nine different languages. All this is managed using an in-house-developed taxonomy tool.
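The first step in that pipeline, mapping free-form contributor keywords onto a controlled vocabulary, can be sketched roughly as follows. This is only an illustration and not a description of the Corbis tool, which was not shown in detail: the synonym table, the function name, and the sample keywords are all hypothetical. Translation into other languages would be a further lookup keyed on the preferred term.

```python
# Hypothetical synonym table: contributor keyword (lowercased) -> preferred term.
# A real controlled vocabulary would be far larger and maintained in a taxonomy tool.
SYNONYMS = {
    "skyline": "Skyline",
    "city skyline": "Skyline",
    "cloud": "Clouds",
    "clouds": "Clouds",
}


def normalize_keywords(contributed, synonyms):
    """Map contributed keywords to preferred terms.

    Returns (preferred, unmapped): preferred terms in first-seen order without
    duplicates, and the contributed keywords that have no match in the table.
    """
    preferred = []
    unmapped = []
    for raw in contributed:
        key = raw.strip().lower()  # ignore case and stray whitespace
        term = synonyms.get(key)
        if term is None:
            unmapped.append(raw)  # left for a human to review
        elif term not in preferred:
            preferred.append(term)
    return preferred, unmapped


preferred, unmapped = normalize_keywords(
    ["Clouds", "cloud ", "City Skyline", "Castle Island"], SYNONYMS
)
print(preferred)  # ['Clouds', 'Skyline']
print(unmapped)   # ['Castle Island']
```

Keeping the unmapped keywords in a separate list, instead of silently dropping them, leaves room for someone to decide whether they belong in the vocabulary at all.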

Pool described the strategies used to improve image retrieval, including manipulation of the contributed keywords and search functionality like keyword autocompletion and navigation filters.

One of the main challenges Pool discussed was over-keywording by photographers. She identified three reasons for it: a belief that more is better, batch keywording, and the use of keywords to provide background for the image. For example, “skyline” and “Boston” would be good keywords for the image included in this post. I might also add “clouds,” but would someone searching for clouds think this photo is relevant? I might add “Castle Island” or “Fort Independence” because that’s where I took the photograph, but neither of those places is in the image. The photographers think they are providing useful information, but users end up frustrated and do not buy images from the collection.

Pool uses a variety of strategies to improve image keywording. She communicates with the photographers about best practices, she batch removes overused keywords, and she sometimes rejects submissions and sends them back to the contributor with feedback. The relevancy rule Pool uses is that a keyword applies only if people searching for that keyword would want the image in their results.
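To make the idea of batch removing overused keywords concrete, here is a minimal sketch. It is my own illustration, not Pool's process: it assumes keywords are stored as a list per image, and it flags a keyword as a removal candidate when it appears on more than a chosen share of the images. The record IDs, the 50% threshold, and the function names are hypothetical. Frequency alone does not prove a keyword is irrelevant, so in practice a person would review the flagged list against the relevancy rule before removing anything.

```python
from collections import Counter


def find_overused_keywords(records, max_share=0.5):
    """Return keywords that appear on more than max_share of the records."""
    if not records:
        return set()
    counts = Counter()
    for keywords in records.values():
        counts.update(set(keywords))  # count each keyword once per image
    threshold = max_share * len(records)
    return {kw for kw, n in counts.items() if n > threshold}


def remove_keywords(records, to_remove):
    """Return a copy of records with the given keywords stripped out."""
    return {
        image_id: [kw for kw in keywords if kw not in to_remove]
        for image_id, keywords in records.items()
    }


# Hypothetical sample data, for illustration only.
records = {
    "img-001": ["skyline", "Boston", "clouds"],
    "img-002": ["harbor", "Boston", "clouds"],
    "img-003": ["lighthouse", "clouds"],
    "img-004": ["skyline", "sunset"],
}

overused = find_overused_keywords(records, max_share=0.5)
cleaned = remove_keywords(records, overused)
print(overused)            # {'clouds'}
print(cleaned["img-001"])  # ['skyline', 'Boston']
```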

Pool also uses crowdsourcing (through Mechanical Turk) to improve keywords. Projects include checking for relevancy and counting the number of people in images.

Next, Joy M. Banks, a library and archives consultant, discussed describing historical images for access. In the Bok Tower Gardens project she described, she started with an existing vocabulary and used CONTENTdm.

In planning a strategy for describing images in a collection, Banks said, it is important to think about how the collection will be discovered and used. Who are your users? Do you have multiple audiences? Will users search or browse? Are you planning for unexpected users? (Can you afford to? Can you afford not to?)

Banks stressed the importance of user input. For example, users of a collection of citrus labels were able to point out that the color of the label indicates the grade of the fruit. If people involved in creating a collection are still living, seek them out and talk to them.

Many image collections can be used in unexpected ways, Banks noted, but that depends on the collection showing up in search results. When thinking about pushing a collection out to a larger database or search engine (e.g., OCLC Digital Collections Gateway), you should consider adding keywords that apply to the entire collection.

Thinking about collections at the American Philatelic Research Library, for example, we don’t typically use keywords like “postage stamps” or “mail” that would apply to almost every item in our collection. However, in the context of a larger collection, these keywords would help users find images.

The lessons learned in this session can be applied well beyond image collections. For example, in the APRL’s catalog, the subject heading “postal history” (a specific term describing the collection and study of intact items that have traveled through the mail) has been overused to the point that it is often not useful as a search term. When they search, library staff often include the Cutter number we use for postal history in our call numbers (P860) to get more relevant results.

In another example, I recently imported an index for the Meter Stamp Society Bulletin into a larger article index database. I batch added the subject heading “metered mail”—assumed in the single-title index—to all the records to improve retrieval.
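A batch addition like this one, where a heading that was implicit in a single-title index is added to every record before the records join a larger database, might look something like the sketch below. It assumes each record is a dictionary with a "subjects" list. The field names and the sample records are hypothetical and do not reflect the actual structure of the article index.

```python
def add_subject_heading(records, heading):
    """Return copies of the records with heading present in each 'subjects' list.

    Records that already carry the heading are left unchanged, so the
    function can be run more than once without creating duplicates.
    """
    updated = []
    for record in records:
        subjects = list(record.get("subjects", []))  # copy; tolerate a missing field
        if heading not in subjects:
            subjects.append(heading)
        updated.append({**record, "subjects": subjects})
    return updated


# Hypothetical sample records, for illustration only.
bulletin_index = [
    {"title": "Example article A", "subjects": ["postage meters"]},
    {"title": "Example article B", "subjects": ["metered mail"]},
    {"title": "Example article C"},
]

merged_ready = add_subject_heading(bulletin_index, "metered mail")
for record in merged_ready:
    print(record["title"], record["subjects"])
```

The same approach would apply to collection-level keywords such as “postage stamps” when pushing a specialized collection out to a broader search tool.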