Indexing Online: The New Face of an Old Art
Hyperviews Online Summer 1999 Feature
Summer 1999
Volume 2, #3

By Seth A. Maislin
Boston Chapter STC

What happens to a book index when you remove the page numbers? It's called "indexing online."

The value of pages

The fastest way to destroy an index is to destroy its page numbers, and sure enough, the World Wide Web doesn't have page numbers. What's more, embedded indexing techniques don't require the indexer to be aware of page numbers at all.

Consider a single entry from a hypothetical back-of-the-book index:

Telegraph communications, 12, 44-56, 209
There is a lot of information here. We know that the bulk of information about telegraph communications starts on page 44. We can assume that the entry on page 12 is an introductory comment. We also assume that the entry on page 209 is a casual mention, since it is so completely isolated from the other page numbers. If the book had 215 pages, we might further assume that the last entry points to the glossary or an appendix.
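
The inferences above can be mimicked mechanically. The following Python sketch is only an illustration, not part of any indexing tool: the function names `parse_locators` and `main_discussion` are invented here, and the code assumes that locators are plain arabic page numbers or hyphenated ranges. It treats the longest range as the likely main discussion, which is the same guess a reader makes.

```python
import re

def parse_locators(text):
    """Parse a locator string such as '12, 44-56, 209' into (start, end) page pairs."""
    spans = []
    for part in text.split(","):
        match = re.fullmatch(r"\s*(\d+)(?:-(\d+))?\s*", part)
        if match is None:
            continue  # skip locators this sketch does not understand (roman numerals, etc.)
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        spans.append((start, end))
    return spans

def main_discussion(spans):
    """Return the span covering the most pages (assumed to be the main discussion)."""
    return max(spans, key=lambda span: span[1] - span[0])

spans = parse_locators("12, 44-56, 209")
print(spans)
print(main_discussion(spans))
```

None of this is possible once the locators carry no numbers, which is the point of the comparison that follows.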

Online, the entry would feel a lot like this:

Telegraph communications, *, *, *
We no longer know which entry is the most important. We no longer understand how these documents fit into the entire collection of documents. We do not know the lengths of the documents themselves. We cannot even determine the order of the entries. Are they listed by importance, alphabetically by filename, or chronologically? And are they even at the same web site?

The loss of page numbers and of global context is the fundamental handicap to writing a good index. Even the best indexing programs cannot work around these obstacles because they are side-effects of the essential nature of online presentations. It is the indexer's responsibility to make accommodations for the environment.

Lost knowledge

Let's invent a baseline index, one that might appear at the back of a book:
Telegraph communications, 137-153
   Dots and dashes, 385
   Electricity and its uses, iii-iv
   History of early distance communication, 5-9, 6i
   The telegraph wire, 137-140
Online, our page numbers become hyperlinks (represented here by underscoring):
Telegraph communications
   Dots and dashes
   Electricity and its uses
   History of early distance communication
   The telegraph wire
Notice how much information has been lost. Fortunately for us, these headings are descriptive enough so that we can confidently guess what they mean. Not all authors are as understanding of the user's needs, however. Consider this nightmare of "clever" headings:
Operator, I'd Like to Send a Message
   Beep Beep Beep!
   The Power of a Hot Wire
   From Shouting to Carrier Pigeon
As indexers, it is our responsibility to do better.

Rating context

The web indexer can rate entries. You could, for example, display in boldface type any entry that meets a predefined threshold of importance. This is a binary rating system, in that the entry is either bold or not bold. Binary rating systems work poorly with subjective thresholds like "importance" or "relevance," because the reader usually does not know where the threshold lies. Continuous rating systems, such as type size (bigger equals more important) or color (red is more important than pink, for example), present another dilemma: ugliness. An index with a cornucopia of type sizes or colors looks unprofessional. In fact, the only real advantage of the binary and continuous rating systems is that they are both easy to implement.

A discrete rating system with three to five levels is better, similar to how movies are rated in newspapers. The most intuitive of these is the "asterisk rating system," where each additional "star" represents slightly greater importance or relevance. The stars can be added to the beginnings or ends of the locators. Although this method can make an index look sloppy, no explanation is needed:

Telegraph communications ****
   Dots and dashes
   Electricity and its uses
   History of early distance communication
   The telegraph wire **
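
An asterisk rating is simple to generate. As a rough sketch in Python, assume each entry is stored as a heading, an indent depth, and a star count chosen by the indexer. The `render_index` function and this data layout are hypothetical, not taken from any indexing program.

```python
def render_index(entries, max_stars=4):
    """Render (heading, depth, stars) tuples as indented lines with trailing stars."""
    lines = []
    for heading, depth, stars in entries:
        if not 0 <= stars <= max_stars:
            raise ValueError("star count out of range for: " + heading)
        # Unrated entries get no suffix at all, so the index stays uncluttered.
        suffix = " " + "*" * stars if stars else ""
        lines.append("   " * depth + heading + suffix)
    return "\n".join(lines)

entries = [
    ("Telegraph communications", 0, 4),
    ("Dots and dashes", 1, 0),
    ("Electricity and its uses", 1, 0),
    ("History of early distance communication", 1, 0),
    ("The telegraph wire", 1, 2),
]
print(render_index(entries))
```

Capping the scale with `max_stars` keeps the system discrete: the indexer must choose among a handful of levels rather than invent new ones.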

Another useful rating method involves tacking words or notations to entries, such as chapter titles or numbers, section titles, and other classifiers. This approach is useful when translating from hard copy. Notice in the following example that there are no special typography requirements, and that the notations are unambiguous:

Telegraph communications (Chap 5)
   Dots and dashes (App C; table)
   Electricity and its uses (Preface)
   History of early distance communication (Chap 1)
   History of early distance communication (Chap 1; illus)
   The telegraph wire (Chap 5 Sect 1)

Of course, if there's no hard copy to refer to, "Chapter 5" becomes meaningless. Increasingly, online documentation is created without a hard-copy version in mind, so more appropriate qualifiers need to be chosen. The qualifiers can be placed at the beginnings or ends of locators. Here are two examples, sorted by importance. Notice how the second example accommodates authors with a sense of humor:

Telegraph communications (everything you ever wanted to know about the telegraph)
   Electricity and its uses (introduction)
   History of early distance communication (introduction)
   History of early distance communication (poster-sized timeline, great for classrooms)
   The telegraph wire (equipment and hardware)
   Dots and dashes (table of telegraph codes)

Telegraph communications: Operator, I'd Like to Send a Message
   Electricity introduction: The Power of a Hot Wire
   Communications introduction: From Shouting to Carrier Pigeon
   Communications timeline, poster-size: From Shouting to Carrier Pigeon
   Telegraphy equipment: Hardware
   Dot-dash telegraph codes, table: Beep Beep Beep!
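
Sorting by importance and attaching qualifiers can be sketched in a few lines of Python. The importance scores below are invented for illustration; in practice they would reflect the indexer's own judgment. The function name `render_subentries` is likewise hypothetical.

```python
def render_subentries(locators):
    """Sort (heading, qualifier, importance) tuples, most important first, and format them."""
    # sorted() is stable, so entries with equal scores keep their original order.
    ranked = sorted(locators, key=lambda loc: loc[2], reverse=True)
    return ["   {} ({})".format(heading, qualifier) for heading, qualifier, _ in ranked]

locators = [
    ("Dots and dashes", "table of telegraph codes", 1),
    ("The telegraph wire", "equipment and hardware", 2),
    ("Electricity and its uses", "introduction", 5),
    ("History of early distance communication", "introduction", 4),
]
for line in render_subentries(locators):
    print(line)
```

Because the qualifier travels with the locator, the reader sees why an entry ranks where it does, even without page numbers.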

Looking forward

Anything that you can do in a book, you can do online. But the online environment is nothing like a book in its structure and traditions. The very definition of "online indexing" is in flux. The theory and practice of online indexing overlap several other fields of study, including design, information architecture, programming, and cognitive science. In fact, authors find that "the index" has evolved into the information structures they develop before the book is written, a strange time indeed to write an index.

Now is not the time to develop design standards and traditions. Let us instead experiment, with as full an understanding of this new medium as possible:

  • Seek out and study the many indexes and structures that have been published already.
  • Observe how indexes are used and request feedback from users.
  • Remain open to new and intriguing possibilities.

And finally, share what we know.

Copyright 1999 Seth A. Maislin
