Seth Maislin delivered a presentation on "Why Indexing Fails: Real-Life Problems and Solutions" at the March SDP meeting. Seth is a Directory Manager at Lycos, Inc., and sole proprietor of Focus Information Services. He provides indexing, information architecture, and consulting services to public and private audiences.
According to Seth, there are two approaches to indexing: information architecture/design and knowledge management.
The first is more of an engineering approach. It is appropriate when you have a good knowledge of the subject
matter and the topics that need to be covered. It resembles the top-down design approach common in software
engineering: you flesh out the details of a subject working from the highest level to lower levels. In other
words, you might have a complete outline of the subject before you start writing, so you can plan your index
entries in advance.
The second is a "touchy-feely" approach. You start writing about a subject using what you know about it from
previous projects. As you learn more about the subject, you revise your outline and add more information. This
can be viewed as a bottom-up approach. You will probably add index entries as you write, and you will probably
not be able to anticipate most of them before you start writing.
Whichever approach you use, you still need to structure information so users can find it.
Indexing: Past and Present
The way we use the written word in our information sets or books today has changed how we present information.
Indexing used to be done by librarians. Today, programmers, writers, or engineers commonly do indexing. The same
skills are required to create indexes but the problems are different.
Seth said the following trends make it more difficult to produce good indexes: custom documentation,
single-source production, time-variable content, multibook CD-ROMs, internationalization, and tools designed
for embedded indexing.
Historically, creating an index for a printed book involved writing the book first and then indexing it.
You wrote index entries on index cards, alphabetized the index cards, and then collated them, placing similar
entries in sub-entries. To enter the index, you picked up an index card and typed the entry. You repeated this
process until you were done.
With today's tools, you can have embedded indexes where the index doesn't have page numbers. You can also use
tags to generate an index as you build the book. Writing the index is part of writing the book but sometimes
it becomes more segmented as pieces of the book go to the printer.
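As a rough illustration of generating an index from embedded tags, the sketch below assumes a hypothetical
marker syntax, {{index:term}}, placed directly in the text of each section. Real authoring tools use their own
marker formats, so the syntax and the function names here are assumptions made for the example. Entries point
to section titles rather than page numbers.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax for this sketch: {{index:term}}
MARKER = re.compile(r"\{\{index:([^}]+)\}\}")

def build_index(sections):
    """Map each tagged term to the titles of the sections where it appears."""
    index = defaultdict(list)
    for title, text in sections:
        for term in MARKER.findall(text):
            term = term.strip()
            if title not in index[term]:
                index[term].append(title)
    # Alphabetize the entries, ignoring case.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))

def strip_markers(text):
    """Remove the markers so they do not show up in the published text."""
    return MARKER.sub("", text)

sections = [
    ("Installing", "Run the installer.{{index:installation}} Then set the path.{{index:path variable}}"),
    ("Upgrading", "Back up first.{{index:backups}} Rerun the installer.{{index:installation}}"),
]

index = build_index(sections)
for term, locations in index.items():
    print(f"{term}: {', '.join(locations)}")
```

Because the markers travel with the text, the index can be regenerated whenever a section is added or moved.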
You can also write the index before the book based on the organization or information structure. You can group
similar topics or use a linear outline. As the structuring occurs, the indexer can see general themes.
It is possible to design the structure before you write the book but problems can occur with items found after
you have selected a structure. In traditional books, the table of contents is linear and does not repeat entries.
In contrast, the index can repeat entries and is non-linear. Also in a printed book, the table of contents occurs
in the front of the book and the index in the back. But these are superficial structures that you lose when you
put a book online.
Multibook CD-ROMs and Single Sourcing
Putting multiple books on a CD-ROM is common today for large documentation sets. For example, you might put
five books on Java on a CD-ROM as a "virtual bookshelf". Using this approach can present problems in indexing
such as the consistent formatting of index entries and including enough entries to make the bookshelf useful
to customers. These problems can be compounded if there are multiple writers on the books. How do you maintain
consistency in the indexes for all the books? Often, you write your indexes before the book is done, which
can lead to inconsistencies in the formatting of index entries. Methods that help include using a controlled
vocabulary and following a style sheet (specifying plural vs. singular forms, gerunds, etc.). In a controlled
vocabulary, you always use the same noun for a concept and you turn its synonyms into cross-references.
For example, a user might look for "protection" under "castle" or "shield". Using a controlled vocabulary
provides the big picture but you lose details. This can result in poor searches by search engines.
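A controlled vocabulary with cross-references might be sketched as follows. The data structure and function
names are hypothetical, chosen only for this example; the terms come from the "protection" example above.

```python
# Hypothetical controlled vocabulary: each preferred term lists the synonyms
# that should point readers back to it.
CONTROLLED_VOCABULARY = {
    "protection": ["castle", "shield"],
}

def build_lookup(vocabulary):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup

def normalize(term, lookup):
    """Return the preferred term for a word, or the word itself if it is not listed."""
    return lookup.get(term.lower(), term)

def cross_references(vocabulary):
    """Produce 'See' entries that send readers from each synonym to the preferred term."""
    entries = [
        f"{synonym}. See {preferred}"
        for preferred, synonyms in vocabulary.items()
        for synonym in synonyms
    ]
    return sorted(entries)

lookup = build_lookup(CONTROLLED_VOCABULARY)
print(normalize("Shield", lookup))
for entry in cross_references(CONTROLLED_VOCABULARY):
    print(entry)
```

Every writer on a multibook set would file entries through the same lookup, which is one way to keep the
indexes consistent across books.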
Using a single source for documents in different output media usually doesn't work. For example, it is difficult
to use the same source of information for both a printed document and online help. The media and how people use
them are too different. Books tend to be linear in nature: the information is structured so that people will
read from the beginning to the end of the book. Although you can look up information in a printed book in any
order, users typically turn to printed books to learn concepts and how those concepts are implemented in a
particular tool. Users of online help systems tend to be looking for assistance with a particular task. They
aren't interested in reading about a tool from start to finish or in learning the concepts behind the task.
They only want to know how to perform the task as part of getting their job done, and the task might be only a
subset of what they need to do. They view software products as tools that help them accomplish the tasks
assigned by their employers. They are less interested in why you perform a particular task than in how to do
it, and then they move on to the next task required for their job.
Seth said that, on average, you should have 5 index pages per 100 pages of text. This can vary depending
on the density of the text. If your document is very repetitive or has many graphics, you might have a
shorter index than the average.
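The rule of thumb above works out to simple arithmetic. The function below is only a sketch: its name and the
idea of passing the ratio as a parameter are assumptions, and any adjustment for dense or repetitive text is
left to the indexer's judgment.

```python
def estimate_index_pages(text_pages, index_pages_per_hundred=5.0):
    """Estimate index length from the 5-pages-per-100 rule of thumb."""
    if text_pages < 0:
        raise ValueError("text_pages must not be negative")
    return text_pages * index_pages_per_hundred / 100

# A 340-page manual at the average ratio:
print(estimate_index_pages(340))  # 17.0
```

For a repetitive or graphics-heavy document, you would pass a lower ratio than the default.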
Text aimed at beginners might have fewer index entries than highly technical material. Reference information
usually has more index entries than introductory or tutorial information. A quick reference document will
usually have a long index with short entries. With online documentation, you might want to use double-posting.
For example, "teaching" is a main entry with "education" as its subentry, and "education" is also a main entry
with "teaching" as its subentry.
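Double-posting can be sketched as filing the same locator under both orderings of a pair of terms. The
function name, the nested-dictionary layout, and the topic name used as a locator are all assumptions made for
this example.

```python
def double_post(index, first, second, locator):
    """File a locator under first/second and again under second/first."""
    index.setdefault(first, {}).setdefault(second, []).append(locator)
    index.setdefault(second, {}).setdefault(first, []).append(locator)
    return index

index = {}
double_post(index, "teaching", "education", "Training Overview")

for main_entry in sorted(index):
    print(main_entry)
    for subentry in sorted(index[main_entry]):
        print(f"    {subentry}: {', '.join(index[main_entry][subentry])}")
```

A reader who starts from either word reaches the same topic, at the cost of a longer index.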
Testing Your Index
Does the index work for you? Do a sampling of the index. Are the entries correct and useful?
Read a chapter, put the book aside for a week, then go back to the index and try to find information on key topics.
Another way to test an index is to simply pick a topic. Can you easily find it in the index?
Seth said that searches tend to be literal while indexes help when you know a related name or topic.
Intelligent searches try to mimic the way people look for information by assigning "weight values" to words
based on where the words appear. For example, words in a title are more important than words found in paragraphs.
Another problem arises as your documentation set grows in size. How do you assign "weight values" to topics
found across multiple books? One tip is to search iteratively, that is, to provide the capability to search
within the search results. Another is to allow searches only for large topics or to restrict a search to a
specific book.
A good Web search engine is
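The weighting and narrowing ideas in this section can be illustrated with a small sketch. The weight values,
field names, and sample documents below are assumptions made for the example, not figures from the talk.
Matching is deliberately literal: a query for "thread" does not match "threads".

```python
import re

# Assumed weights for this sketch: a match in a title counts more than one in body text.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(document, query):
    """Sum the weights of every literal query-word match in each field."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(document.get(field, ""))
        total += weight * sum(1 for word in words if word in terms)
    return total

def search(documents, query, book=None):
    """Rank matching documents by score; optionally restrict the search to one book."""
    candidates = [d for d in documents if book is None or d["book"] == book]
    hits = [(score(d, query), d) for d in candidates]
    hits = [(s, d) for s, d in hits if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits]

documents = [
    {"book": "Java Basics", "title": "Thread basics", "body": "Starting a thread and waiting for it."},
    {"book": "Java Basics", "title": "Collections", "body": "Lists are not thread safe by default."},
    {"book": "Java I/O", "title": "Streams", "body": "Reading files from a background thread."},
]

results = search(documents, "thread")                        # title match ranks first
refined = search(results, "files")                           # iterative: search the results
restricted = search(documents, "thread", book="Java I/O")    # restrict to a single book

print([d["title"] for d in results])
print([d["title"] for d in refined])
print([d["title"] for d in restricted])
```

Searching the previous results and restricting by book both shrink the candidate set before scoring, which
sidesteps some of the difficulty of weighting topics across a large documentation set.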
Seth offered the following techniques for creating an index:
Consider indexing in chunks. Start indexing those topics that you know your book will cover.
Other topics will arise as you write the book so you can index them later.
Identify topics that won't change and use these to build pieces of your index.