What Exactly Is "Online Indexing"?

... And Why Shouldn't I Call It That?

A Back-of-the-Envelope First Draft
Seth A. Maislin

People seem confused between the different kinds of indexing techniques that involve computers. I've written this Web page to help people grasp those differences. Of course, the following is what I think the terms mean, so I'd be happy to receive your comments.

1. Online Indexing

Let's start with the broadest, most ridiculous term first: online indexing. Everybody has a different understanding of what this means, so I am going to make a suggestion right off: Don't use this term.

The problem seems to stem from a misunderstanding and overuse of the word online. Some people think it implies the use of telephone lines (i.e., modems). Others feel it requires the Internet and/or the World Wide Web. Still others are satisfied with the idea that online means "on a computer." So, with everybody thinking something different, it's important to clarify your terminology.

2. Paperless Indexing

I think I made this term up. :-) Basically, if you're indexing without paper, you're indexing "paperlessly." This is still a very general term, though, including indexing Web documents, working with embedded indexing software, or simply writing an index on a floppy disk and then shipping it to someone. But it's pretty clear: if you are using hard copy, then it's not paperless indexing.

3. Computer-Assisted Indexing

If you're using a computer, you're indexing with computer assistance. Not only is this a broad concept, but it's even broader than our original online indexing! But when people use the term online, sometimes they aren't talking about the Internet at all, and so "computer-assisted" seems more precise than "online."

Computer-assisted indexing includes such indexing techniques as embedded indexing and Web indexing. For more information, see Indexing Software below. (The American Society of Indexers also has its own Web page about indexing software.) Arguably, index generation software also falls into this category.

4. Embedded Indexing

One specific example of paperless indexing is embedded indexing. This form of indexing uses software to insert indexing information (known as "tags" or "markers") into the documents that are being indexed. The characteristic that defines embedded indexing is that the indexer does not need to be concerned with locators (page numbers, for example) at any time. This is one great advantage to indexers: it diminishes the workload by removing the need to type or verify page numbers. It also simplifies and shortens the overall production process, because with embedded indexes the indexer can complete the job before pagination is finalized -- something that by necessity waits until the very end of the process. Many embedded indexing applications then determine the page numbers automatically, just before the document goes to the printer.

Another advantage to embedded indexing is reusability. Since the embedded index tags are combined with the text, that text can be reused (for example, in new editions or in related texts), and the index tags will get reapplied automatically. In this way, duplicated information needs to be indexed only once.

The major disadvantage to embedded indexing is that it is relatively complicated and/or time-consuming to edit the index tags. Editing an index involves opening the text files, finding and changing the index tags, and then regenerating the index to include the edited index tags.

If you are publishing in a medium other than paper, however, embedded indexing is mandatory. In fact, even if you are taking a document originally published on paper and converting it for computer use (such as CD-ROM reference books), embedded indexing must be implemented. This is because there are no true page numbers in computer documents. Instead of page number locators, computer documents use anchors, which are used in place of locators in the index. Embedded index tags can be easily converted into these anchors. (Anchors are required for any type of link within the document. For example, anchors are used to link lines in a table of contents to document or chapter titles. Similarly, an in-text cross reference requires an anchor at the reference's destination.)
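As a rough sketch of that conversion, the Python below turns each embedded tag into an HTML anchor and then builds index links that point at those anchors. It is an illustration only: the `{{idx:...}}` tag syntax, the `idx-N` anchor ids, and the function name `tags_to_anchors` are all made up for this example and do not come from any particular product.

```python
import re
from html import escape

# Hypothetical tag syntax for this sketch: {{idx:entry text}}
TAG = re.compile(r"\{\{idx:(.*?)\}\}")

def tags_to_anchors(text):
    """Replace each index tag with an HTML anchor.

    Returns the converted text and a list of (entry, anchor_id) pairs.
    The anchor id plays the role that a page number plays in print.
    """
    targets = []

    def replace(match):
        anchor_id = f"idx-{len(targets) + 1}"
        targets.append((match.group(1).strip(), anchor_id))
        return f'<a id="{anchor_id}"></a>'

    return TAG.sub(replace, text), targets

text = "Anchors{{idx:anchors}} stand in for page numbers{{idx:locators}}."
html_text, targets = tags_to_anchors(text)

# Each index line links to its anchor instead of citing a page.
index_lines = [
    f'<li><a href="#{anchor_id}">{escape(entry)}</a></li>'
    for entry, anchor_id in sorted(targets)
]
print(html_text)
print("\n".join(index_lines))
```

The indexer's tags are untouched by the change of medium; only the last step, what the locator turns into, is different.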

Although some people might disagree with me, indexing a Web document itself (that is, an HTML document, as opposed to a printable-on-paper document that will later be converted for Web publication) is also embedded indexing. This is because index information is still embedded in the document. Thus Web indexing is really a subset of embedded indexing.

Here's a brief description of the embedded indexing process. First an indexer reviews material to determine what should be included in the final index. Once those decisions have been made, index tags (or markers) are inserted into the document files. These tags contain the index entry text (the keywords), as well as any other identifying information (e.g., that the page number should be italicized, or that the tag is delimiting a range of information). Then, if the final document will be printed, the software will "invent" a page-number locator for each marker, collate the page number data, and create a final index that integrates the tag text and the page numbers. If the document is on the Web, then the markers are used as anchors, which allow for hypertext linking.
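The print half of that process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: the `{{idx:...}}` marker syntax, the function names `build_index` and `format_index`, and the sample pages are all invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax for this sketch: {{idx:entry text}}
MARKER = re.compile(r"\{\{idx:(.*?)\}\}")

def build_index(pages):
    """Collate embedded markers into {entry: sorted page numbers}.

    `pages` is a list of page texts. Page numbers are assigned only
    now, from each page's position, which is why the indexer never
    has to type or verify them.
    """
    entries = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for entry in MARKER.findall(text):
            entries[entry.strip()].add(page_number)
    # Sort entries alphabetically (ignoring case) and pages numerically.
    return {
        entry: sorted(numbers)
        for entry, numbers in sorted(entries.items(), key=lambda kv: kv[0].lower())
    }

def format_index(index):
    """Render the collated index as plain text, one entry per line."""
    return "\n".join(
        f"{entry}, {', '.join(str(n) for n in numbers)}"
        for entry, numbers in index.items()
    )

pages = [
    "Anchors{{idx:anchors}} replace page numbers{{idx:locators}}.",
    "Markers{{idx:markers}} hold the entry text.",
    "Locators{{idx:locators}} are computed at the end.",
]
print(format_index(build_index(pages)))
```

If the pagination changes, running `build_index` again on the new pages rebuilds every locator, and none of the markers has to be touched.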

5. Web Indexing

I think that Web indexing is interesting because the final index is optional.

If you do have an actual index, the index is usually a Web document that contains several ordered links to other Web documents. Search engines, for example, generate online annotated indexes: the user types in certain keywords, the search engine software inspects its database for occurrences of these words, and a list of Web addresses with other data is presented to the user. This list is an index. (For example, there is such a list at the bottom of this document.)

Often, however, Web indexing (or indexing the Web) involves creating links among documents to aid in navigation. This document, for example, has links within the text that can direct the reader from one paragraph to another in a nonsequential manner. Thus by a general definition, this document is indexed. I think this is faulty reasoning, but this is because people have different ideas of what an index is. If an index is simply a navigational tool, then this page is self-indexed. If an index is a self-contained navigational tool, then all the links should be in the same place, as described in the above paragraph.

In most cases, however, indexing the Web simply involves the creation of hyperlinks within documents that are accessible on the Web.

A point should be made here: What is a Web document? Most people think that any document that can be accessed by a browser is a Web document, but this is not true. It's possible to use a browser to access files on your own computer. In addition, with FTP or Gopher, it's possible to remotely access someone else's files (if you have permission!), but those files are on somebody's computer, not the Web. To my knowledge, there is no better definition for Web documents than, "A Web document is a file that is accessed through a browser with the HTTP protocol." This document, for example, is at https://taxonomist.tripod.com/paperless.html.

6. Indexing Software

Indexing software is another broadly used term. If you can write an index using a computer program, isn't that indexing software? So even a text editor could be considered indexing software, since indexes can be typed from scratch. I prefer to break this term down into two more specific ideas: indexing-dedicated software and indexing-enabled software. Neither of these includes word processors. :-)

7. Indexing-Dedicated Software

Indexing-dedicated software is software with only one purpose: writing an index or creating an index database. This type of software is designed with a data-entry interface for typing keywords and locators. The software then "massages" the input data and generates a final product: a text index (which can be converted into other formats) with a desired layout. Different brands of indexing-dedicated software packages have different features or options, and can run on different platforms; these are the distributors' marketing points.
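To make the "massaging" step concrete, here is a minimal Python sketch of what such a program does with the keyed-in data: sort it, merge the locators, and lay out the result. The record format, the function name `layout_index`, and the sample entries are assumptions for illustration only; real packages offer far more (cross-references, page ranges, sorting rules, and so on).

```python
from itertools import groupby

# Records as an indexer might key them in: (main heading, subheading, locator).
# An empty subheading means the locator belongs to the main heading itself.
records = [
    ("indexing software", "dedicated", 12),
    ("anchors", "", 7),
    ("indexing software", "", 10),
    ("indexing software", "enabled", 14),
    ("anchors", "", 3),
    ("indexing software", "dedicated", 13),
]

def layout_index(records):
    """Sort the keyed records, merge locators, and lay out an indented index."""
    lines = []
    ordered = sorted(records, key=lambda r: (r[0].lower(), r[1].lower(), r[2]))
    for main, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # Locators that belong to the main heading go on its own line.
        own = [str(loc) for _, sub, loc in group if not sub]
        lines.append(main + (", " + ", ".join(own) if own else ""))
        # Each subheading gets one indented line with its merged locators.
        subs = [r for r in group if r[1]]
        for sub, sub_group in groupby(subs, key=lambda r: r[1]):
            locs = ", ".join(str(loc) for _, _, loc in sub_group)
            lines.append(f"    {sub}, {locs}")
    return "\n".join(lines)

print(layout_index(records))
```

Notice that, unlike embedded indexing, the indexer here types every locator by hand; the software only organizes what it is given.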

For creating database indexes, such as those for MEDLINE and CINAHL, indexing-dedicated software examples include Cuadra Star, ProCite, and other bibliographic software systems. Online database indexing involves creating index terms and typing them into database records. For example, a single article in a journal may have a single record, full of relevant indexing terms. Then the database becomes searchable: a query for certain keywords finds the records that contain those words, and the articles those records describe are provided to the user. You'll notice that database indexes aren't readable -- and in many cases not even browsable -- by the end user. In this way (at least), database indexes are similar to online help.
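The record-and-query idea can be sketched as follows. The records, titles, and terms are invented sample data, and `build_lookup` and `search` are hypothetical helper names; nothing here reflects how MEDLINE, CINAHL, or any named product is actually implemented.

```python
from collections import defaultdict

# Invented sample data: one record per journal article, each holding
# the index terms an indexer typed in for that article.
records = {
    "rec-001": {"title": "Asthma care in children", "terms": {"asthma", "pediatrics"}},
    "rec-002": {"title": "Nursing staff ratios", "terms": {"nursing", "staffing"}},
    "rec-003": {"title": "Asthma education for nurses", "terms": {"asthma", "nursing", "education"}},
}

def build_lookup(records):
    """Map each index term to the ids of the records that carry it."""
    lookup = defaultdict(set)
    for record_id, record in records.items():
        for term in record["terms"]:
            lookup[term.lower()].add(record_id)
    return lookup

def search(lookup, *keywords):
    """Return the sorted ids of records indexed under every keyword."""
    sets = [lookup.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*sets)) if sets else []

lookup = build_lookup(records)
for record_id in search(lookup, "asthma", "nursing"):
    print(record_id, records[record_id]["title"])
```

The user only ever sees the results of `search`; the term-to-record mapping itself is never presented as something to read or browse, which is the point made above.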

8. Indexing-Enabled Software

Indexing-enabled software is software that, among its other features, allows the user to generate indexes. This generation process is called embedded indexing. Indexing-enabled software is often better known for its more general purpose: page layout software, book creation software, word processing software, and so on. The ability to index using this software is simply a feature of the application -- and in being a feature, the indexing capability of a particular application will vary greatly. The indexing capabilities of indexing-enabled software will always lag behind those of indexing-dedicated software, although they are improving quickly.

9. Online Help: Context-Independent and Context-Sensitive Help

The term online help has two implementations. The first, which is simply known as "online help" and nothing else, is recognizable because there is usually some sort of manual that can be read in full in some sort of order. There is also often an alphabetized or other ordered list of concepts and words. The second, context-sensitive help, has no such readable "book," but instead consists of independent bits and pieces of information that become available to the user when appropriate. When discussing "online help," it is crucial to understand which implementation is being used. I strongly recommend that the terms "context-insensitive" (or, if you're a marketing person, "context-independent") and "context-sensitive" be used exclusively. By the way, nowadays only one implementation is used in an application; having both is redundant and wasteful, and writing both takes twice as long (the language used for each implementation is different).

Context-independent online help works under the same principles as indexing, but rarely is the help material written by an indexer. Context-independent help utilizes help "screens." These screens, presented to the user like a book, contain several "pages" of information, available in some order. You can find information on what you are looking for by browsing a topic index -- a list of important concepts, ideas, and titles -- or by following a subject or task hierarchy. Context-independent help usually is accessible by selecting a Help (or "?") button or pulling down a Help menu. How the material is organized is up to the authors and/or designers.

Context-sensitive help is much more complicated but in some ways better. With this kind of help, the users access only the most relevant help data. In this way, a user can access the desired information without having to browse or search an index or other Help document; instead, the information is always at the user's fingertips. For example, if there were a button labeled "Cancel" on the screen, the user could learn what the Cancel button does. A small window might appear on the screen, overlaying the display, with the information on this button. However, when there is no Cancel button available, the help information about that button also is unavailable (in general), and unnecessary. Context-sensitive help is written to consciously control the access points to the help data (i.e., the index) by offering only the most directly relevant information. This may seem limiting to those who are unfamiliar with context-sensitive help, but good writing and effective forethought can help to prevent having users lost among multiple unhelpful screens of text and confused by irrelevant information. In addition, note that context-sensitive help is usually an online implementation of a full manual or book.
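The Cancel-button example can be sketched in Python as a lookup that is restricted to whatever is currently on the screen. The control names, the help text, and the function `help_for` are all hypothetical; a real help system would be tied into the application's user interface rather than a plain dictionary.

```python
# Hypothetical help topics, keyed by the control they describe.
HELP_TOPICS = {
    "cancel_button": "Cancel closes the dialog without saving your changes.",
    "save_button": "Save writes your changes to disk.",
    "print_button": "Print sends the current document to the printer.",
}

def help_for(control_id, visible_controls):
    """Return help text only for a control currently on screen, else None."""
    if control_id not in visible_controls:
        return None
    return HELP_TOPICS.get(control_id)

# Only these controls are on the current screen.
screen = {"save_button", "cancel_button"}

print(help_for("cancel_button", screen))  # help text for Cancel
print(help_for("print_button", screen))   # None: no Print button here
```

The set of visible controls is what acts as the controlled list of access points: the Print topic exists, but nothing on this screen leads to it.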

10. Computer-Generated Indexes, or Index Generation Software

This is a whole other indexing field, and it has almost nothing to do with anything else in this document. Computer-generated indexes are indexes that do not involve the human decision-making process (or at least, not until the very end, when the index is reviewed). This is because the software creates the index based on certain algorithms designed by programmers. The software inspects the text, makes certain determinations regarding the importance of text elements (based on placement within sentences, format, the number of times the term is repeated, and so on), and writes the index from scratch. The subsequent list is known as a concordance and should not be confused with an index. The single advantage that a concordance has over an index is the speed with which one can be created. Where an index might take a few weeks, a concordance can be generated in minutes.

But -- and this is a big one -- although index-generation technology is improving, concordances are astoundingly inferior to human indexing. In fact, using index generation software is considered downright bad practice for almost all projects, and many indexers shun the thought of computer-generated indexes to begin with. This is because computers have yet to demonstrate an understanding of language, a prerequisite for good indexing. Although concordances can identify words and terminology, they fail to understand concepts. They also are unable to distinguish between levels of importance, and thus tend to index trivial mentions as valuable and fail to differentiate between general ideas and specific examples.
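For the curious, here is a deliberately naive Python sketch of a concordance generator. It uses only one of the signals mentioned above, the number of times a word is repeated, and the stop-word list, the `min_count` threshold, and the function name `concordance` are assumptions chosen for the example rather than anything taken from a real product.

```python
import re
from collections import defaultdict

# A tiny, made-up stop-word list; real software would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "are", "on", "it"}

def concordance(pages, min_count=2):
    """List each non-stop word seen at least `min_count` times, with its pages."""
    locations = defaultdict(list)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                locations[word].append(page_number)
    # Keep only the repeated words, alphabetized, with duplicate pages merged.
    return {
        word: sorted(set(found))
        for word, found in sorted(locations.items())
        if len(found) >= min_count
    }

pages = [
    "The index is a map of the book.",
    "A good index points to concepts; a concordance lists words.",
    "The concordance is fast to build.",
]
for word, numbers in concordance(pages).items():
    print(f"{word}, {', '.join(str(n) for n in numbers)}")
```

The sketch runs instantly, and it shows the weakness just described: it counts strings of letters and has no idea that "map of the book" is a statement about what an index is.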


Computer-generated indexes
Context-independent help
Context-sensitive help
Database indexes
Embedded indexing
Index generation software
Indexing-dedicated software
Indexing-enabled software
Indexing software
Online indexing
Online help
Paperless indexing
Web documents
Web indexing

Copyright 1999 Seth A. Maislin

