Seth Maislin: Frequently Asked Questions on Indexing and Information Architecture

Updated for February 2006

My website is frequently discovered, usually by people looking for indexers and information architects. Sometimes people find me because they hear about my courses and presentations, and they write me because they want more information. I believe in writing back to everyone who contacts me, but often I get the same questions. So I've developed this page as a way of publishing the answers to the most common questions. This is a growing document, and I hope that by sharing these answers I can build a greater understanding of what I do -- and what you can do -- and help build a stronger information community.

Indexing:
About Indexers
USDA Indexing Course
Indexing Software
Single-Source Indexing Concerns
Not Using Page Ranges
Exchanging Locators with Subentries
Online Cross-Referencing
Acronym Indexing
Indexing the Glossaries
Speed of Indexing
Index Translation and Localization
The Money Indexers Make

Other:
Information Architecture Career Advice
Finding a Taxonomist
Using Hard Copy
Search Engine Glossary
Recipe for Noodle Kugel

If you have a question that isn't here, please write me at seth@maislin.com, and I'll respond to you directly. I'll probably update this page, too. :-)

By the way, the newest FAQ entries are at the top of this page. So the entry right below this paragraph is the most recent. Hopefully you'll visit my FAQ more than once, just to check out the newest additions.

NOTE: If you are looking for help regarding the indexing capabilities of Microsoft Word -- on some days I really want to put that word "capabilities" in quotes! -- you can visit my Word Indexing FAQ.

The Money Indexers Make
How much money can an indexer actually make?

Your income as an indexer depends on two details, really. First, are you just indexing, or are you doing other things, like training and consulting? As a trainer and consultant you often charge more money per hour; instead of doing the work, you're helping or teaching people to do the work themselves. But if you're doing the indexing work, then the second detail is how many hours you have in your working day. If you can work at a decent hourly wage for many, many hours, you can make more money. It's that simple.

So, that said, an indexer who works 30 hours per week (on average) and can index 10 pages per hour at, say, $3 per page should be making 30 × 10 × $3 = $900 per week, or around $45K per year (assuming about 50 working weeks) before taxes and expenses. This makes sense; that number was the median annual income for respondents to a 2000 survey by the American Society of Indexers. For many projects, however, $3 per page is low. And for some lifestyles, 30 hours a week is too many.
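If you'd like to play with these numbers yourself, the arithmetic above fits in a few lines of Python. This is only a sketch: the function names are my own invention, and the 50-week working year is an assumption chosen to match the "around $45K" figure, not something taken from the ASI survey.

```python
def weekly_income(hours_per_week, pages_per_hour, rate_per_page):
    """Gross weekly income: hours x pages per hour x dollars per page."""
    return hours_per_week * pages_per_hour * rate_per_page


def annual_income(hours_per_week, pages_per_hour, rate_per_page, weeks_per_year=50):
    """Gross annual income before taxes and expenses.

    weeks_per_year=50 is an assumption (roughly two weeks off per year).
    """
    return weekly_income(hours_per_week, pages_per_hour, rate_per_page) * weeks_per_year


print(weekly_income(30, 10, 3))  # 900
print(annual_income(30, 10, 3))  # 45000
```

Because every input is a straight multiplier, raising your per-page rate from $3 to $4 raises your income by a third, exactly as working 40 hours instead of 30 would.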

The question is, can you make more than that? Perhaps you can earn more for each page by working in more challenging industries (difficult content, difficult deadlines) or on higher-demand products. Perhaps you can work more than 30 hours per week. If you're new at this, you have no choice but to experiment for a while. I guarantee that there are more products that require indexing than there are good indexers who can do the work. That is, the demand for indexing is high. Finding the work might be tough, though, because the people who need indexers are not always looking in the right places. I don't have a satisfying explanation for that one....

Jobs that pay the most money have these characteristics: challenging subjects and audiences, high-demand industries and products, unusual or complex tools and media requirements, and tight deadlines. Thus a system of custom indexes for personalized high-end technical computer documentation, produced in XML, delivered online, and localized for an international audience simultaneously with high-publicity product launches... well, that would probably earn you a LOT of money. (We're assuming the people running that project appreciate indexing for what it truly is, of course.) If you want to earn much less money, look to write indexes for starving Ph.D. students writing their esoteric scholarly theses, where a single printed copy will forever remain covered in dust on an academic library bookshelf. :-)

Indexing the Glossaries
Should I index my glossaries?

When it comes to indexing glossary entries, I have mixed feelings.

On some days, I think the glossary should be left alone, unindexed. I don't believe that people looking up terms in an index are looking for definitions; they're looking for advice, help, procedures, context, and so on. Clearly many glossary entries don't include this kind of information, or even a cross-reference into the text where the topic is covered in greater detail. Also, glossaries are standalone elements in books, like the TOC [table of contents] and indexes themselves, and thus they serve a functional purpose quite separate from the indexed text. In an online environment, prominent links to the glossary and individual glossary entries can be made available, too.

On other days, I realize that I can't perfectly predict all of my readers' needs. They might want definitions. Further, most glossary entries do contain (at the least) enough context terms that users seeking advice can use the glossary definition as the first step toward locating a better index entry. Also, readers don't always find glossaries on their own, especially because few books contain them. For printed materials we're used to the TOC being at the beginning (except in popular magazines, which drives me crazy), and the index being at the end (except in some popular books, which are stacked with advertisements). But it's doubtful we would thumb into a printed glossary by accident. Creating index entries into the glossary informs the user that a glossary exists, should they want to use it later. This isn't the purpose of an index, but it's not a bad side effect.

Finally, I tend not to like the way glossary entries appear in the index if you identify them. If you use "defined" or "definition of" as a subentry for every definition, a well-populated glossary creates a glut of repetitive lines throughout the index:

    entry1
        defined, 301
    entry2
        defined, 301
    entry3
        defined, 302
    entry4
        defined, 302

If you instead use boldface for the page numbers (a practice I dislike rather strongly), you create a visual path of darkened, ordered numbers that the eye can't escape. Look at the above example and see how quickly your eye finds either (a) the page numbers, or (b) the numbers in my entry text. Our eyes are good at seeing patterns. Glossaries are in alphabetical order, and so are indexes; that's the source of the trouble.

In summary, then: if you're going to index the glossary, don't identify the page numbers as belonging to it -- no subentries, no boldface. (Make a few specific exceptions if needed.) This "soft sell" approach allows people to scan and use the entries in page number order, reaching the definition-only glossary last.
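Here's a minimal sketch of the soft-sell approach in Python. It assumes each term's locators are plain page numbers and that the glossary sits at the back of the book (page 301 here, echoing the example above). The function name and the data are hypothetical. Nothing marks the glossary page; it simply sorts last.

```python
def format_entry(term, locators):
    """Format an index entry with locators in ascending page order.

    Glossary pages are deliberately not marked (no "defined" subentry,
    no boldface). Because the glossary sits at the back of the book,
    its page number naturally lands at the end of the list.
    """
    pages = sorted(set(locators))
    return term + ", " + ", ".join(str(p) for p in pages)


# Hypothetical data: pages in the body text plus one glossary page (301).
text_pages = {"entry1": [87, 12], "entry2": [45]}
glossary_pages = {"entry1": 301, "entry2": 301}

for term in sorted(text_pages):
    print(format_entry(term, text_pages[term] + [glossary_pages[term]]))
# entry1, 12, 87, 301
# entry2, 45, 301
```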

One note: Sometimes glossaries contain a lot of information beyond your standard definitions, as when definitions run several paragraphs. When this happens, you should index the glossary as if it were any other chapter in the book, and write entries for within-paragraph/within-definition concepts.

Indexing Software
What indexing software should I use?

This is a very common question, which can be answered only with an opinion. However, I'm not in the business of giving opinions on third-party products. So I'm going to have to answer this question a bit obliquely. To get the most out of my response, I recommend you check out the web page on indexing software maintained by the American Society of Indexers. Go to www.asindexing.org and you'll find the page under Resources.

First of all, if you are looking for dedicated indexing software, you're limited to only a few choices. As of this writing, there are few distinctions when it comes to functionality, and many when it comes to user interface. The different programs feel different to different people. There may be a correlation between software and indexed subject matter, but that would be based on stereotype. For example, those who index technical documents might prefer a database-looking interface, whereas those who index scholarly works might prefer a more free-flowing interface. Your choice, then, should be based on your personal tastes. Most of the software applications are available as free demonstration copies or at lower student rates.

If you are writing embedded indexes (for definitions of these terms, you can read my article "What Exactly Is Online Indexing?" to supplement what you read at the ASI site), then you are obligated to use whatever software is being used to produce the documentation itself. For example, if the documents are written in Microsoft Word and you're going to insert index data into them, then you'll need a copy of Microsoft Word. It's that simple. However, the producers of dedicated indexing software are working on building compatibility between their software and the embedded worlds. This is a great thing, and something to look out for.

Finally, my only real recommendation is that you buy the software only when you need it, and not before. There is an advantage to buying early -- you get a chance to practice! -- but jumping the gun on this is more likely to leave you with software you don't need or don't use. The software pays for itself after the first job, so you might as well wait for that job. Of course, leave time to learn the software, too: basic functionality can be learned in about 2 hours.

Online Cross-Referencing
What's the best way to handle cross references in online indexes?

In my opinion, see references can be entirely dropped from online indexes and replaced with multiple posts. That is, instead of putting all entries under "higher education" and creating the cross reference "universities. See higher education," I say you should put everything under everything. To make this work, however, you'll often need to rewrite your labels. Where "universities" was fine for the cross reference, the new entry should be "universities and colleges," or "universities and other higher education." And so on. My reasoning is that online text is cheap (although screen real estate often isn't), and that there's little point in bouncing a person from one place to another. See references, in print, are useful for only two reasons: to save space (not relevant online) and to control vocabulary (which can be accomplished with longer labels). Consequently, see references aren't needed.
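To make "put everything under everything" concrete, here's a rough Python sketch that turns each see reference into a double post, with an optional relabel so the new heading stands on its own. The dictionary layout and the function name are assumptions for illustration; no particular indexing tool works this way.

```python
def double_post(index, see_refs, relabel=None):
    """Replace each see reference with a full copy of the target's locators.

    index:    dict mapping heading -> list of page numbers
    see_refs: dict mapping a synonym -> the heading it used to point to
    relabel:  optional dict renaming a synonym so it reads well on its own
              (e.g. "universities" -> "universities and colleges")
    """
    relabel = relabel or {}
    posted = {heading: list(pages) for heading, pages in index.items()}
    for synonym, target in see_refs.items():
        label = relabel.get(synonym, synonym)
        # The synonym now carries the same locators as its target,
        # so nobody gets bounced from one heading to another.
        posted[label] = sorted(set(posted.get(label, []) + index[target]))
    return posted


index = {"higher education": [35, 45, 55]}
see_refs = {"universities": "higher education"}
print(double_post(index, see_refs, relabel={"universities": "universities and colleges"}))
```

The relabel map is where the real editorial work lives: the code can copy locators, but only you can decide that "universities" should become "universities and colleges."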

See-also references, in my opinion, also aren't needed. At least, not in the index. In a printed book, people can keep their finger inside the index after they look up a page number. After they read the content on that referenced page, they use their finger to return to the index to retrieve the next page number. A see-also reference works much like these page numbers, giving the reader yet another place to look. (It can also serve as an alternative to the page numbers that are there, thus convincing the reader not to use any of the page numbers and instead look up a different term. I'll get to this in a second.)

In the online environment, there is no "finger" equivalent. Readers might use the back button, but that's frequently inefficient and inaccurate. Also, backtracking online feels like a lot more work, whereas checking several page numbers in print never feels like backtracking at all. Instead, I strongly recommend that you put the additional page numbers and see-also references on the referenced page itself, as related topics. Here's an example. Instead of creating the printed entry "colleges, 35, 45, 55. See also universities," you'd replace the page numbers with section titles (or some other [cheap!] text) and lose the cross reference entirely. Let's assume the titles are A, B, and C. In the index, these letters appear as links. Now if the reader clicks on A, they should go to a section titled A, but somewhere within that section (in a sidebar or at the bottom) should be a related topics list. It would look something like this:

    Related Topics: B, C, universities

If the reader then clicks on B (or had clicked on B in the first place), the related topics list on that page would be

    Related Topics: A, C, universities

And of course, if the reader ever clicks on "universities," references to the "colleges" content should be made available.
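The related topics lists above follow a simple rule: show the other sections that shared the index heading, then the old see-also targets. Here's a minimal Python sketch of that rule. The function names are hypothetical, and I'm assuming you already know which sections shared a heading.

```python
def related_topics(current, cluster, cross_refs):
    """Build the related topics list for one section.

    current:    title of the section being displayed
    cluster:    all section titles that shared one index heading
    cross_refs: headings the printed index would have given as see-also
    """
    siblings = [title for title in cluster if title != current]
    return siblings + list(cross_refs)


def render_related(current, cluster, cross_refs):
    return "Related Topics: " + ", ".join(related_topics(current, cluster, cross_refs))


colleges = ["A", "B", "C"]
print(render_related("A", colleges, ["universities"]))  # Related Topics: B, C, universities
print(render_related("B", colleges, ["universities"]))  # Related Topics: A, C, universities
```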

The only question is this: in the related topics list do you include only the topic title (universities) or all the topics listed under that title (D, E, F)? That's up to you to decide. If you have the time, I often recommend sub-home-page functionality for larger topics. In this case, you might consider developing a table of contents for all university-related content. It depends on the quantity and the overall functionality of the site.

Now, see-also references are designed to guide users to ancillary and related content, but sometimes readers follow them because they help them discover the topic they really want but were unable to think of on their own. For example, I may not remember the name for closed geometric figures with straight sides, but if I look up the word "octagon" and discover the cross reference "See also polygons," then the index helped me formulate the vocabulary. For this reason, it might be worthwhile to include see-also references in the index. This process is known as iterative searching -- where you look for something, learn from the results, and perform a second search -- and it would be really useful when actually searching with a search engine. So here's another place where see-alsos are useful: in search results. Of course, some Web search engines already have this, and they're called "related terms." Coincidence? :-)
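Here's a toy Python sketch of see-alsos showing up in search results: an exact-match lookup that returns both locators and suggested terms, so the reader can learn a word and search again. The page numbers and the function name are hypothetical, and a real search engine does far more than a dictionary lookup.

```python
def search(term, index, see_also):
    """Return (locators, suggestions) for an exact-match lookup.

    Suggestions come from see-also references, so a reader who searched
    for "octagon" can learn the broader term "polygons" and search again.
    """
    key = term.strip().lower()
    return index.get(key, []), see_also.get(key, [])


# Hypothetical page numbers, for illustration only.
index = {"octagon": [112], "polygons": [98, 104, 112]}
see_also = {"octagon": ["polygons"]}

hits, suggestions = search("Octagon", index, see_also)
print(hits, suggestions)  # [112] ['polygons']
hits, _ = search(suggestions[0], index, see_also)  # the second, better-informed search
print(hits)  # [98, 104, 112]
```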

Not Using Page Ranges
We could probably save ourselves time if we didn't create page ranges. Are they really necessary?

Some people think that page ranges help to identify where the more important information is, but this isn't what page ranges are for. They identify where the greatest quantity of information is. A one-page checklist might be much more valuable than a ten-page anecdote, for example. This is an important distinction.

So, is it worth the time to create page ranges? The answer is yes, as long as you think it's worthwhile to communicate to readers where the quantities are. That's the only true advantage: helping the reader discriminate between smaller and larger quantities of related information.

Obviously, if you give a reader only the first page number, failing to specify that the content runs onto later pages, the reader is still likely to read more than one page. They'll read as long as the reading is interesting. Don't think that readers are going to "obey" you when you provide only one page number, and automatically stop when they get to the physical bottom of the page. Similarly, just because you provide a range doesn't mean they're going to read every page you've included. If they want to read, they will.

So if for some reason you decide to drop ranges from your index, to save time or resources (or because your embedding tools make ranges a real hardship, or because you're working with HTML documents and page ranges don't even exist!), you're not destroying the usability of the finding or reading processes. Instead, you are hurting the differentiation process. Here's an example I've given in workshops. In an entry like this

    electrical engineering, 15, 45-61, 422

two-thirds of all workshop participants say they'd go first to the 45-61 range; one-third say they'd start on page 15; nobody ever selects page 422 as a starting place. However, when I drop the range from the index and create this entry:

    electrical engineering, 15, 45, 422

the results are dramatic: 100% of the participants turn first to page 15. So you can see how the range allows readers to choose more intelligently. (There are ways to compensate for missing page ranges. See both Exchanging Locators with Subentries in this FAQ, and my article Indexing Online.)

Perhaps, if you are interested in replacing ranges with single page numbers to save time, what would make more sense is to drop ranges that are only 2 or 3 pages in length. (In fact, I always recommend that all page ranges be 3 pages or longer.) If you're really resource-stingy, drop ranges when you know you'll have only one locator for that entry (and won't need differentiation). On the flip side, however, if I were considering the purchase of a book on electrical engineering and saw only the entry "electrical engineering, 45," I wouldn't buy it. A range of "45-61" might change my mind. See? It's about quantity.
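If your tools let you post-process locators, the "drop the short ranges" compromise is easy to automate. Here's a sketch in Python, assuming each locator is stored as a (start, end) pair, with a single page written as (n, n). The function and its min_range_pages parameter are hypothetical. The default of 3 follows my recommendation that ranges be 3 pages or longer; setting it to 4 also drops the 3-page ranges.

```python
def format_locators(ranges, min_range_pages=3):
    """Format (start, end) page spans as an index locator string.

    A span shorter than min_range_pages collapses to its first page.
    Raising min_range_pages to 4 also collapses 3-page ranges.
    """
    parts = []
    for start, end in sorted(ranges):
        if end - start + 1 >= min_range_pages:
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(start))
    return ", ".join(parts)


print("electrical engineering, " + format_locators([(15, 15), (45, 61), (422, 422)]))
# electrical engineering, 15, 45-61, 422
print(format_locators([(10, 11), (20, 22)]))                     # 10, 20-22
print(format_locators([(10, 11), (20, 22)], min_range_pages=4))  # 10, 20
```

Notice that the 45-61 range survives any reasonable threshold, and that's the one that carries the differentiation.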

About Indexers and Librarians
What can you tell me about the relationship between indexers and librarians, and how can they work together?

I think indexers are specialists, and many of them do not have library science backgrounds. I'm one of them. I think this specialty tends to be an advantage, in that indexers are rarely concerned about "getting things exactly right" in a taxonomy-oriented way, but rather they are concerned with "getting the job done." They work under deadlines in real-world environments, and they tend to be business oriented. Also, indexing work is often short-term, since projects can be completed in as short a time as one week (and sometimes less). Indexers also charge by unusual increments, like pages, and so have a good grasp of information quantities.

Perhaps the biggest difference between indexers and library scientists (in my opinion) is that indexers are better at bridging the gap between uneducated users and information. I believe that the ideal interface between users and information is a librarian; no question. The ideal system is a comprehensive, flexible, precise taxonomy for the information, and a librarian interface between the user and the information. For example, I can walk into the public library and say "I want a book on the Civil War," and the librarian will help me narrow my choices. Unfortunately, the taxonomy system behind the librarian is completely unapproachable to me. Nobody walks into the public library thinking, "I need a book filed as HK523.4." What does HK mean? How many people know? (In fact, I learned that some librarians feel the Dewey Decimal System isn't even the best choice for shelving books, but few people know there are alternatives, or that they are in operation today.) Thus the information access and navigational processes are 100% owned by librarians. Thankfully, those librarians are trained.

In most environments, however, librarian-like interfaces are unachievable. You can't bind a librarian into a textbook, you can't build a librarian into a search engine algorithm, and you can't have dancing paperclip librarians pop up in secondary windows every time someone wants to navigate a website. Consequently, you have no choice but to abandon both the information taxonomy and the librarian interface. So then what? ... Ask the indexers.

Indexing books and writing information architecture for websites (two things I do all day long!) require accessibility and navigation structures that work for even the most ignorant users. This is hard work. I find the idea that I'm "dumbing down the website" to be insulting; do you know how hard it is to mimic both a librarian and a taxonomy using only static text? But indexers do this all the time; they are beholden not to the information, but to the user. (Library scientists are beholden to both, with [I think] at least an ever-so-slight bias toward the information.)

The American Society of Indexers is a professional association that meets indexers' needs by providing networking, educational, business, and advocacy opportunities and support. There are other indexing associations around the world, from Australia to Britain to China (okay, so I alphabetized them on purpose :-), and ASI tends to be the most capitalist by focusing on business. (Australia appears to focus on technology, and Britain seems to focus on the scholarly. I think the British society, SI [Society of Indexers], is most analogous to how LibSci scholars think.)

The main concern facing indexers? The economy; what else? Many indexers are also afraid of technology, although this is a false worry. It's life. Besides, indexes are written badly around the world, and current technology makes this easy to do, too. In fact -- and this is a different idea -- technology makes it harder to write good indexes; this is a legitimate problem. However, as the Web infiltrates our lives more deeply every day, the awareness of good indexing (or something analogous) is increasing -- so I think everything will even out in the long run. It's like complaining that the English language is evolving: more words vs. worse grammar.

Library science affects indexing because the LibSci folks have their fingers on what *really* happens in the information worlds. Although the majority of the world is focused on deadlines and the good-enough, to truly understand how to do things right (or at least better) you need to understand the tenets of good library science. I think it's important for library scientists to connect the immediate needs of the deadline-driven, money-obsessed world with the tried-and-true practices of good information management. Complaining will get you nowhere; education is key.

Information Architecture Career Advice
Do you have any advice for someone investigating information architecture as a career?

I'm not sure if there are easy answers; information architecture is one of those professions where the companies looking for official IA jobs are looking for people with tons of specific experience, but IAs would be of the most value at places where no one has heard of (or understands) information architecture in the first place. :-(

There are sites that collect IA references. For example, go to http://dmoz.org and search for the category called "Information Architecture." (I manage that list as a volunteer under the pseudonym "taxonomist.") There are several sites there, and they would be good references. I don't have any particular reference that I like over another. You might also look for usability sites, although that's a significantly less specific source for IA-style knowledge.

I call myself a consultant, explain to my clients how my capabilities and resources have expanded, and then let networking do the rest. Others have told me that IA is a declarative field: declare yourself an IA, and you're an IA. :-) Truthfully, I've never officially applied for an IA job. In fact, my work at Lycos.com as an Integration Manager started because they needed someone to build an online hierarchy for a product [that never came to fruition]. They hired me because they realized how my skills could be used. They already had a taxonomist on staff, but my approach was more practical and less theoretical (in their opinion, and I tended to agree), and as a consequence I went further in the company. (Of course, Lycos.com is your typical bottom-line-obsessed, on-the-stock-market dotcom company; other companies aren't so crude in their approach to good LibSci objectives.)

If you decide that you want to go further in education (keeping in mind that I have no real training myself, although I actually taught IA at Bentley College), I would suggest human factors. This appears to be the bridge between IA and both the newest buzzword professions like "experience design" and the more widely accepted professions like interface design. I tend to specialize in the content/text/information side and avoid the programming and graphics perspectives; however, if you want to remain more of a generalist you might look into both of those, too. Again, I know people who choose to specialize in whatever they are most interested in and have the most training in, whether it's accessibility design, internationalization, graphics, or marketing.

If you want to go the employment agency route, I wouldn't really know where to begin. Again, perhaps you can search at dmoz.org and see what you find. (And if anyone out there finds something particularly interesting, let me know. :-)

Finding a Taxonomist
I'm looking for a taxonomist with a subject specialty. Can you help?

Taxonomy building is what I do for a living, on a contract basis. As I write this, for example, I'm building a taxonomy for a client responsible for rebuilding college and scholastic websites. I've also built taxonomies for Lycos.com products, two of which I present as case studies at various conferences.

Subject specialty is where things get interesting, because let's be honest: Subject expertise is really important. Suppose you are looking for a pharmaceutical specialist. (My background is technical, including engineering and computers, and except for an occasional need for acetaminophen, my background doesn't overlap with pharmaceuticals. :-) You might search for a medical indexer; there's a special interest group of the American Society of Indexers made up of indexers interested in medical indexing, though not all members are experienced indexers and I don't know any who are experienced taxonomists. Their website is at http://www.scimedindexers.org. Naturally, there are other SIGs with other specialties, ranging from law to environmental science to cookbooks. See the ASI site for more details.

If your project were owned by Northern Light, for example, I'd expect you to demand a pharmaceutical person with a library science background. They're rare, but that's the point: NL's taxonomies are their bread and butter. They build linguistic website analyses for a living. Almost all other companies have business objectives beyond the taxonomy, however, and are unlikely to be as demanding (or as willing to keep searching for a single perfect person). Instead, we see trends toward specialized, multidisciplinary teams. If clients are willing to consider two-person teams, bringing taxonomists and subject experts together should produce the best products.

I'm afraid it sounds selfish when I say this, but I think taxonomy skills are more important than subject matter experience, provided the client is prepared to provide the latter and is willing to work with the taxonomist as needed. I expect librarians to agree with me, but subject matter experts (SMEs) to disagree. Such is our natural bias toward valuing our own skills. :-) Either way, skimping on taxonomy skills will be troublesome, no matter how strong your SMEs are.

USDA Indexing Course
I've heard about the indexing course offered through the USDA (United States Department of Agriculture). Can you tell me what you think about it?

The USDA course is a correspondence course that, depending on your own speed and willingness to pursue the course, can take as long as one year. (It doesn't have to, but keep in mind that completion time is limited by the speed with which the trainers, who are full-time indexers beyond the teaching/grading responsibilities, can reply to your homework.) The course is good, and I've heard only good things about the training. In fact, I know several of the trainers, and they are both great indexers and respectable instructors.

Contrary to popular belief, it is not necessary to complete the course before marketing yourself as an indexer. If the course really does take a year to complete, you don't have to wait that long to get your business going. Practice makes perfect; you will not become an excellent indexer after only a few months, or a single course. And there are several indexers who have never taken the course (including me). But to be an indexer you really need two things: (a) some kind of formal indexing training, and (b) lots of practice. By taking a year-long class, you give yourself an opportunity to develop your skills over time.

If you already have skills and experience related to indexing -- for example, if you're an author or are coming from the publishing industry or a documentation team -- then the USDA course is not for you. It takes too long, and you already know so much. Instead, enroll in a short class to learn the indexing basics (see my schedule page for some of my offerings, to give you ideas of what's available) or attend a conference with indexing content and training. The American Society of Indexers has an updated list of various courses and events.

Single-Source Indexing Concerns
We are "single-sourcing," and we do not want to use conditional text. We want index entries that appear in the print book without product acronyms but appear online with the acronyms included to help users identify product associations. For example, a book may have a single entry for "installing," but a compiled online index might need to differentiate between five "installing" entries. How can we create an index once that works for both?

There are hard and easy answers. The easiest answer is with the page numbers. If you can invent a method for identifying which page numbers go with which product, then you don't have duplicates so much as you have redundancy. In other words, instead of having something that looks like this:

    installing, 50, 50, 50, 50, 50

you have something like this:

    installing
        book a: 50
        book b: 50
        book c: 50
        book d: 50
        book e: 50

This is the process that O'Reilly & Associates tried when they repackaged five printed books into a CD-ROM. However, they also spent time and money on editing the master index so that it looked good. Fortunately, because each page number was identified with one of the five sources, significantly less editing was necessary.
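Here's a rough Python sketch of the "identify the source with the page number" approach: merge several per-book indexes so that identical headings become book-labeled subentries. The data layout and function names are my own assumptions. This is not O'Reilly's actual process, just an illustration of the idea.

```python
from collections import defaultdict


def merge_indexes(indexes):
    """Merge per-book indexes into one master index.

    indexes: dict mapping book label -> {heading: [pages]}
    Returns a dict mapping heading -> list of (book label, pages), so
    identical headings become book-labeled subentries instead of an
    ambiguous run of repeated page numbers.
    """
    master = defaultdict(list)
    for book in sorted(indexes):
        for heading, pages in indexes[book].items():
            master[heading].append((book, sorted(pages)))
    return dict(master)


def render_master(master):
    lines = []
    for heading in sorted(master):
        lines.append(heading)
        for book, pages in master[heading]:
            lines.append(f"    {book}: " + ", ".join(map(str, pages)))
    return "\n".join(lines)


books = {f"book {c}": {"installing": [50]} for c in "abcde"}
print(render_master(merge_indexes(books)))
```

This handles the mechanical part only; deciding whether identically worded headings actually mean the same thing is still an editor's job.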

The most complicated solution is to go back and rewrite all your entries so that they work together. Master indexing is exactly this process. It's difficult to write indexes that work both as individual indexes and as ingredients of a master index, but it's possible, especially with foreknowledge. If you know in advance this is something you need to do, you can develop a standardized vocabulary (or at least some indexing guidelines) that minimizes your troubles. When you do this without any advance preparation, yup, you've got trouble.

The biggest challenge with the editing process isn't making the index look good, of course; it's in the interpretation. When you have two entries that are identical, there's no guarantee that they are of the same scope. For example, consider an index that has this entry:

    electrical engineering, 35

and a second index with this entry:

    electrical engineering, 40-60
        employment opportunities, 45-47
        schools for, 59
        See also engineering

To combine these without inspection might create content anomalies; page 35 from the first book might be about engineering schools, for example. There's no good solution for this, and it's rarely worth the effort of getting it right.

I've done this kind of work as a contractor, and although it can be really tedious it's rarely difficult. If you have someone dedicated to the process you can clean up the final index pretty quickly, although you have to do so manually based on a generated product. It might be possible to build a system to speed up this process in the future. Also, if you know you're going to do this often, consider contracting someone like me to help you build a streamlined process.
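As a first step toward that kind of system, a script can at least tell the editor where to look: any heading that appears in more than one source index may hide a scope mismatch like the electrical engineering example. Here's a hypothetical Python helper, assuming each index is a simple heading-to-locators dictionary (the "circuits" heading is invented for illustration). It flags headings for review rather than merging anything.

```python
def headings_needing_review(indexes):
    """Return headings that occur in two or more source indexes.

    Identical wording doesn't guarantee identical scope, so each
    shared heading is flagged for a human editor instead of being
    merged automatically.
    """
    seen = {}
    for book in sorted(indexes):
        for heading in indexes[book]:
            seen.setdefault(heading, []).append(book)
    return {heading: books for heading, books in seen.items() if len(books) > 1}


indexes = {
    "first index": {"electrical engineering": ["35"]},
    "second index": {"electrical engineering": ["40-60"], "circuits": ["12"]},
}
print(headings_needing_review(indexes))
# {'electrical engineering': ['first index', 'second index']}
```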

Using Hard Copy
Is book indexing still accomplished with hard copy?

As of late 2002, indexes for most bookstore books are written using hard copy, although today's indexers use software to collate index data into an indented, alphabetized list. Technical documentation, such as the manuals that are inserted into software packages, is usually indexed with minimal hard copy. If you are looking to be an indexer but are also interested in minimizing your computer use (for example, perhaps you have vision or other accessibility concerns), you are going to have the best luck targeting your indexing career toward clients that don't require higher-level computer documentation building. Clients like these will include the publishers of coffee-table nonfiction, trade nonfiction, and university presses. Once you start getting into the busy and more lucrative world of technical documentation and online publication, including textbook CD-ROMs and e-books, your need to use computers increases substantially.

For specific information about indexing, I recommend visiting the website of the American Society of Indexers. I'd also recommend taking a course of some kind. Many courses are listed at the ASI website; my courses are listed on my schedule page.

Exchanging Locators with Subentries
I've heard that main entries should have either locators (page numbers) or subentries, but not both. Do you agree?

I absolutely do not agree. More than that, I don't believe that there are "rules" to indexing, only guidelines. So when you hear absolute statements like this, ask yourself critically what logic lies behind them.

Whenever you decide to provide subentries instead of just listing the locators at the main entry level, you are essentially expanding your labels. Think of your locators as one kind of label; text is another. In a 400-page book, a page number label of "5" implies introductory content, and a range label like "35-55" communicates that there's a multipage quantity of information. These locator clues help readers choose which pages to turn to first. Of course, numbers aren't nearly as useful as text for making distinctions, so you might want to add text. Or not.

Consider an entry like "printers, 15, 35, 55, 75," which has four semi-indistinguishable locators. You could write four subentries, one for each page number, but this demands more effort and time from you. So ask yourself if a longer label would be useful enough to justify the effort; if yes, then go ahead. However, if the subentry text doesn't provide value to the reader at all, you're wasting everyone's time:

    printers
        general information, 55
        introduction to, 15
        kinds of, 75
        useful tips, 35

That's a lot of text without a whole lot of information. You're better off keeping only the "useful tips" subentry -- because that's potentially the most valuable part for users, worth the extra words -- and tucking the others back into the main entry level:

    printers, 15, 55, 75
        useful tips, 35

And voila, we have both locators and subentries for the main entry. You can see how this might be valuable. In fact, this structure violates yet another one of these "rules," which is that a main entry should never have only one subentry. This is the so-called orphan rule, and I don't agree with it either.
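To make the structure concrete, here is a minimal Python sketch (Python 3.9 or later) of an entry that carries both locators and subentries and prints itself in indented form. The `Entry` class and its `render` method are hypothetical names for illustration, and the alphabetization is a simple case-insensitive sort rather than any formal filing rule.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One main entry: a heading, its own locators, and optional subentries."""

    heading: str
    locators: list[str] = field(default_factory=list)
    subentries: dict[str, list[str]] = field(default_factory=dict)

    def render(self) -> str:
        """Return the entry as indented index text."""
        first_line = self.heading
        if self.locators:
            first_line += ", " + ", ".join(self.locators)
        lines = [first_line]
        # Subentries are indented and sorted case-insensitively.
        for text in sorted(self.subentries, key=str.lower):
            lines.append(f"    {text}, " + ", ".join(self.subentries[text]))
        return "\n".join(lines)


printers = Entry("printers", ["15", "55", "75"], {"useful tips": ["35"]})
print(printers.render())
# printers, 15, 55, 75
#     useful tips, 35
```

Nothing in the data structure forces a choice between locators and subentries; both lists can be filled or empty independently, which is the point.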

Finally, I want to clarify that I'm speaking here only about locators and subentries. See-also cross references, on the other hand, should by definition always be accompanied by something, either locators or subentries. For example, "colleges, 16-26. See also universities" is correct.
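If you keep index data in a structured form, this guideline is easy to check automatically. The sketch below assumes a hypothetical dictionary format with optional "locators", "subentries", and "see_also" keys, and reports headings whose See also reference has nothing to accompany it.

```python
def lonely_see_also(entries):
    """Return headings whose See also reference has no locators or subentries.

    `entries` maps a heading to a dict with optional "locators",
    "subentries", and "see_also" keys (a hypothetical format).
    """
    problems = []
    for heading, entry in entries.items():
        has_content = bool(entry.get("locators") or entry.get("subentries"))
        if entry.get("see_also") and not has_content:
            problems.append(heading)
    return sorted(problems, key=str.lower)


entries = {
    "colleges": {"locators": ["16-26"], "see_also": ["universities"]},
    "schools": {"see_also": ["colleges"]},
}
print(lonely_see_also(entries))  # ['schools']
```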

Acronym Indexing
In your opinion, how should acronyms be indexed?

Acronyms should be indexed just as you would index synonyms of any kind. For example, if you wanted to index "directories" and "folders," there are essentially three possibilities.

(*) Cross Reference
        directories. See folders
        folders, 001, 002, 003
 
(*) Double Posting
        directories, 001, 002, 003
        folders, 001, 002, 003
 
(*) Language Clarification
        directories (folders), 001, 002, 003
        folders (directories), 001, 002, 003

Note that the cross reference can be applied in either direction, and also that language clarification is sometimes used to supplement a cross reference. This last version tends to be the most common for acronyms, but it's only one choice of several.

        directories. See folders
        folders (directories), 001, 002, 003
 
        File Transfer Protocol. See FTP
        FTP (File Transfer Protocol), 001, 002, 003
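The options above can be expressed as a small Python function that produces the lines for a synonym or acronym pair. The strategy names and the parameters `preferred` (the term readers are more likely to know, and so the target of any cross reference) and `variant` are assumptions for this sketch; sorting is a plain case-insensitive sort, not a full set of filing rules.

```python
def synonym_entries(preferred, variant, locators, strategy):
    """Return sorted index lines for two synonymous terms.

    `preferred` is the term readers are more likely to look up, so it is the
    target of any cross reference; `variant` is the other form.
    """
    locs = ", ".join(locators)
    if strategy == "cross_reference":
        lines = [f"{variant}. See {preferred}", f"{preferred}, {locs}"]
    elif strategy == "double_posting":
        lines = [f"{variant}, {locs}", f"{preferred}, {locs}"]
    elif strategy == "clarification":
        lines = [f"{variant} ({preferred}), {locs}", f"{preferred} ({variant}), {locs}"]
    elif strategy == "cross_reference_plus_clarification":
        lines = [f"{variant}. See {preferred}", f"{preferred} ({variant}), {locs}"]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # A plain case-insensitive sort stands in for real filing rules.
    return sorted(lines, key=str.lower)


for line in synonym_entries("FTP", "File Transfer Protocol", ["001", "002", "003"],
                            "cross_reference_plus_clarification"):
    print(line)
# File Transfer Protocol. See FTP
# FTP (File Transfer Protocol), 001, 002, 003
```

Reversing the direction of the reference is just a matter of swapping `preferred` and `variant`.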

If you double-post, it might be useful to perform language clarification on both entries, but keep in mind that some spelled-out forms are so long (and, although mentioned in the book, so unlikely to be needed in the index) that this isn't always a good idea.

        CORBA (Common Object Request Broker Architecture), 001

When you use a cross reference, the challenge is in knowing the direction of the reference. You should point to the more common term. Thus, which term is more knowable in this example, "FTP" or "File Transfer Protocol"? ("Personal computer" or "PC"?) I find that acronyms in computer environments are more common; for instance, beginners don't know what the X in "XML" stands for, they don't care what DTD and SNMP stand for, and they wouldn't parse CORBA if their lives depended on it. :-)

There is one odd difference between working with acronyms and working with synonyms. With acronyms, it's always possible that the acronym and spelled-out versions appear literally next to each other in an alphabetized index:

 
        World Wide Web (WWW), 001, 002, 003
        WWW (World Wide Web), 001, 002, 003

In this case, you can get away with using only one of the two entries, because a reader looking up either form will find it right there.
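Whether the two forms really end up side by side depends on what else is in the index. This sketch spots acronym/spelled-out pairs that are adjacent after sorting, assuming an acronym is a single all-capitals word matching the initials of a multiword heading. That heuristic is mine, for illustration only, and it misses acronyms like "XML" that aren't built from simple initials.

```python
def initials(phrase):
    """Return the capitalized first letters of each word: "World Wide Web" -> "WWW"."""
    return "".join(word[0] for word in phrase.split()).upper()


def adjacent_acronym_pairs(headings):
    """Return (acronym, spelled-out) pairs that sort next to each other.

    Heuristic (an assumption): an acronym is a single all-capitals word equal
    to the initials of a multiword heading. Sorting is case-insensitive.
    """
    ordered = sorted(headings, key=str.lower)
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        for short, full in ((first, second), (second, first)):
            if (short.isupper() and " " not in short
                    and " " in full and initials(full) == short):
                pairs.append((short, full))
    return pairs


print(adjacent_acronym_pairs(["World Wide Web", "WWW"]))
# [('WWW', 'World Wide Web')]
print(adjacent_acronym_pairs(
    ["World Wide Web", "WWW", "wrappers", "FTP", "File Transfer Protocol"]))
# [('FTP', 'File Transfer Protocol')]
```

In the second call, "wrappers" sorts between "World Wide Web" and "WWW", so that pair is no longer adjacent and both entries would be worth keeping.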

Search Engine Glossary
Can you explain boolean logic, stemming, and other search engine features?

The best place for information on search engine functionality is the Search Engine Glossary available at Search Engine Watch. The URL for the glossary is at http://searchenginewatch.com/searchday/01/sd1120-searchterms.html.

Speed of Indexing
If you were to write an index for a 60-page technical manual (for example), how long would it take, and how much might it cost?

As a rule of thumb, a professional indexer (with an acceptable level of knowledge of the subject matter and some indexing experience and training) needs one hour for every five to ten pages of material, total. Thus a 60-page manual would require 6 to 12 hours of indexing time; this time includes everything from browsing the material to determine what's indexable to formatting the final index pages. If you're writing the manual, indexing time is a little bit shorter because you're already familiar with the material. ... The more difficult the material is to understand, or the more densely informational the documentation is, the more time you should add.
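The arithmetic is simple enough to script if you quote many projects. This sketch assumes the five-to-ten-pages-per-hour range above; the `difficulty` multiplier is a hypothetical knob for harder or denser material, not a published standard.

```python
def indexing_hours(pages, slowest=5, fastest=10, difficulty=1.0):
    """Return (low, high) hours for indexing `pages` pages.

    `slowest` and `fastest` are pages per hour (the five-to-ten range above).
    `difficulty` is a hypothetical multiplier for hard or dense material.
    """
    low = pages / fastest * difficulty
    high = pages / slowest * difficulty
    return low, high


print(indexing_hours(60))                  # (6.0, 12.0)
print(indexing_hours(60, difficulty=1.5))  # (9.0, 18.0)
```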

For me, if I knew I were writing an index for a 60-page manual, I could promise to finish the job in one workday without flinching.

For cost, it again depends on a few factors, primarily indexing experience. Consider a reasonable base rate of $4.25 per page for an experienced indexer (60 pages at $4.25 comes to $255, so the base fee for this project would be about $250), and then adjust based on the density of the material. For example, I'd expect the index to cost less if there are lots of illustrations or computer code. Other factors include special tool requirements, deadline constraints, team meetings, external factors in the industry or company, and other unusual requirements. I also weigh how much freedom I have in getting the work done, both in timing and in work location, and would likely charge more for rush and in-house work.
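The same goes for a first-pass fee. This sketch starts from the $4.25-per-page base rate mentioned above; the `density` and `rush_multiplier` values are illustrative assumptions, since the real adjustments depend on the project and on the relationship with the client.

```python
def estimate_fee(pages, rate_per_page=4.25, density=1.0, rush=False, rush_multiplier=1.25):
    """Return a rough project fee in dollars.

    `density` scales the fee for the amount of indexable material (below 1.0
    for pages full of illustrations or code); `rush_multiplier` is applied to
    rush jobs. Both adjustments are illustrative, not standard rates.
    """
    fee = pages * rate_per_page * density
    if rush:
        fee *= rush_multiplier
    return round(fee, 2)


print(estimate_fee(60))             # 255.0
print(estimate_fee(60, rush=True))  # 318.75
```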

Finally, the biggest factor is subjective: the long-term indexer-to-client relationship.

Index Translation and Localization
Are there any special internationalization and localization issues for indexing?

It's commonly recommended that indexes not be translated between languages. There are too many differences between languages and cultures for translation to work well. In fact, as an extreme case, traditional French textbooks don't have indexes; instead they rely on complicated tables of contents. (The best explanation I've heard is that this decision is an offshoot of France's desire to maintain cultural identity, and consequently control access.) When you translate between languages that don't share an alphabet, as when English is translated to Japanese, the very organizational structure changes. And finally, in addition to language-related differences, consider also that the audiences might be different enough to affect your decisions about what belongs in the index in the first place.

If you are simply writing keywords, however, translation is more workable. Without the relationships, alphabetization, and word-splitting decisions of a whole index, words might be more easily translated and functionally searched against. Of course, even then, word-for-word translation has its problems.

There's an interesting line, however, between writing indexes across multiple languages and writing indexes for multiple audiences. For example, it's possible to build "index filters" that convert a global index into smaller, custom indexes. (Consider how the index of the teacher's edition of a textbook might be edited down to a decent student-edition index, or how websites might enable additional navigational possibilities after the user logs in.) If you are working with cultural differences that can be strictly defined, it might be possible to build a single multi-language index and then "filter it down" for specific languages. But this would be a huge project, and the overhead costs would be worthwhile only if you are working with a documentation set so large and integrated that you're willing to face the content management nightmare head on.
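An index filter of the teacher's-edition kind can be sketched in a few lines of Python: tag each posting with the audiences allowed to see it, then keep only the postings for one audience. The posting format and the "teacher"/"student" tags are hypothetical, and a multi-language version would need far more than a tag check.

```python
from collections import defaultdict


def filter_index(postings, audience):
    """Return the index visible to one audience as {heading: [locators]}.

    Each posting is (heading, locator, audiences), where `audiences` is a set
    of tags such as "teacher" or "student" (hypothetical tags). Headings with
    no postings for this audience disappear from the result.
    """
    visible = defaultdict(list)
    for heading, locator, audiences in postings:
        if audience in audiences:
            visible[heading].append(locator)
    return {heading: visible[heading] for heading in sorted(visible, key=str.lower)}


postings = [
    ("answer keys", "T12", {"teacher"}),
    ("fractions", "45", {"teacher", "student"}),
    ("fractions", "T46", {"teacher"}),
]
print(filter_index(postings, "student"))  # {'fractions': ['45']}
```

The teacher's view keeps everything; the student's view silently drops "answer keys" because no student-visible posting remains under it.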

The easiest and best solution, although it might sound expensive, is to contract indexers who work in each language and are comfortable working with the translations, no matter how automated.

Recipe for Noodle Kugel

Did you jump to this FAQ question first? :-) I wonder what my mom would think if she discovered this was here.

Ingredients
1 lb. egg noodles
3 beaten eggs
1 c. sugar
1/2 c. melted butter
1 1/4 t. vanilla
1 1/2 t. cinnamon
16 oz. container of sour cream
16 oz. container of cottage cheese
optional: 1 c. light raisins

Boil noodles and drain well. Combine everything. Grease a 9 X 13 pan. Bake at 350 degrees Fahrenheit for 45 to 60 minutes, until edges begin to brown.

Site design by little graphics studio.
© 2002 All rights reserved.