Troubleshooting Those Horrible Microsoft Word Index Problems

By Seth A. Maislin
Updated (twice!) in February 2007

  Indexing with Microsoft Word (any version) isn't easy or effective. There are a number of problems that an indexer might encounter when indexing with Word's XE fields. Over the years, I've received a large number of questions from writers and indexers looking for helpful solutions to seemingly unsolvable situations. This article is a list of those questions, and my attempts at answering them.

 Jump to:
My Page Numbers Are All Wrong!
Identical Entries Are Not Combining Properly
Individual Entries Are Not Sorting Where They Should
   and How to Override Sorting
Index Entries Aren't Showing Up in the Index
My Page Ranges Aren't Working
My Cross References Aren't Positioned or Formatted Correctly
My Index Suddenly Vanished
My Index Entries Suddenly Vanished
My Continued Lines Vanished
Can I Delete All My ___ Entries?
Auto-Marking: What Good Is It?
Teaching Auto-Mark to Ignore or Consider Certain Things
Can I Create an Index With Something Other Than Page Numbers?


  Can't Find Your Problem (or Answer)?
This is a preliminary draft; there is always something going wrong with Word's indexing features. But I want to be helpful, and so I pushed these first few answers to the Web right away. If you have a question you don't see here, please contact me at seth@maislin.com. I might have your answer.

My Page Numbers Are All Wrong!

  If almost all of your page numbers are coming out wrong when you generate your index, then you've stumbled across one of the stupidest side-effects of Word's indexing functionality. It turns out that you have to hide your index to make it generate accurately. Can we say "counter-intuitive"?

  Basically, you can't let the {XE} fields that you use to build the index mess up your final pagination. The way to avoid this is to generate your index only when all hidden text (paragraph markers, XE fields, etc.) is hidden. Then your pagination will match. For example, if you have a paragraph that looks like this:
blahblah blah blahblah blah blahblah blahblah blah blahblah 
{XE "data data data data data"}{XE "data data data data
data"}{XE "data data data data data"}blah blahblah blah 
blahblah blahblah blah blahblah blah blahblah blah blahblah 
{XE "data data data data data"}blahblah blah blahblah blah 
blahblah blah blahblah blahblah blah blahblah blah blahblah 
blah blahblah blahblah blah blahblah blah {XE "data data 
data data data"}{XE "data data data data data"}blahblah blah 
blahblah blahblah blah blahblah blah blahblah blah blahblah 
blahblah blah blahblah blah blahblah blah blahblah blahblah 
blah blahblah blah blahblah blah blahblah blahblah blah 
blahblah blah blahblah blah blahblah blahblah blah blahblah 
blah {XE "data data data data data"}blahblah blah blahblah 
blahblah blah blahblah blah blahblah blah blahblah blahblah 
blah 
you'll notice that this paragraph is actually a lot longer than the paragraph would be with the bracketed XE fields removed. So make sure your index fields are invisible before you generate the index.
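To get a feel for how much room visible fields take up, here is a minimal Python sketch. It assumes the paragraph is available as plain text with the field codes typed out in braces, as in the example above; `strip_xe_fields` is a hypothetical helper of mine, not anything Word provides, and it only measures characters, not Word's actual page layout.

```python
import re

# Matches a textual {XE ...} field that contains no nested braces
# (an assumption made for this sketch).
XE_FIELD = re.compile(r"\{XE\s[^{}]*\}")

def strip_xe_fields(text: str) -> str:
    """Return the text as it would read with the XE fields hidden."""
    return XE_FIELD.sub("", text)

visible = 'blah blah {XE "data data data data data"}blah blah {XE "data data"}blah'
hidden = strip_xe_fields(visible)

# The visible version is much longer, which is what causes the reflow.
print(len(visible), "characters with fields showing")
print(len(hidden), "characters with fields hidden")
```

The bigger the gap between the two counts, the more likely the text is to reflow onto different pages when the fields are showing.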

  For your sake, don't forget how important this is. If you generate an index and then decide to make edits to it, you'll need to make your fields visible again. Then, to see your edits manifest in a newly generated index, you'll have to remember to turn them off. Many indexers get tripped up in this awkward cycle.

  By the way, not all your page numbers will be wrong if you fail to turn off hidden text before generating, especially if your documentation has many pages that aren't completely filled. What causes the inaccurate page numbers is reflow, but if your document has very few index entries and a lot of leeway for minor reflow, you may not even notice you're creating pagination errors in your index. So seriously, be careful.

  If your page numbers are WILDLY off, however, then you are having some other problem. Check the settings within your {INDEX} field (the index code itself) to make sure you're using the page numbers you want. A very common mistake, one that makes good writers look like complete idiots, is letting multipart page numbers only half-appear. For example, if you're seeing "4, 7, 1, 6, 3" when you should be seeing "2.4, 2.7, 3.1, 5.6, 8.3," your multipart numbers aren't rendering properly.

Identical Entries Are Not Combining Properly

  You generate your index, only to discover that two seemingly identical index entries are failing to combine. For example, you're getting
    Washington, George, 33
    Washington, George, 37
instead of
    Washington, George, 33, 37
  The most likely answer is that your spacing is wrong. With Microsoft Word, you do NOT want to include any extra spaces in your entry, especially on either side of delimiters like a colon, semicolon, or quotation mark. In other words, fields like
     {XE "name"}
     {XE "main entry:subentry"}
will sort separately from
    {XE "name "}
    {XE "main entry: subentry"}
because of those extra spaces in there. You need to get into the habit of not including them. (This is particularly hard for writers, who think you must always have a space after the colon.) Spaces can also appear spontaneously if you have any of Word's auto-type features turned on.
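If you can get your entry text out of the document as a plain list, a quick check like the following can point out entries that differ only by stray spaces. This is a rough Python sketch under that assumption; `normalize` and `find_spacing_conflicts` are hypothetical names of my own, and the sketch makes no attempt to handle escaped delimiters.

```python
import re
from collections import defaultdict

def normalize(entry: str) -> str:
    """Trim the ends, drop spaces around colons and semicolons,
    and collapse runs of spaces into one."""
    entry = re.sub(r"\s*([:;])\s*", r"\1", entry.strip())
    return re.sub(r"\s+", " ", entry)

def find_spacing_conflicts(entries):
    """Group entries that become identical once spacing is normalized."""
    groups = defaultdict(set)
    for entry in entries:
        groups[normalize(entry)].add(entry)
    # Only groups with more than one spelling are a problem.
    return {key: sorted(variants) for key, variants in groups.items()
            if len(variants) > 1}

entries = ["name", "name ", "main entry:subentry", "main entry: subentry", "other"]
conflicts = find_spacing_conflicts(entries)
for key, variants in conflicts.items():
    print(repr(key), "is written as", variants)
```

Each group it reports is a set of entries that look the same in the index but will fail to combine.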

  If you're certain it's not a spacing problem, perhaps it's a style problem. It's possible that your seemingly identical entries actually have different formatting or styles. If one of the entries were in italics, and the other weren't, it would be obvious. But perhaps you're using paragraph styles that do look the same, such as slightly different versions of the same font, kerning differences, etc. Microsoft Word generally conserves style information when you copy-and-paste text from the documentation into the {XE} fields, something many indexers do. Different styles can also appear if index entries are imported from other files, especially those created by other programs.

  In the end, the best (and perhaps only) way to make sure that your entries are identical is to copy-and-paste one of the index entries for all of the others. That is, copy your {XE} field and paste it in the other locations. In fact, this is how I prefer to index my Word-based documentation: I create a single {XE} field with bogus text, like {XE "TEXTHERE"}, and then paste and overwrite for every entry I need. Sometimes this is the only way to be certain you're not introducing a hard-to-catch error, such as an entry that has nonbreaking or en spaces instead of regular spaces, underlined underscores, auto-capped uppercase letters, and so on.

  Also, remember that a problem at the main-entry level can cause your identical subentries to combine improperly, and vice versa. You need to make sure that every element of your entry, from the first letter of the main entry to the last letter of the subentry (or sub-subentry) or cross reference is precisely the same, with no extra spaces or other characters. You also need to make sure that the settings for your entries are identical, so be sure to look at your field codes (and the spacing around them) as well. Also, if you are using manual overrides to re-sort your entries (see how to do this below), you need to apply these overrides consistently.

  If you're certain the entries are exactly the same -- that is, you've tried the copy/paste approach I described above -- and you're still having combination problems, then the problem might be in a third entry that you can't see. For example, you might have three entries that look identical, two of which really are identical and a third that only looks the same. If you continue to troubleshoot the two identical entries, you won't find a problem. Are you sure there isn't a third entry someplace? Remember that Word ignores index entries that produce the same result in the generated index, such as two identical entries on the same page.

  After testing for everything above, I guarantee your problem is no longer related to the index data themselves. Instead, there's something wrong with the placement of the {XE} fields, perhaps because of some document formatting things going on, or with the document template itself. Try moving your entries around, especially if they don't appear in "normal paragraphs." For example, if your entries are inside a table, try moving them around within the table, such as to other columns or to positions just outside the table itself. If they are in footnotes, take them out of the footnote and put them next to the footnote anchor. And so on. Word is a word processor, not a publishing program; it is not uncommon that your index will get messed up by some of the internal things that Word does. All I can suggest is to play around, doing things that don't seem intuitive but might actually mean something to Word's internal program (like changing columns).

Individual Entries Are Not Sorting Where They Should
(and how to override what Word wants to do)

  There are many reasons why your entries don't appear in the index where you want them to go, but the biggest and most annoying reason is that Word doesn't actually know how to sort index entries! Word's sorting algorithm, which is used not just behind-the-scenes in indexing but also as a menu-item feature, is quite rudimentary. Basically, it's an adjusted ASCII sort, in which the uppercase and lowercase letters are considered equal, and all nonalphabetic characters appear in ASCII order. (If you've never heard of ASCII, don't worry about it. Just understand that ASCII sorting is only slightly more modern than punch cards.)

  One kind of problem that occurs with Word's sorting is that it considers lowercase and uppercase letters of equal value in the sort. Equal letters are sorted in order of occurrence; that is, the entry "Washington, 35" will sort ahead of "Washington, 37" because 35 appears before 37. Consequently, if I have the initial-lowercase "washington, 36" in my index, the result will look surprisingly painful:
   Washington, 35
   washington, 36
   Washington, 37
Fixing this means paying closer attention to your capitalization. If necessary, you can override the default sorting that Word applies. I'll show you how in a moment.
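The behavior described above can be imitated in a few lines of Python. To be clear, this is only an illustration of a case-blind sort with ties broken by page order; it is not Word's actual algorithm, which I can't see.

```python
# Each entry is (heading, page), listed in the order it occurs in the document.
entries = [("Washington", 35), ("washington", 36), ("Washington", 37)]

# Case-blind key: "Washington" and "washington" tie, so page order decides.
case_blind = sorted(entries, key=lambda e: (e[0].lower(), e[1]))

# Case-sensitive key, for contrast: uppercase "W" sorts ahead of lowercase "w".
case_sensitive = sorted(entries, key=lambda e: (e[0], e[1]))

for heading, page in case_blind:
    print(f"{heading}, {page}")
```

With the case-blind key, the lowercase entry lands between the two uppercase ones, which is exactly the painful result shown above.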

  Another reason that something might sort in an awkward position is that you have characters or formatting getting in the way. For example, the entry {XE "Washington: presidency of"} will sort before {XE "Washington:biography of"} because of that extra little space before the word presidency, after the colon. You need to learn to not type extra spaces in your index entries.

  If you're a knowledgeable indexer, you know that subentries should be sorted by first important word. For example, the subentry "of Delaware" should be sorted under D, not o. Word doesn't know this. You must manually instruct Word how this subentry should be sorted. Instructions on doing this are below. Also, remember that if you manually override the sorting for a subentry that appears more than once throughout your index, you need to manually override all of them to have them sort together.

  Another result of ASCII-based sorting is that nonalphabetic characters (numbers and symbols) don't necessarily sort in the relative order that you want. For example, you might prefer that the & symbol appear before the $ symbol. They might also appear at the top of your index before the A entries, at the bottom of your index after the Z entries, or some combination of both. To make sure that your nonalphabetic entries appear where you want them to appear, you will need to override what Word wants to do. Instructions on how to do that are below.

  Finally, thanks to this old-fashioned ASCII-based algorithm, Word won't care what font you're using. In other words, the letter that appears when you press the A key will always sort as if it were an A, even if the actual character you see on the screen is a Greek alpha. It's up to you to override Word's behavior if you need these characters treated in some special way.

  How to Override Sorting. There is an undocumented feature that allows you to apply manual sorting overrides for your entries. Follow the entry level that you want differently sorted with a semicolon and the sort text itself. In the example below, the main heading will appear re-alphabetized as if the text of the main heading were actually "resortedmain." The subheading is also re-alphabetized as if it were spelled "resortedsub."
     {XE "main;resortedmain:sub;resortedsub:sub-sub;resortedsub-sub"}
Note that if you're familiar with the syntax for embedded indexing with Adobe FrameMaker, this is significantly different. With FrameMaker, you would write the complete entry, followed by the semicolon, followed by the complete resorted entry. With Word, however, you apply the resorted text to each level individually. In general, you will want to override the sort for only one level; my exaggerated example above overrides all three levels. And as always, remember that spacing counts, so don't insert spaces that have no meaning.
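To make the per-level syntax concrete, here is a small Python sketch that reads the text of an entry and reports, for each level, what is displayed and what it sorts as. It assumes the entry text follows the colon and semicolon conventions described above and that a backslash escapes a delimiter; `split_unescaped` and `parse_entry` are hypothetical names of mine.

```python
import re

def split_unescaped(text: str, sep: str):
    """Split on sep, except where sep is preceded by a backslash."""
    return re.split(r"(?<!\\)" + re.escape(sep), text)

def parse_entry(entry_text: str):
    """Return one (display_text, sort_text) pair per entry level."""
    levels = []
    for level in split_unescaped(entry_text, ":"):
        parts = split_unescaped(level, ";")
        display = parts[0]
        # Without an override, the level sorts as it is displayed.
        sort_as = parts[1] if len(parts) > 1 else display
        levels.append((display, sort_as))
    return levels

parsed = parse_entry("main;resortedmain:sub;resortedsub:sub-sub;resortedsub-sub")
for display, sort_as in parsed:
    print(f"{display!r} sorts as {sort_as!r}")
```

Running it on an entry such as "Washington, George:of Delaware;Delaware" shows the subentry displayed as "of Delaware" but sorted as "Delaware."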

Index Entries Aren't Showing Up in the Index

  You can see the {XE} field in your documentation, but it's not appearing in your index. If you're lucky, it's an easy fix, as described in the next paragraph. If you're unlucky, you might have some heavy work ahead of you.

  First of all, you want to make sure that you've actually generated your index since that "missing entry" was typed. If you create, edit, or delete your index entries but don't remember to regenerate your index, then your additions, edits, and deletions wouldn't appear in your index. Try generating your index again, and they might just appear. (This is a common oversight with embedded indexing programs like Microsoft Word, especially when there are several authors or other document handlers. In fact, sometimes an index is generated and put into the production system in advance of additional changes to the index. It is of prime importance that the index not be processed until all indexing and index evaluation is complete.)

  The next-easiest problem to look for is a syntax problem. Are you using quotation marks properly? Do you have a space after the XE letters? Did you accidentally insert an index entry into another index entry, as with this:
   {XE "main entry:{XE "inserted entry"}subentry"}
There's also the possibility that your index entry includes a special character that isn't properly escaped. For example, if you're trying to create an entry with a colon in it, you need to put a backslash in front of that colon:
   WRONG:   {XE "Luke 9:21, interpretation of"}
   CORRECT: {XE "Luke 9\:21, interpretation of"}
It's also possible that a necessary formatting character was escaped such that it no longer works. This happens when non-Word documents are translated over, since most translation tools are designed to preserve special characters as literals:
   WRONG:   {XE "Washington, George\:cherry tree fable"}
   CORRECT: {XE "Washington, George:cherry tree fable"}
There's also the possibility that your special characters are formatted in such a way that Word doesn't identify them as special characters. For example, if your colon is italicized, Word might not consider it a delimiter. Check for these kinds of syntax errors and generate your index again.
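One way to avoid both escaping mistakes is to keep each level of an entry as its own string and let a script do the escaping and the joining. The following Python sketch assumes exactly that; `escape_delimiters` and `xe_text` are hypothetical helpers, and what they produce is only the text of a field, not a working Word field.

```python
import re

def escape_delimiters(text: str) -> str:
    """Put a backslash before any colon or semicolon that lacks one."""
    return re.sub(r"(?<!\\)([:;])", r"\\\1", text)

def xe_text(levels):
    """Join already-separated levels with real (unescaped) colons."""
    body = ":".join(escape_delimiters(level) for level in levels)
    return '{XE "' + body + '"}'

# A literal colon inside one level gets escaped.
print(xe_text(["Luke 9:21, interpretation of"]))

# A colon between levels stays a working delimiter.
print(xe_text(["Washington, George", "cherry tree fable"]))
```

Note that this sketch escapes every semicolon too, so it isn't suitable as written for levels that carry a sorting override.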

  Of course, it's also possible that your index entry is appearing in the index, but not where you expect it to appear. For example, you might have two identical index entries on the same page; because they're on the same page, only one of them will appear in your index. It's also possible that the entry is sorting in an unusual way because of some extra spaces or characters in your entry, or because you're not completely familiar with how nonalphabetic characters are sorted. Also, if your entry text has an unescaped semicolon (;) in there someplace -- a common and hard-to-see typographical error for someone who intended to type a colon (:) -- then you have overridden Word's default sort for something else. To understand better how sorting works, and how to override it, see the FAQ items "Individual Entries Are Not Sorting Where They Should" and "How to Override Sorting."

  Another kind of "disappearance error" happens when you've used a flag in the {XE} or {INDEX} fields to limit which kinds of entries will appear in your index. The flags most likely to be causing your problems are the XE \f flag and the INDEX \f flag. You can read about other flags as well in my MS Word Flags document.

  Finally, in the worst case, it's possible that your {XE} field isn't a real {XE} field. It might look like an {XE} field, but Microsoft only cares that it was created properly. For example, if you've been reading this FAQ you've probably seen lots of example fields, like this one: {XE "Washington, George:cherry tree fable"}. But of course this isn't a real field, because it's typed manually into an HTML document. Consequently, if I were to copy this paragraph into a Microsoft Word document (a .doc file), it wouldn't be transformed into an index entry. It would be treated like the text that it is. Textual {XE} fields like this one are not uncommon when files are being transferred or translated between formats and applications. The {XE} field works only in Microsoft Word. Only with careful translation into other applications with embedding features -- for example, Adobe FrameMaker's marker system -- will the index entries maintain their "index entry-ness" and not become simple text.

  If you can see {XE} fields in your documentation that don't appear in your newly generated index, it's possible that you're not looking at an actual index entry. Index entries have to be inserted using Word's index-entry-creation dialog boxes (see your Help system), or copied from other valid entries. Trying to type {XE ... } will not give you what you need.

  So let's suppose you have textual fields that you need to convert to index entries. What you need now is a way of creating those fields quickly, and no such tool really exists. However, at least you can use the search feature to help you out. Start by searching for the uppercase XE to find your entries. (If your textual entries aren't formatted with hidden text, I recommend hiding hidden characters while you do this search to avoid accidentally finding your real index entries.) Now, every time that you find a textual XE field, highlight the index entry text and create a real entry from it, using Word's index entry creation features. (If you're clever, you can create a toolbar button to create an index entry, making this job a bit faster.)

  If you're feeling particularly industrious, I recommend building a regular expression that will find complete XE fields in your text, from opening brace to closing brace, and call out the index data as a subexpression. You can then replace everything you find (the whole field) with a uniquely formatted version of just the subexpression (the index text). Choose a format that is not going to appear in your index, like a new named style or a strange color. Now, you can search for the formatted text one item at a time, sped along using the unique format, and create index entries from them. Afterwards, go back and delete your formatted text.
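As a starting point, here is what such an expression might look like in Python, run against the document saved as plain text. The pattern is my own guess at a reasonable shape for textual fields (quoted entry text, optional flags, no nested braces), and the double square brackets merely stand in for the "unique format," since plain text has no styles or colors.

```python
import re

# Whole field from "{" to "}", with the quoted index text as group 1.
# Backslash escapes such as \: are allowed inside the quotes.
XE_PATTERN = re.compile(r'\{\s*XE\s+"((?:[^"\\]|\\.)*)"[^{}]*\}')

text = ('He chopped{XE "Washington, George:cherry tree fable"} a tree'
        '{XE "Washington, George" \\r "GWbio"}.')

# The index data, called out as a subexpression.
found = XE_PATTERN.findall(text)
print(found)

# Replace each whole field with a marked-up copy of just the index text.
marked = XE_PATTERN.sub(lambda m: "[[" + m.group(1) + "]]", text)
print(marked)
```

Word's own find-and-replace uses a different wildcard syntax, so treat this as a model for the approach and not as something to paste into Word's search box.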

  If those solutions seem a bit obtuse, it's because they are. If you have index entries that aren't properly embedded, you don't really have index entries. The approaches I've loosely described in the last two paragraphs are attempts at shortcutting the "indexing from scratch" you have to do. (By the way, I occasionally offer my services as an automaton when it comes to global index data conversion in Microsoft Word. If you need my help -- that is, if you want me to fix your index or index data -- write me, and we'll talk.)

  After all that, there are always problems that have nothing to do with the index data themselves, but some other kind of problem that's happening at a higher level within the Word processing. For those kinds of problems, you should probably consult a mailing list of technical writers, who are much more familiar with the global or template-level properties of Word documents than I.

My Page Ranges Aren't Working

  It is absolutely no surprise that your page ranges are having problems. Microsoft Word's functionality for page ranges is abysmal. The only good thing I can say about it is that it exists! (There are other applications with embedded indexing functionality that don't allow for the existence of ranges, like the add-in for Quark. But wow, who's writing a big index using Quark?)

  For your ranges to be working, three things have to happen simultaneously. First, your {XE} syntax has to be correct. Make sure you're using quotation marks when appropriate; in fact, it's a good idea to use them all the time by default. (If you don't know the syntax, read about the \r flag at my syntax page.) Second, you have to have a valid bookmark name as the argument to the \r flag. Third, you have to have an existing, functional bookmark with a name that matches your argument.
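Those three conditions can be turned into a simple checklist. The Python sketch below works on the text of a field plus a list of bookmark names that you supply yourself; it does not read a Word file. `range_problems` is a hypothetical helper, the bookmark name GWbio is made up, and the characters it rejects are just the ones I warn against in this article, not Word's complete naming rules.

```python
import re

# Quoted entry text followed by optional flags, all inside one pair of braces.
FIELD = re.compile(r'\{XE\s+"(?P<entry>[^"]*)"(?P<flags>[^{}]*)\}')
# The \r flag and its bookmark-name argument (quotes optional).
RANGE = re.compile(r'\\r\s+"?(?P<name>[^"\s]+)"?')
RISKY = set(':;\\"')

def range_problems(field_text, bookmark_names):
    """Return a list of reasons why this range entry might not work."""
    field = FIELD.fullmatch(field_text.strip())
    if field is None:
        return ["field syntax not recognized (check the quotation marks)"]
    flag = RANGE.search(field.group("flags"))
    if flag is None:
        return ["no \\r flag with a bookmark name"]
    name = flag.group("name")
    problems = []
    if RISKY & set(name):
        problems.append(f"bookmark name {name!r} contains a special character")
    if name not in bookmark_names:
        problems.append(f"no bookmark named {name!r}")
    return problems

print(range_problems('{XE "Washington, George" \\r "GWbio"}', {"GWbio"}))
print(range_problems('{XE "Washington, George" \\r "GWbio"}', {"Other"}))
```

An empty list means the field passed all three checks; it does not prove the bookmark itself still spans the text you intended.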

  Stay away from naming your bookmarks with special characters, or anything else that might interfere with the indexing process. That means you shouldn't be using colons, semicolons, backslashes, or quotation marks -- but you shouldn't be naming things with those characters anywhere, anyway.

  The limitations of your bookmarks are the same limitations in your page ranges. For example, you cannot create a range that goes across multiple documents (i.e., a range that starts on a page in one document but ends on a page in another document). Also, document editing can invalidate your existing bookmarks; it's possible, for example, for you to cut the endpoint of a bookmark and then paste it before the starting point, or into another file. Bookmarks are a really ugly feature of Word, so when indexing, it's important that you learn when they don't work.

  Bookmarks that don't span multiple pages may (or may not) appear as ranges in your final index. For example, instead of getting "Washington, George, 101-101," which you wouldn't want anyway, you might see "Washington, George, 101" and think your bookmark is broken. If you're working with very small sections of text, it's my recommendation that you don't create ranges anyway; a range of two pages isn't particularly helpful to the reader. (This is a theoretical point that I would be happy to argue, but not here.) In other words, if you are indexing a passage of just a couple paragraphs at most, don't use a bookmark; the likelihood that those paragraphs will run across two pages doesn't make up for the risk of error, let alone the added effort required to build the range in the first place.

  You might have some trouble if you are trying to create page ranges that overlap with other page ranges in your index, or have internal single-page entries. For example, if you're creating an entry like "Washington, George, 101-109, 102-107," there could be trouble. (Actually, Word probably won't have any idea that you're making this mistake -- which is worse!)

  Finally, if you are attempting to carry your ranges from or into other software applications or formats, such as Adobe FrameMaker or Adobe PDF or even simple HTML, I guarantee your ranges will vanish. The bookmark paradigm simply does not exist in any other program to my knowledge. (What other proof of its terrible-ness do you need? Why does Microsoft insist on keeping this feature alive? Have they never heard of HTML anchors?) Consequently, if you are producing documentation in more than one print or print-like format, accept that you'll never be able to work with page ranges without a lot of extra editorial work.

My Cross References Aren't Positioned or Formatted Correctly

  No surprise there! One of the places where Word tends to fall down is with cross references. That's because cross references are treated as ordinary text; they don't actually link to anything. Cross references within the {XE} fields are no different than my typing "Washington, George. See also U.S. presidents" right here. It's just ordinary text. For this reason, all of your formatting (and most of your positioning instruction) will have to be managed editorially, like the rest of your documentation.

  First of all, the italicizing of the words See and See also, as the Chicago Manual recommends (and don't get me started on why I dislike Chicago-set standards, especially within embedded indexing) has to be handled manually. You can't use named styles, which means you must actually highlight the text you want italicized, and then italicize it. If you're using the dialog boxes to create your entries, italics might very well be the default, but remember that the result isn't using styles. Consequently, if you end up doing anything to your document globally in an effort to change your italicized text into something else (e.g., some fonts have special italics versions that look better), too bad. Your index won't cooperate, especially if you're porting the index into another document.

  If you need to create a special kind of cross reference, such as a See also specific, you will need to italicize within the document window; you can't do any text-level formatting within the dialog box.

  The positions of your cross references are not negotiable. If you use the standard syntax of the \t flag (see XE \t flag), your cross reference will always appear after the entry itself. What the \t flag is really doing is allowing you to type in whatever you want as if it were a page number. In fact, this will allow you to create See references next to page numbers, which is a big no-no according to every indexing guideline you'll ever see.

  So what happens when you want your See also reference to appear among your subentries? Well, first of all you'll need to misuse the \t flag syntax. Second, you'll need to override Word's desire to sort the word See as a real word and not as part of a cross reference. The result is something like this, in which the cross reference will appear below all the alphabetic subentries:
     {XE "Washington, George:See also U.S. presidents" \t ""}
Notice that I still have an argument for the \t flag, even though it's an empty set. I'm not sure this is strictly necessary, but I think it's good for the sake of consistency. Also note that the example as it appears here does not use italics. You could, and you probably should if you're a fan of Chicago; I didn't here for the sake of readability.

  The next problem you'll have is that Word doesn't know that when you have more than one cross reference, they need to be combined. Remember, Word is looking at these things as if they're text, not actual and meaningful elements of indexing. So if you have a series of cross reference targets, you'll need to type them into a single index entry, using escaped semicolons:
	{XE "Washington, George:See also U.S. presidents\; Washington, Martha" \t ""}
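If you keep your cross-reference targets in a list, combining them is mechanical. Here is a minimal Python sketch that builds the text of the combined entry in the same shape as the example above; `see_also_entry` is a hypothetical helper, and the result is just text, so it still has to become a real field in Word.

```python
def see_also_entry(main, targets):
    """Build the text of a single 'See also' subentry for all targets."""
    # Targets are separated by an escaped semicolon and a space.
    joined = "\\; ".join(targets)
    return f'{{XE "{main}:See also {joined}" \\t ""}}'

combined = see_also_entry("Washington, George",
                          ["U.S. presidents", "Washington, Martha"])
print(combined)
```

Adding a third target later means regenerating this one entry, not adding a second cross-reference entry that Word would fail to combine.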


  There is one good thing about the syntax of cross references when you're embedding your index data, but it's not something unique to Word: You can put your cross references anywhere in the documentation. Although it's usually a good editorial idea to keep your cross references near the text of the cross-reference target, it's a good production idea to keep all of your cross references in one place, such as within the preface. Having all your cross references in one place is also useful for language control.

My Index Suddenly Vanished

  If your entire index vanished without a trace, consider first that your problem has nothing to do with indexing. (This is sort of like wondering why your computer doesn't work, when in fact the monitor is unplugged.) Are you looking at the right file? Did you delete the index, either by accident or with the intention of starting over? Did you insert more text or other content after the index, so that your index exists but is no longer at the end of the document where you expected it? Are you sure you ever had an index, and that you're not remembering a different file or circumstance?

  However, there are two index-related things you could have done. First, look to see if there's an {INDEX} field code where the index used to be. If so, then you do have an index, but it's not being displayed. You can toggle (switch on/off) between seeing the {INDEX} code and the index itself by right-clicking on what you see and selecting the "Toggle field codes" option. By allowing you to see the {INDEX} code, in theory the application is allowing you to manually make some syntax changes to the index without having to go to the menus. (To learn about the {INDEX} code, see my Word flags document.)

  On the other hand, if you can't see the {INDEX} code, it's possible that you're hiding all your codes. Make your hidden text reappear by clicking on the paragraph button, or selecting View > Show All. This is what you'd have to do to make your {XE} fields visible as well.

My Index Entries Suddenly Vanished

  If all of your entries are suddenly invisible, it's probable that you simply hid all your formatting codes. Click on the toolbar button with the paragraph symbol on it, or select View > Show All from your menu. That should make them reappear.

  If making your codes visible didn't make your index entries appear, then you did something non-index-related that made them vanish. The most likely culprit is that you saved your document in a text-only format, or some other format that doesn't have index tags, causing them to be deleted. Alternatively, you might have cut-and-pasted them all away globally, on purpose or by mistake. After that, who knows? Perhaps you're in the wrong document, or maybe you only thought you had index entries, when in fact you never did.

My Continued Lines Vanished

  If you inserted continued lines into your index, you did so manually, after you had a generated, editable index. However, manual changes are just that: manual. If you regenerated your index at any time after you made those manual changes, all of your manual changes disappeared. If you need to apply edits to the index manually, you MUST make sure not to implement these changes until the index is finished, never to be regenerated again. This applies not just to additions like continued lines, but also to spelling corrections, punctuation changes, and formatting changes.

  (What's a continued line? Read my instructional document on line, column, and page breaks.)

Can I Delete All My ___ Entries?

  Every now and then, there's nothing you want to do more than globally delete a bunch of entries. The problem is how this is supposed to happen. For example, suppose you have a common main entry for "publicity," when you decide that you're better off with a cross reference like "publicity. See marketing." In addition to creating this cross reference, you need to remove all of your original publicity entries. Although you can search for marker text, you can't search for whole markers. In other words, you can search for the word "publicity" when it's used within index markers (look for hidden text), but you can't search for a whole marker like {XE "publicity"} or {XE "publicity:methods for"}. For this reason you can't simply search globally and delete.

  The easiest approach to deleting all publicity entries is the manual approach: generate your index, then delete everything that starts with the word publicity. Unfortunately, manual edits will be undone as soon as you generate the index again; you'll have to remember that you want to make these manual changes every time you create a new version of the index. To help you remember to make these manual changes, I recommend changing the format and/or language for the word publicity to make sure it jumps out at you. Search for XE "publicity, the unique text for all publicity entries, and replace it with boldface, all caps, and a shocking color like red. I also recommend that you replace the word publicity with something that will sort at the very beginning of your index, such as aaa DELETE ME. Now, when you generate your index, you'll see some red, boldface, all-caps reminder at the top of your index file. Hopefully this will be enough for you to remember to delete your entries.

  Another approach, and by far the one I prefer, is to replace the marker syntax with something that Word can't interpret. Instead of using the letters XE in your marker, use something like DELETE_ME. In other words, globally replace XE "publicity with DELETE_ME "publicity. Since markers are hidden text, your DELETE_ME markers will remain hidden from publications; further, they'll fail to become index entries since Word won't interpret them as XE markers. The biggest advantage to this method is that it works globally, and you only have to make these changes once. Another advantage is that you aren't actually deleting the entry, just rewriting it; if for any reason you need to reconstruct entries, you can always change DELETE_ME back to XE. (This is a kludgy way of creating conditional text, but it might be just what you need.) The disadvantage is that you're not actually deleting anything, potentially cluttering your documentation.
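  If it helps to see the DELETE_ME swap spelled out, here is a minimal sketch in Python. It assumes you already have the field text available as a plain string (how you get it out of Word is up to you), and the function names disable_entries and restore_entries are made up for this illustration; they are not part of Word.

```python
import re

def disable_entries(field_text, main_entry, tag="DELETE_ME"):
    """Rename XE to `tag` in every marker whose entry text starts with main_entry.

    Hypothetical helper: it works on a plain string of field text,
    not on a live Word document.
    """
    pattern = re.compile(r'\bXE(\s+")(?=' + re.escape(main_entry) + r')')
    return pattern.sub(lambda m: tag + m.group(1), field_text)

def restore_entries(field_text, main_entry, tag="DELETE_ME"):
    """Undo disable_entries by turning `tag` back into XE."""
    pattern = re.compile(re.escape(tag) + r'(\s+")(?=' + re.escape(main_entry) + r')')
    return pattern.sub(lambda m: "XE" + m.group(1), field_text)

sample = '{XE "publicity"} {XE "publicity:methods for"} {XE "public relations"}'
disabled = disable_entries(sample, "publicity")
print(disabled)
print(restore_entries(disabled, "publicity"))
```

  Notice that the "public relations" marker is left alone, because the match is anchored on the opening quotation mark followed by the word publicity.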

  As a side note, whenever you remove an entry from your index, remember that you have to delete any cross references that target those now-removed entries. For example, if you replace your publicity entries with "publicity. See marketing," you'll need to rewrite or delete entries like "public relations. See also publicity."

Auto-Marking: What Good Is It?

  As you can tell from the language of the question, I'm not fond of the automatic indexing features of Microsoft Word. In truth, my issue isn't that the feature exists. I have a problem with this feature being called "autoindex," as if computer-generated indexes were actually any good. They aren't. Computer-generated or automatic indexes STINK. I've had this discussion many times with many people, indexers and non-indexers alike, and we are all 100% in agreement. You can't build an index using computer logic.

  Some people seem to think that in very limited circumstances you can come close; they're wrong. Other people believe that if an index is "not too bad," that might be acceptable; again, they're wrong. The problems endemic to computer-generated indexes are not only serious, but they're nearly impossible to correct editorially -- unless you're willing to start over, that is. Additionally, the most serious problems facing automatically created indexes are invisible: important ideas that weren't indexed at all, trivial ideas that shouldn't be in the index but are, miscategorized ideas, badly combined ideas, and so on. Automatic indexes can look good, but their failures are much too serious to be ignored. If you know anything about indexing, or if you care about what your readers think, you should know that an automated index is actually worse than having no index at all. Readers will trust a bad index and get burned; having no index will at least give readers a fighting chance to find that information some other way, such as with the table of contents, by browsing, or simply from memory.

  So there's my question: What good is the automatic indexing feature of Word? The answer lies in understanding what this feature actually accomplishes: it finds every occurrence of something you want, and marks it with an index entry. This is NOT indexing, but in some situations you might still want to do this. (Use at your own risk.)

Teaching Auto-Mark to Ignore or Consider Certain Things

  I hope you read my previous entry on automatic creation of indexing entries. If you didn't, please scroll up (or click here) right now, before continuing.

  If you're going to use the auto-marking feature after all, for whatever reason (including reasons that aren't about indexing), you may quickly discover that Word doesn't pay very good attention to what you ask it to mark. Styles, formatting, capitalization, hyphenation, and even quotation marks seem to escape the notice of this already terrible feature. For example, if you need to search for all instances of a lowercase word like president, you may not find all occurrences of the uppercase word President, even when it is capitalized as the first word in a sentence. Additionally, auto-mark tries to be helpful by maintaining the case of what you mark, which means you'll get some entries with lowercase president and some entries with uppercase President, and these won't combine nicely in your generated index. The only circumstance in which these things don't matter (and auto-marking actually makes a tiny bit of sense) is when you're creating a name index. However, to build a real name index you'd have to auto-mark every single name that appears in the whole book -- and even then, it won't work unless each name appears in a unique way throughout the book.

  So how do you get around the picky nature of auto-marking? Simple. You need to globally search-and-replace either (1) what you want to auto-mark or (2) what you want auto-mark to ignore, before you attempt to use the auto-mark feature. For example, suppose you want to create an {XE} field for every italicized occurrence of the word missing, but you don't want to create an {XE} for the word when it appears in roman typeface (i.e., not italics). Before you run auto-mark, find every italicized occurrence of missing and replace it with something unique like ITALmissing. Then run auto-mark, looking for the invented term ITALmissing. You'll get fields that look like {XE "ITALmissing"}. As a last step, globally replace all occurrences of ITALmissing with the original, italicized missing. Now your text is back to the way it was, and your index entries match.
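  The three passes can be sketched in Python as follows. This is only a model of the workflow, not of Word itself: it assumes the document is a list of (text, is_italic) runs, it assumes auto-mark drops the field right after each match, and the function names are invented for the illustration.

```python
import re

def tag_italic(runs, word, placeholder):
    """Pass 1: swap the word for a unique placeholder, but only in italic runs."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [(pattern.sub(placeholder, text) if italic else text, italic)
            for text, italic in runs]

def auto_mark(runs, term):
    """Pass 2: stand-in for auto-mark; appends an XE field after each match."""
    field = '{XE "' + term + '"}'
    return [(text.replace(term, term + field), italic) for text, italic in runs]

def untag(runs, word, placeholder):
    """Pass 3: put the original word back, in the text and inside the fields."""
    return [(text.replace(placeholder, word), italic) for text, italic in runs]

# Hypothetical sample text: one italic "missing" and one roman "missing".
runs = [("The file is ", False), ("missing", True),
        (" and nobody was missing it.", False)]
step1 = tag_italic(runs, "missing", "ITALmissing")
step2 = auto_mark(step1, "ITALmissing")
step3 = untag(step2, "missing", "ITALmissing")
for text, italic in step3:
    print(("italic: " if italic else "roman:  ") + text)
```

  Only the italic occurrence ends up with a field; the roman occurrence passes through all three steps untouched.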

  Another problem is with quotation marks. Auto-mark doesn't work with quotation marks, probably because the {XE} fields themselves use quotation marks. So, globally replace all quotation marks with terms like QUOTEHERE or DOUBLEQUOTEHERE before using auto-mark. Then, if you were hoping to create index entries for things like "Hogan's Heroes" (which has an apostrophe that would otherwise break the feature), you can auto-mark for "HoganAPOSTROPHE_HEREs Heroes" to get your {XE} fields. However, you need to be extra careful with quotation marks! You must not globally replace your QUOTEHERE marks with quotes. That's because a quotation mark in an {XE} field will mess things up. Instead, you're going to need to replace the QUOTEHEREs in your index fields differently than the QUOTEHEREs elsewhere, which means you require two final steps, not one. First, search for QUOTEHERE and APOSTROPHE_HERE as hidden text; since the {XE} fields are hidden, this search will find only terms in index entries. Replace these occurrences not with a plain quote mark, but with a quote preceded by a backslash: \" or \'. That backslash tells Word that the symbol that follows -- the quotation mark or apostrophe -- should be treated literally, and not as part of the {XE} field syntax. Then, as a second step, you can globally replace any remaining QUOTEHERE-like terms with your quotes. One last warning about quotation marks is that depending on your settings, curly quotes might not search-and-replace as cleanly as you'd expect. While a non-curly quotation mark will find both opening and closing curly quotation marks when you're searching, replacing them may not auto-correct as you'd want them to for every occurrence.

  An alternative approach to those two final steps for putting your quotation marks back is the following. First, hide your index tags by making all hidden text invisible. (You do this in the same way you make your paragraph symbols disappear.) Second, replace all QUOTEHERE-like terms with quotation marks. Third, make your hidden text visible again. Fourth and finally, replace all remaining QUOTEHERE-like terms with the backslashed versions, as described above.

  The quotation mark is a special character because it's used in the index marker. The other special characters that require a backslash are the colon, the semicolon, and the backslash itself. If you need to auto-mark terms that have a backslash in them, your index entries will need two backslashes to represent just one. (If this seems really weird to you, that's only because you've never been a programmer. I guarantee that the programmers who are reading this paragraph are nodding to themselves.)
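  For the programmers who are nodding, here is a small sketch of that escaping rule in Python. It assumes exactly the four special characters named above (quotation mark, colon, semicolon, backslash) and escapes each heading level separately, so the colons that separate levels stay unescaped. The function names escape_xe_text and make_xe_field are hypothetical.

```python
def escape_xe_text(text):
    """Put a backslash in front of each character that is special in XE text."""
    out = []
    for ch in text:
        if ch in '\\":;':
            out.append("\\")
        out.append(ch)
    return "".join(out)

def make_xe_field(levels):
    """Build field text from a list of heading levels (main entry first).

    Each level is escaped on its own; the colons joining the levels are
    the real separators and are deliberately left alone.
    """
    return '{XE "' + ":".join(escape_xe_text(level) for level in levels) + '"}'

print(make_xe_field(['"Hogan\'s Heroes"']))
print(make_xe_field(["paths", "C:\\temp"]))
```

  The second call shows the doubled backslash: the single backslash in the folder name becomes two inside the field, and the colon after the drive letter gets its own backslash so that it isn't read as a subentry break.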

  You can use this same basic technique for anything you want, as long as you can find a way to isolate and group the terms that you want to index from the terms you don't. If you want to find all occurrences of president and President, but you don't want your {XE} fields to use the capital P, you can replace all capital-P Presidents with something like CAPPresident and then auto-mark them. Then replace all hidden-text CAPPresident terms with the lowercase-P version, and all non-hidden-text CAPPresident terms with the uppercase-P version. Then auto-mark the lowercase-P presidents as you would normally. You might think that it would be easier to search-and-replace your index after you've created these two types -- for example, you might create {XE "President"} and {XE "president"} tags first, and then try to replace "P with "p globally -- but you run the risk of changing things you didn't want to change. After all, "P can appear in your text, too.

  One last option, which is quite fast, is to keep all those QUOTEHERE-like terms in your index tags forever, but simply remove or replace them in the final generated index. Although I don't recommend this, this approach makes a lot of sense if you're going to be converting your Word documents into other tools. Just as the quotation mark is one of the few special characters in the {XE} field syntax of Word, remember that there are special characters in the other tools as well. If your index entries are going to survive all of those conversions, you might need to pre-treat the characters that are considered special by those other applications. For example, Adobe FrameMaker index markers can also have bracketed content; square brackets and angle brackets in your Word document might fail you once you convert into FrameMaker.

Can I Create an Index With Something Other Than Page Numbers?

  For one-to-one lists.
It's not uncommon for people to use Word's indexing feature to create something that's not really an index. For example, many authors attempt to use those {XE} fields as a way of creating things like tables of figures. Using {XE} fields is most certainly the wrong way to do this -- by design, that is -- but a lot of Word's features are pretty ugly. So if you're familiar with indexing, why not? Also, in some cases you want to create a list of items where what appears in the list doesn't precisely match what appears in the documentation. For example, if you wanted a list of figures where the figure number appears last, what you want to do is create fields like {XE "TITLE OF FIGURE"}, but replace the page numbers with locators like "Figure 1.1".

  For a better result I recommend the method described later for section numbers, but a faster and possibly simpler method is to copy your captions into a new document, and then place each figure (or listed item) on a separate page. Make sure all your page numbers match your figure numbers (or whatever), and run the index. For example, you could number your pages 1.1 through 1.12 for the first twelve figures, 2.1 through 2.12 for the next twelve figures, or whatever. Put one figure on each page, such that Figure 2.2 is actually on page 2.2. When you run the index, Word will use these artificial page numbers, which is what you want. Now just make sure you put your index entries on the appropriate pages, using whatever text you want.

  The danger of this method, despite its ease, is that you must now manage two documents: your original text, and your figures. If you are going to add, delete, move, or renumber any of your figures, your index won't be accurate.

  For many-to-many lists.
Here's an example that's quite similar to the one-to-one method, answering a question I received by email: "We are publishing a bibliography that will consist of dozens of numbered paragraphs. I intend to create an index at the end of the bibliography so the reader can search alphabetically for the authors. I know that the index can list the page numbers on which the indexed names appear, but can it list the numbered paragraphs instead? (In other words, rather than see that “Doe, John” appears on pages 116, 117, and 120, can the index show me that his name appears in bibliography entries 24, 32, and 40?)" The difference between this example and the figure example above is that the person asking this question is interested in having multiple index entries for each item. Where the list of figures has one entry per figure, this author index will do more than create multiple locators for each name; it will also allow multiple entries for each reference. This many-to-many mapping is exactly what indexes are supposed to do, making this a GREAT example of why {XE} fields are the right tool for this job.

  One solution to this author index challenge is to make sure that each bibliographic entry appears on its own page, and that the pages match the items. Thus bibliographic items 24, 32, and 40 should be on pages 24, 32, and 40 respectively. Create your index entries for each item, one entry for each name (as you would want it to appear), and don't do anything unusual; even though your pagination is weird, your indexing shouldn't be. Generate your index as you would normally. In a sense, you are tricking Word into using your bibliographic record numbers because you are making sure your page numbers match them exactly.
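  To make the many-to-many idea concrete, here is a sketch in Python of the mapping the finished author index should contain. It has nothing to do with Word's own machinery; it only shows the result you are tricking Word into producing. The co-author names Roe and Poe are invented for the illustration, and build_author_index is a hypothetical helper.

```python
from collections import defaultdict

def build_author_index(bibliography):
    """Map each author name to the sorted list of item numbers that cite it."""
    index = defaultdict(list)
    for number, authors in sorted(bibliography.items()):
        for author in authors:
            index[author].append(number)
    # Alphabetize the names the way a reader would expect to scan them.
    return {name: index[name] for name in sorted(index, key=str.lower)}

# Item number -> authors of that item (hypothetical data).
bibliography = {
    24: ["Doe, John", "Roe, Jane"],
    32: ["Doe, John"],
    40: ["Doe, John", "Poe, Ann"],
}
author_index = build_author_index(bibliography)
for name, numbers in author_index.items():
    print(name + ", " + ", ".join(str(n) for n in numbers))
```

  One item can feed several names, and one name can collect several items, which is exactly the behavior that a table of contents or a list of figures can't give you.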

  This many-to-many product is easier than the one-to-one example above for a few reasons. First, you don't have to make sure that this secondary document matches your original documentation perfectly. As long as the authors' names haven't changed (e.g., been misspelled then corrected) and numbering remains the same, you're all set. But if you're going to be adding, deleting, or reordering items within the bibliography, or if your author names are likely to change, you should wait for something more complete before generating this index. Second, you aren't working with funky multipart page numbers like 1.1, which require some effort to produce. Instead, you're just counting, starting with 1.

  For indexes that use section numbers for locators.
A totally different kind of documentation product that doesn't use page numbers is one that uses paragraph or section numbers, like 4.1.1.3. You can't use Word's indexing feature naturally to create these, because Word will grab page numbers, not section numbers. And you can't use the table of contents feature because (a) you can't have multiple entries pointing to the same place and (b) you want your items sorted alphabetically, among other reasons.

  If you're not going to build a page numbering system that exactly matches your section numbering system -- and you shouldn't -- then you need to hijack the indexing system entirely. You need to create index entries that use the section numbers as if they were entry text, and then perform some search-replace magic on the result to build what you intended to build.

  For example, imagine that all of your index entries were of the form {XE "main entry:optional_subentry:optional_subsubentry:sectionnumber"}, like this:
   {XE "indexing:with Microsoft Word: :4.1.1.3" \t}
Notice that this example has a sub-subentry of just a single space. Also note the use of the \t flag, which suppresses creation of any page number. Consequently, when you produce your index, you'll have all your locators (section numbers) appearing at a sub-sub-subentry level, but no automatic page numbers. Then, thanks to Word's styles, you can search for all instances of the Sub3 style (or whatever style is applied to sub-sub-subentries), and make the magic changes to produce an index that looks like this:
    main entry
        with Microsoft Word, 4.1.1.3
To do this properly you'll need to (1) globally search for Sub3 styles and prepend a word like REMOVEME, plus a comma and space; (2) remove all space-only entries; (3) delete REMOVEME plus the preceding paragraph marker to run up the locators. Thus you'll go from uncombined entries like these:
    main entry
        with Microsoft Word

                4.1.1.3
                4.2
                10.1.9
to
    main entry
        with Microsoft Word

                REMOVEME, 4.1.1.3
                REMOVEME, 4.2
                REMOVEME, 10.1.9
to
    main entry
        with Microsoft Word
                REMOVEME, 4.1.1.3
                REMOVEME, 4.2
                REMOVEME, 10.1.9
and finally to
    main entry
        with Microsoft Word, 4.1.1.3, 4.2, 10.1.9
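
The same three steps can be sketched in Python. This is a model only: it assumes the generated index has been reduced to a list of (style, text) paragraphs, and the style labels Main, Sub1, Sub2, and Sub3 are placeholders for whatever styles your index actually uses. The function name run_up_locators is made up for the illustration.

```python
def run_up_locators(paragraphs, locator_style="Sub3", marker="REMOVEME"):
    """Run section-number paragraphs up onto the entry that precedes them."""
    # Step 1: prepend the marker, a comma, and a space to each locator paragraph.
    tagged = [(style, marker + ", " + text if style == locator_style else text)
              for style, text in paragraphs]
    # Step 2: remove the space-only placeholder entries.
    kept = [(style, text) for style, text in tagged if text.strip()]
    # Step 3: delete the marker plus the paragraph break in front of it,
    # which joins ", locator" onto the end of the previous paragraph.
    result = []
    for style, text in kept:
        if text.startswith(marker) and result:
            prev_style, prev_text = result[-1]
            result[-1] = (prev_style, prev_text + text[len(marker):])
        else:
            result.append((style, text))
    return result

paragraphs = [
    ("Main", "main entry"),
    ("Sub1", "with Microsoft Word"),
    ("Sub2", " "),
    ("Sub3", "4.1.1.3"),
    ("Sub3", "4.2"),
    ("Sub3", "10.1.9"),
]
finished = run_up_locators(paragraphs)
for style, text in finished:
    print(style + ": " + text)
```

The REMOVEME word never survives to the end; it exists only so that step 3 has something unambiguous to search for.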


  Now there is still one remaining problem, and that's what happens when you use section numbers that move into double digits. Word likes to sort numbers by their digits and not their numeric value, which means you get sequences like this: 15, 5, 501, 52. With section numbers, you can end up with garbage where 10.1.11 comes before 10.1.2. To stop this from happening, you need to add zeroes in front of all one-digit numbers, so that everything is a two-digit number. (If you have any three-digit numbers, you'll need to add even more zeroes.) So instead of using 10.1.2, you'll use 10.01.02. And then, when you've generated the index, you can search for all occurrences of ".0" and replace them with just ".", essentially dropping all those extra naughts.
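
  Here is a quick Python sketch of why the padding works. Plain string sorting stands in for Word's digit-by-digit sort, which is an assumption about Word rather than a test of it. The helper names pad_section and unpad_section are invented; note that this sketch also pads the first part of the number, and it strips the zeroes by reading each part as a number instead of by replacing ".0".

```python
def pad_section(number, width=2):
    """Left-pad every dotted part with zeroes: 10.1.2 becomes 10.01.02."""
    return ".".join(part.zfill(width) for part in number.split("."))

def unpad_section(number):
    """Drop the padding again: 10.01.02 becomes 10.1.2."""
    return ".".join(str(int(part)) for part in number.split("."))

sections = ["10.1.11", "10.1.2", "4.1.1.3"]

# Digit-by-digit sorting puts 10.1.11 ahead of 10.1.2, and 10 ahead of 4.
plain_sorted = sorted(sections)

# With padding, the same kind of sort lands in true numeric order.
padded_sorted = sorted(pad_section(s) for s in sections)
restored = [unpad_section(s) for s in padded_sorted]

print(plain_sorted)
print(padded_sorted)
print(restored)
```

  If any part of your numbering reaches three digits, pass width=3 so that every part is padded to the same length.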

Request for Comments and Other Feedback

  Keeping up with software that is frequently re-released can be difficult, although to date Microsoft has paid very little attention to indexing tools and functionality. (They seem to think indexes aren't important enough in the grand scheme of things, can you imagine? :-) To maintain this document, then, I'd appreciate your help. Email me with corrections, clarifications, anecdotes, and your special tips and tricks. I'll append them to this document. My email address is seth@maislin.com. Thank you.

 

Copyright 2006-2007 Seth A. Maislin
