Web Indexing Chat (Transcript)

An indexstudents chat
(transcribed for publication by Seth Maislin)

Editor note: Core contents have not been edited, although spelling has been corrected and conversation order is occasionally adjusted for clarity. Readers are reminded that chat participants speak at a stream-of-consciousness rate.

Host: Dan Connolly ("wfwbooks")
Special guest: Seth Maislin ("smaislin17")

	wfwbooks: Welcome to the Web Indexing and Search Engines chat by Seth Maislin. Seth is a great chatter, so look forward to information and entertainment. Without further ado...
	wfwbooks: Here's Seth.
	smaislin17: Hello all! Let me start by asking you folks if there's anything you'd like to talk about?
	thegourmetcat: For those of us beginning, an "official" definition of web indexing?
	smaislin17: Well, there is no official definition, as you can imagine. The Web (as we know it) has been around for maybe 10 years, but as technologies go we're still very much at infancy. One definition, which seems rather common among ASI members, is the creation of a back-of-the-book-looking index on the Web. Whether this is an index of a website, of several websites, or of something that isn't on the Web at all (like a book) doesn't matter; the bottom line is that the index is alphabetized, indented (as best as possible), and uses locators. For the Web, of course, those locators are rarely page numbers, but rather web addresses (URLs). So many people see web indexing as the parallel to "paper indexing," if you know what I mean.
	smaislin17: Personally, I don't like this definition at all. I'm not against index-like documents on the Web, but I'm against limiting the idea of indexing to something so particular and, honestly, peculiar. I prefer to use the term "indexing" as some combination of two different existing concepts: the index you see in books, and the term "indexing" when used for databases. (Database indexing refers to identifying specific database records with unique identifiers.)
	thegourmetcat: Peculiar?
	smaislin17: Yes, peculiar. :-) I'll try to explain why I think that in a little bit.
	luciemh: I'm especially interested in how you see indexers getting into web indexing. How do they go about learning how to index for web sites and how do they find potential clients?
	smaislin17: luciemh, that's a great question, and I will address that next.
	smaislin17: Anyway, I think that the index as it appears in a book -- indented, commonly alphabetized, etc. -- is the one design that grew as an optimization of the printed medium. That is, because books are printed and bound, indexes look like they do. But before you can start looking for work doing web indexing, you have to decide what exactly you're going to DO as a web indexer. If all you want to do is create those back-of-the-book-looking documents on the Web, you'll need to focus on sites for which those documents actually add usable value. But for most websites, I'm of the belief that this book-optimized index architecture doesn't work very well online.
	thegourmetcat: I've certainly found enough websites that would be more usable if they had an index, though.
	smaislin17: I think you're right, gourmetcat. The index is a useful tool no matter what. But in an online environment, there are a lot of other representations of indexing skills that have greater value. I would argue that for most websites, if an index is required, there's something inherently wrong in its design and structure.
	smaislin17: Let's make a comparison. Books have this thing wrong with them: they're comprised of many, many undifferentiable pages of words. The only way to get in there is to have some additional tool, whether a table of contents (TOC), index, physical tabs, and so on. Otherwise there is no good way to find something in a book, period. Consider a fictional novel, where even the narrative of the story doesn't provide enough clues to help you remember where a concept might have been introduced. But websites can have multiple internal structures, multiple categorizations. Even without a search engine or an index, websites are implementationally multidimensional. (Books are, for the most part, linear.) So a website that has excellent structure doesn't need an index -- whereas a book, no matter how well written, should have one.
	thegourmetcat: I'd think the material on the website would kind of determine the usefulness of an index, as well as the structure of the website.
	smaislin17: Exactly, cat. I'll give you an example.
	smaislin17: Consider Amazon.com. This website, by its nature, doesn't require an index. Here's why. First, it's modeled after a physical bookstore, which people kind of understand. Second, books are already organized by author and title, as well as basic well-understood genres like "self-help." Third, genres are easily subdivided, just as Travel is divided into countries, or Fiction is divided into Romance and Thriller. And so on.
	thegourmetcat: Kind of a "walk-through" index.
	smaislin17: Yes, exactly. And what makes an Amazon.com index unnecessary is that these structures are incorporated into their website. From the home page you can choose the title (alphabetically, with search to speed things up), the author (also searching), the genre, the category, and so on. There are all sorts of other double-postings available, since you can find a book by price, or by looking among bestsellers. These structures -- which you would hope appear in a printed catalog of book titles -- are built into the web system. This is the multidimensionality I spoke of.
	smaislin17: What indexers are good at, however, is deciding what these structures can be. If you know how to organize fish by species, or people by interests, or artistic accomplishments by time period and artist influence, then you know how to build websites on these subjects. And if you do your job well, you don't necessarily need the index-looking thing. BUT, some websites don't lend themselves to this kind of breakdown, don't really need it, or are too complicated to break down neatly.
	smaislin17: For example, imagine a website about art history. It's full of text, and the ideas are hard to articulate, especially for visitors who don't know the art language very well, or tend to confuse various concepts. Lots and lots of text, not a lot of structure... gosh, sounds a bit like a book! For a text-heavy site like this, a standard index would be extremely valuable. You need a way to connect visitors to the concepts they want, despite their ignorance of the language. And you need to connect similar concepts throughout the site, despite your inability to "thematically" break down the website into a simple navigational structure. Instead, you design the site with categories for easy-to-understand periods in art (such as Romantic) or media (such as pastel), and leave the rest to the educated indexer.
	smaislin17: No different from a textbook, really.
	smaislin17: So I hope I can summarize my first point: that index-looking documents for websites make the most sense when the website is text-heavy, structure-barren, and (though I didn't say this before, it's still important) relatively unchanging over time. This last point is worth emphasizing: In practical terms, indexes don't scale. Would you want to index every news article flowing into CNN.com? Are you that crazy?
	smaislin17: So let's answer that question about what online indexing is, because I've already answered it (sort of). Online indexing, in my very loose definition of the term, is about making a site more accessible through categorization, vocabulary control, and the combination of related concepts. I also believe that online indexing uses words (labels) as its tools, and not other design features like colors, icons, and buttons. Indexing is about identifying how "wanted" concepts are likely to be found by those who want them. Once identified, the content can be structured into categories, linked to a list or table of labels, searched against a list of keywords, or simply discovered by accident. One such tool is the A-to-Z list. Another such tool is the TOC. These are very different, though they're closer in implementation online than they are in a book. TOCs represent content in a globally structured way, just as books put Chapter 1 in front of Chapter 2 for some important reason. Indexes represent content in a local way, providing multiple words to get to what you want. Online indexing, therefore, can include book-like indexes, but it can also include "keywording" (one of those new verbs) and information architecture.
	mjslaton2000: What is information architecture?
	smaislin17: To learn more about information architecture, I recommend the textbook by Rosenfeld & Morville entitled Information Architecture for the World Wide Web, now in a second edition. Fred Leise wrote the index. IA is a hard term to define succinctly, so I'll have to leave your question unanswered for now, mjslaton. I strongly recommend browsing this book, available in most large bookstores. But basically, I think of information architecture as the equivalent of indexing in the Web world. I think they're almost synonyms, actually. IA involves things like structure, organization, labels, and navigation. In fact, "navigation" (and with that, search) is the only thing never really discussed in book indexes, and perhaps the one reason the terms aren't exactly synonymous.
	smaislin17: So, with all these huge comments floating around, I'll pause for questions again.
	(identifier accidentally unrecorded): How are access points coded?
	joanhgreen: And how do online indexers interface with Web designers?
	smaislin17: Good questions. Regarding coding, if you're actually talking about what the HTML code would look like, that's a beyond-the-scope-of-this-chat question. If you're talking about how they're represented in the final product, however, that's important. It's about labeling, really. If you're replacing the numeric page numbers in a book-like index with nonnumeric hypertext labels, formatting is a major issue. I have an article online that talks about that a bit; I'll post the address in a little bit.
	smaislin17: But as I said way at the beginning, the Web is still at its infancy. Did you know that print indexes (as we know them) were invented around 1800, but it wasn't until around 1850 that it was decided entries should be alphabetized as a standard? I like that idea, because it meant people were considering alternatives that, after the 150 years since, are hard to imagine. The Web is no different. Alphabetization is up for grabs again. Consider, for example, the order of the items in the File menu in most Windows-based or Mac applications. NEW is at the top, EXIT is at the bottom. Why aren't they alphabetized?
	smaislin17: Most of the Web is task-oriented right now; it's about helping people DO things. So task-oriented sorting is quite common. So is importance-oriented sorting, although that's subjective and tricky. It's like putting "About Us" at the top or at the bottom; some people think it deserves to be mentioned first, and others think it can wait until the end.
	smaislin17: My point is that there are no standards. Alphabetization is up for grabs. Indentation is up for grabs. Page numbers don't exist. What's left? We have no choice but to decide for ourselves how we want our indexes to appear.
	joanhgreen: Aren't marketing sites all about exposure?
	smaislin17: Marketing sites are about exposure, but they still have to decide what they want to expose most. If it's about personality and "heart," they'll probably put the CEO's face on the home page. If it's about branding, they'll make sure the whole site looks like its logo. If it's about "solving our customers' needs," they'll explain what they can do long before they talk about who they are. As indexers -- this is in response to joanhgreen's question about interfacing -- our strengths are in putting ideas into categories, or in defining categories very particularly. We're good at connecting similar concepts, distinguishing between dissimilar concepts, and finding labels for everything. When it comes to web design, then, you want to market yourselves as language and category experts. The industry buzzwords right now include hierarchies, taxonomies, classification and organization schemes, labels, and user focus. You'll find all of these concepts in the Rosenfeld/Morville text.
	smaislin17: I've had a contract working with an application developer, and my job was to improve the search system of the application by building categories everyone would be able to use without training. I've also done work for a publishing company that was interested in routing documents from one group of people to another, and they needed an "intermediate categorization system" to make sure both groups were speaking the same language. I've also done work where I've helped companies store their internal information in a useful set of folders, so they wouldn't reproduce work they didn't know existed. (The buzzword here is knowledge management.) And this brings us, finally and certainly, to the question of how you can market yourself.
	smaislin17: You have to identify what your skills are. What are the things that you do best as an indexer? (An analogy is the stay-at-home parent who is finally looking for a job but doesn't have a resume, so he/she writes "managed the schedules of as many as 20 individuals for special events" to refer to birthday party planning.) You're good at organization, labels, categories, and user focus. That's how you'll promote yourself.
	smaislin17: OR, if you like the alphabetized indented index-looking thing too much to give it up just yet, then you need to make sure you're approaching the websites for which such a product would be useful. Don't contact Amazon or Verizon or Apple. Go for the academic sites that have a ton of information on them. Find the research sites with tons of information. And stick to research that (a) you enjoy, (b) you can understand, and (c) doesn't change so quickly that maintaining the index is too expensive to be worth it.
	smaislin17: By the way, let me throw in a disclaimer. These are my ideas, and mine alone. Web indexing is still badly defined. I say, call it what you will.
	thegourmetcat: Did I miss the URL for your article?
	smaislin17: Right, that article. You can get to the article from my website, or go directly to https://taxonomist.tripod.com/websmarts/onlineindexing.html.
	joanhgreen: Thanks. This is a lot to think about.
	smaislin17: It is a lot, rather overwhelming. I think that's the best time to get involved, though, when nobody knows what they're doing except the experts (that's you!).
	thegourmetcat: But a really exciting place for indexing to go. One of the appealing things about indexing, to me.
	smaislin17: I agree. And of course, it's already going there. Indexers were more prominently mentioned in the second edition of Rosenfeld/Morville.
	smaislin17: I encourage you folks to stick with indexing, stay involved with indexstudents (even when you don't think of yourselves as students any more), and ASI (or your own national indexing association). Give this whole "web indexing" thing a try. Whoever controls the information wins, but it's not about hoarding information so much as about sharing it. Thankfully, we have the skills.
	smaislin17: And with that, I want to thank everyone for coming, and I look forward to seeing your posts on indexstudents in the future. ...Now get back to work. :-D
