Why a hierarchical index?

Because there isn't a good one available yet. Existing attempts to provide an index based on a list of subject headings have so far been very shallow, with only one or two levels.

Wouldn't another kind of index be better?

For some kinds of searches, yes. There are already many different keyword and subject-heading based indexes to sources on the Internet. The problem with these indexes is that the person doing the search needs to know the right keywords. In a traditional library setting, reference librarians can help patrons choose the appropriate terminology. This is less true on the Internet.

So how does a hierarchical index solve the keyword ignorance problem?

A hierarchical structure makes it possible to browse the subject headings. It is assumed that the person doing the search will have some knowledge of how their search target fits into the overall wealth of human knowledge. For example, someone looking for a C++ compiler will try to find a reference to it in the areas of knowledge related to technology and then computers, rather than religion.
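The browsing idea can be sketched as a small nested structure. This is only an illustration, not the project's actual format: the headings, the placeholder entry, and the names INDEX, browse, and subheadings are all assumptions made up for the example.

```python
# Hypothetical subject tree. Headings and the entry are illustrative only;
# they are not taken from the project's real index.
INDEX = {
    "Technology": {
        "Computers": {
            "Programming Languages": {
                "C++": ["(entry for a C++ compiler)"],
            },
        },
    },
    "Religion": {},
}


def browse(index, path):
    """Walk down the tree one heading at a time and return what is found there."""
    node = index
    for depth, heading in enumerate(path):
        # A list is a leaf holding entries, so it has no headings below it.
        if not isinstance(node, dict) or heading not in node:
            raise KeyError(f"no heading {heading!r} under {path[:depth]}")
        node = node[heading]
    return node


def subheadings(index, path):
    """List the headings a searcher could choose from at this point."""
    node = browse(index, path)
    return sorted(node) if isinstance(node, dict) else []
```

A searcher who does not know the keyword "compiler" can still start with subheadings(INDEX, []), pick "Technology", then "Computers", and narrow down one level at a time.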

Why not pick Dewey or some other well-established classification system used in libraries?

The problem with Dewey and Library of Congress subject headings is that they are both "a mile wide and an inch deep." Also, except in certain circumstances, they don't closely match the sort of divisions that a domain expert would use.

If some system isn't chosen, won't this lead to chaos?

The indexers on this project are strongly encouraged to use pre-existing classification schemes, such as the ACM Computing Review Classification Scheme, or the Encyclopedia of Social Sciences scheme.

Shouldn't the index have classification codes that are recognizable by someone from library science so that experts from the library field can quickly find their way around?

This would be a great addition. However, since this project is to have a distributed management model, we have assumed that it would be easier if the domain experts responsible for indexing a topic area chose the classification scheme that works best for their field. If this happens to match LC or UDC, fine. If someone wants to use a traditional subject heading classification to classify the sources on the Internet, they are welcome to do so.

What sources are being indexed?

Roughly, anything that can be reached through the Internet. This excludes a lot, of course, since, contrary to popular belief, most information is not available online. For example, bibliographic references for a particular subject area are not being collected by this project. However, if a bibliography for a particular subject area is available on the Internet, it will be classified. It has been suggested that online card catalog systems that are reachable from the Internet and have strong holdings in a particular subject area be classified. I think this is a great idea, but it is outside the scope of this project. Another area that is not being classified is mailing lists that are "closed" in the sense that a potential subscriber must be known to the list maintainer or must provide some sort of justification for being added to the list.