Figure 4: Portion of the workforce ontology.

Once the data model stabilizes, it needs to be translated
into a structure with more details. Such a structure
is sometimes called a “knowledge structure” or “ontology”
(not in the philosophical sense of the study of existence,
but in the sense of a representation of the real world).
Depending on the encoding language and software used,
the same knowledge structure may appear differently.
Figure 4 shows a portion of the ontology built from the
data model in Figure 3. The project class in the workforce
domain has a set of properties, each of which has a name,
a type, constraints (cardinality), and other facets if the
property type is symbol, class, or instance. For example, the
property “project sponsor” uses “instance” as the property
type. The class for property type “instance” is “Organiza-
tion,” from which instances will be extracted as the values
for project sponsor. Organization as a class has its own set
of properties. When it is used in another class (in this case,
the project sponsor property in the project class), the class
organization and its properties can be conveniently referenced
and reused, which saves the time of redeveloping the
same knowledge structure and improves consistency in
representing the same class in different locations within
the domain. The template properties provide a framework
for building schemas that will be used to design interac-
tive Web forms to capture data. Using the example in Fig-
ure 4, we can easily produce a database schema or an
XML schema that can be used to create XML instance
documents.

Controlled Vocabularies
Controlled vocabularies are tools for content classification
and indexing. Taxonomies, thesauri, classification
schemes, and glossaries are the forms of controlled vocabulary
frequently used in categorizing and indexing content.
Taxonomy is a term borrowed from biology. WordNet
(Princeton University Cognitive Science Lab, 1998) gives it
three meanings:
(1) a classification of organisms into groups based on similarities
of structure or origin;
(2) a study of the general principles of scientific classification;
and
(3) the practice of classifying plants and animals according
to their presumed natural relationships.
Taxonomies provide a method for structuring and classifying
things (living organisms, products, subjects) into a
series of hierarchical groups to make them easier to identify,
study, or locate (Montague Institute, 2002). For digital
content, taxonomies play two roles: as category labels for
indexing content and as a tool for searching, browsing,
and navigation. Category labels may be assigned manu-
ally by human content/metadata creators or automatically
by using computer programs. The automatic approach
often involves categorizing sample documents manually
and then using the manual results to train the system to
classify other documents automatically. Properly classified
content allows an organization to inventory and
monitor its Web content based on a structured understanding
of user and community needs.
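
The train-then-classify workflow described above can be illustrated with a minimal sketch. It assumes a very simple word-overlap model; the function names (tokenize, train, classify) and the sample texts are hypothetical and are not taken from any system discussed in this chapter. Production classifiers typically use more robust statistical models.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_samples):
    """Build one word-count profile per category from manually labeled samples."""
    profiles = defaultdict(Counter)
    for text, category in labeled_samples:
        profiles[category].update(tokenize(text))
    return dict(profiles)


def classify(text, profiles):
    """Return the category whose profile shares the most word occurrences with the text."""
    words = Counter(tokenize(text))

    def overlap(category):
        profile = profiles[category]
        # Counter returns 0 for words the profile has never seen.
        return sum(min(count, profile[word]) for word, count in words.items())

    # Sorting makes tie-breaking deterministic (alphabetical order).
    return max(sorted(profiles), key=overlap)


# Hypothetical manually categorized sample documents (illustrative only).
samples = [
    ("Case managers meet each job seeker to plan training", "Case management"),
    ("Staff are cross trained to cover intake and counseling duties", "Cross training"),
    ("Employers receive recruiting help and hiring incentives", "Employer focused programs"),
]

profiles = train(samples)
label = classify("New hiring incentives for local employers", profiles)
print(label)
```

The manual step supplies the labeled samples; everything after train() runs without human intervention, which is the division of labor the paragraph describes.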
Taxonomies may be generated by automatically
extracting, analyzing, and categorizing structured and unstructured
content. The key question is how to extract different
types of content and categorize them. Approaches
commonly used in taxonomy development include statistical
and symbolic methods (Parsons & Wand, 1997).
The statistical (similarity-based) approach uses similarity
measures to classify real-world objects, which can be
done automatically by computer programs. It is less
labor-intensive and time-consuming than manual
work, but also less likely to produce the organizational
schemes that people would. The symbolic (goal-oriented
or explanation-based) approach relies on domain
knowledge to determine the classes. Semantic networks
(Woods, 1991) are examples of the symbolic approach. No
matter which approach one uses, it is a good idea for taxonomy
development to start with existing reference books
and encyclopedias in the domain. The benefit is that using
existing sources creates a familiar taxonomy that exhaustively
covers a topic.
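
As a rough illustration of the statistical (similarity-based) approach, the sketch below groups documents by the Jaccard similarity of their term sets. The choice of Jaccard similarity, the greedy grouping strategy, the 0.3 threshold, and the document names and terms are all assumptions made for this example; the chapter does not prescribe a particular similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_similarity(items, threshold=0.3):
    """Greedy single pass: add each item to the first group whose seed
    term set is similar enough, otherwise start a new group."""
    groups = []  # each entry is (seed_terms, [member names])
    for name, terms in items:
        for seed_terms, members in groups:
            if jaccard(terms, seed_terms) >= threshold:
                members.append(name)
                break
        else:
            # No existing group was similar enough.
            groups.append((terms, [name]))
    return [members for _, members in groups]


# Hypothetical documents described by their extracted terms.
documents = [
    ("doc1", {"job", "training", "skills"}),
    ("doc2", {"training", "skills", "workshop"}),
    ("doc3", {"employer", "hiring", "incentive"}),
    ("doc4", {"employer", "hiring", "tax"}),
]

groups = group_by_similarity(documents)
print(groups)  # [['doc1', 'doc2'], ['doc3', 'doc4']]
```

The groups emerge from term overlap alone, with no domain knowledge involved. That is why such a method is cheap to run but may not match the categories a person would choose; a symbolic approach would instead start from classes defined by a domain expert.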
In the example shown in Figures 3 and 4, the taxonomy
for workforce services was developed using a symbolic approach.
Concepts and terminology were collected from workforce
documents and reference sources, yielding an initial taxonomy
with a three-level hierarchy of classes for the concept
Project:
Project
  Activities
    Case management
    Community audits
    Continuous improvement
    Cross training
    Employer focused programs