Braynov2 WL040/Bidgoli-Vol III-Ch-05 July 11, 2003 11:43 Char Count= 0
54 PERSONALIZATION AND CUSTOMIZATION TECHNOLOGIES
Products that the neighbors like are then recommended
to the target user. In other words, CF is based on the idea
that people who agreed on their decisions in the past are
likely to agree in the future. The process of CF consists
of the following three steps: representing products and
their rankings, forming a neighborhood, and generating
recommendations.
During the representation stage, a customer–product
matrix is created, consisting of ratings given by all cus-
tomers to all products. The customer–product matrix is
usually extremely large and sparse. It is large because
most online stores offer large product sets ranging into
millions of products. The sparseness results from the fact
that each customer has usually purchased or evaluated
only a small subset of the products. To reduce the
dimensionality of the customer–product matrix,
dimensionality-reduction methods such as latent semantic
indexing and term clustering can be applied.
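As a minimal sketch of the representation stage, the customer–product matrix can be stored sparsely, keeping only the ratings that actually exist. The customer names, product names, and ratings below are hypothetical, and the helper names `build_matrix` and `density` are illustrative rather than taken from any particular system. Dimensionality reduction itself is not shown.

```python
from collections import defaultdict

# Hypothetical (customer, product, rating) triples; real data would come
# from purchase or rating records.
ratings = [
    ("alice", "book", 5), ("alice", "lamp", 3),
    ("bob", "book", 4), ("bob", "desk", 2),
    ("carol", "lamp", 4),
]

def build_matrix(triples):
    """Store only observed ratings: matrix[customer][product] = rating."""
    matrix = defaultdict(dict)
    for customer, product, rating in triples:
        matrix[customer][product] = rating
    return dict(matrix)

def density(matrix):
    """Fraction of customer-product cells that actually hold a rating."""
    products = {p for row in matrix.values() for p in row}
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * len(products))

matrix = build_matrix(ratings)
print(f"density: {density(matrix):.2f}")
```

With millions of products and only a handful of ratings per customer, the density of a real matrix would be far lower than in this toy example, which is why a dictionary-of-dictionaries (or another sparse layout) is a natural choice.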
The neighborhood formation stage is based on comput-
ing the similarities between customers in order to group
like-minded customers into one neighborhood. The simi-
larity between customers is usually measured by either a
correlation or a cosine measure. After the proximity be-
tween customers is computed, a neighborhood is formed
using clustering algorithms.
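The two similarity measures mentioned above can be sketched as follows. This is an illustration under stated assumptions: customers are represented as sparse rating dictionaries, the sample data are hypothetical, and the neighborhood is formed by simply taking the k most similar customers, which is a simpler stand-in for the clustering algorithms the text refers to.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse rating vectors (dicts)."""
    shared = set(u) & set(v)
    dot = sum(u[p] * v[p] for p in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pearson(u, v):
    """Pearson correlation over the products both customers have rated."""
    shared = sorted(set(u) & set(v))
    if len(shared) < 2:
        return 0.0
    mean_u = sum(u[p] for p in shared) / len(shared)
    mean_v = sum(v[p] for p in shared) / len(shared)
    cov = sum((u[p] - mean_u) * (v[p] - mean_v) for p in shared)
    var_u = sum((u[p] - mean_u) ** 2 for p in shared)
    var_v = sum((v[p] - mean_v) ** 2 for p in shared)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)

def neighborhood(target, matrix, k, similarity=cosine):
    """Return the k customers most similar to the target customer."""
    scores = [(similarity(matrix[target], row), name)
              for name, row in matrix.items() if name != target]
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]

# Hypothetical ratings on a 1-5 scale.
matrix = {
    "alice": {"book": 5, "lamp": 3, "desk": 1},
    "bob": {"book": 4, "lamp": 3, "desk": 2},
    "carol": {"book": 1, "lamp": 2, "desk": 5},
}
print(neighborhood("alice", matrix, k=1))
```

Passing `similarity=pearson` to `neighborhood` switches the measure without changing the rest of the procedure.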
The final step of CF is to generate the top-N recommendations
from the neighborhood of customers. Recommendations
could be generated using the most-frequent-item
technique, which looks into a neighborhood of customers
and sorts all products according to their frequency. The
N most frequently purchased products are returned as a
recommendation. In other words, these are the N products
most frequently purchased by customers with similar
tastes or interests. Another recommendation technique is
based on association rules. It finds all rules supported by
a customer, i.e., the rules whose left-hand side contains only
products the customer has already purchased, and returns the
products from the right-hand side of those rules.
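Both recommendation techniques can be sketched in a few lines. The purchase data and rules below are hypothetical, the function names are illustrative, and two details are assumptions not stated in the text: products the target customer already owns are excluded, and ties in frequency are broken alphabetically.

```python
from collections import Counter

def most_frequent_items(target, neighbors, purchases, n):
    """Rank products by how many neighbors bought them; return the top n.

    Assumption: products the target already owns are skipped.
    """
    owned = purchases[target]
    counts = Counter(p for c in neighbors for p in purchases[c]
                     if p not in owned)
    # Sort by descending frequency, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:n]]

def rule_recommendations(target, rules, purchases):
    """Return right-hand-side products of every rule the customer supports.

    A rule (lhs, rhs) is treated as supported when the customer has bought
    every product in lhs.
    """
    owned = purchases[target]
    recommended = set()
    for lhs, rhs in rules:
        if lhs <= owned:
            recommended |= rhs - owned
    return sorted(recommended)

# Hypothetical purchase histories and association rules.
purchases = {
    "alice": {"book"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "lamp"},
    "dave": {"lamp", "pen"},
}
rules = [
    (frozenset({"book"}), frozenset({"lamp"})),
    (frozenset({"book", "desk"}), frozenset({"chair"})),
]
print(most_frequent_items("alice", ["bob", "carol", "dave"], purchases, n=2))
print(rule_recommendations("alice", rules, purchases))
```

In this example the second rule is not applied to alice because she has not purchased a desk, so only the first rule contributes a recommendation.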
Content-Based Filtering
Another recommendation technique is content-based
recommendation (Balabanovic & Shoham, 1997). Whereas
collaborative filtering identifies customers whose tastes
are similar to those of the target customer, content-based
recommendation identifies items similar to those the
target customer has liked or purchased in the past.
Content-based recommendation has its roots in informa-
tion retrieval (Baeza-Yates & Ribeiro-Neto, 1999). For
example, a text document is recommended based on a
comparison between the content of the document and a
user profile. The comparison is usually performed using
vectors of words and their relative weights. In some cases,
the user is asked for feedback after the document has been
shown. If the user likes the recommendation, the
weights of the words extracted from the document are
increased. This process is called relevance feedback.
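The comparison and the feedback step can be illustrated with a small sketch. It assumes that documents and the user profile are bags of words weighted by relative term frequency and compared by cosine similarity; the update rule in `reinforce` (adding a fraction `rate` of each document weight to the profile) is one simple choice made for illustration, not a rule given in the text. All sample data are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words vector with relative term frequencies as weights."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def recommend(profile, documents):
    """Return the id of the document most similar to the user profile."""
    return max(documents,
               key=lambda doc_id: cosine(profile, term_vector(documents[doc_id])))

def reinforce(profile, doc_vector, rate=0.5):
    """Relevance feedback: raise the weights of a liked document's words."""
    updated = dict(profile)
    for term, weight in doc_vector.items():
        updated[term] = updated.get(term, 0.0) + rate * weight
    return updated

# Hypothetical profile and documents.
profile = {"jazz": 1.0, "guitar": 0.5}
documents = {
    "d1": "jazz guitar lessons",
    "d2": "stock market news",
}
print(recommend(profile, documents))

# The user reports liking d2, so its words gain weight in the profile.
profile = reinforce(profile, term_vector(documents["d2"]))
```

After the feedback step the profile contains the words of the liked document, so similar documents score higher in later comparisons.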
However, content-based recommender systems have
several shortcomings. First, content-based recommenda-
tion systems cannot perform in domains where there is
no content associated with items, or where the content
is difficult to analyze. For example, it is very difficult to
apply content-based recommendation systems to product
catalogs based solely on pictorial information. Second,
only a very shallow analysis of very restricted content
types is usually performed. To overcome these problems,
a new hybrid recommendation technique called
content-boosted collaborative filtering has been proposed
(Melville, Mooney, & Nagarajan, 2002). The technique
uses a content-based predictor to enhance existing user
data and then provides a recommendation using collabo-
rative filtering.
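The two-stage idea can be sketched roughly as follows. This is a simplified illustration, not the method of Melville, Mooney, and Nagarajan: the content-based predictor here is a deliberately crude stand-in (the mean of the user's ratings on items that share a feature), the collaborative step is a plain similarity-weighted average, every user is assumed to have at least one rating, and all names and data are hypothetical.

```python
import math

def content_predict(user_ratings, item, features):
    """Stand-in content-based predictor: mean of the user's ratings on items
    sharing at least one feature with `item`, else the user's overall mean."""
    similar = [rating for other, rating in user_ratings.items()
               if features[other] & features[item]]
    pool = similar or list(user_ratings.values())
    return sum(pool) / len(pool)

def pseudo_ratings(matrix, features):
    """Fill every missing cell with a content-based prediction."""
    return {
        user: {item: row[item] if item in row
               else content_predict(row, item, features)
               for item in features}
        for user, row in matrix.items()
    }

def cosine(u, v):
    """Cosine similarity between two dense vectors over the same items."""
    dot = sum(u[i] * v[i] for i in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def cf_predict(target, item, dense):
    """Similarity-weighted average of the other users' (pseudo-)ratings."""
    num = den = 0.0
    for user, row in dense.items():
        if user == target:
            continue
        sim = cosine(dense[target], row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else dense[target][item]

# Hypothetical item features and sparse ratings.
features = {"novel_a": {"fiction"}, "novel_b": {"fiction"}, "manual": {"tech"}}
matrix = {
    "alice": {"novel_a": 5, "manual": 2},
    "bob": {"novel_a": 4, "novel_b": 5, "manual": 1},
    "carol": {"novel_b": 2, "manual": 5},
}
dense = pseudo_ratings(matrix, features)
print(round(cf_predict("alice", "novel_b", dense), 2))
```

Because the enhanced matrix is dense, the collaborative step can compare any two users over all items, which is the main benefit of boosting the sparse user data first.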
In general, both content-based and collaborative filter-
ing rely significantly on user input, which may be sub-
jective, inaccurate, and prone to bias. In many domains,
users’ ratings may not be available or may be difficult to
obtain. In addition, user profiles are usually static and
may quickly become outdated.
WEB USAGE ANALYSIS FOR PERSONALIZATION
Some problems of collaborative and content-based
filtering can be solved by Web usage analysis. Web us-
age analysis studies how Web sites are used by visitors in
general and by each user in particular. Web usage analysis
includes statistics such as page access frequency, common
traversal paths through a Web site, session length, and top
exit pages. Usage information can be stored in user pro-
files for improving the interaction with visitors. Web usage
analysis is usually performed using various data mining
techniques such as association rule generation and clus-
tering.
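The statistics listed above are straightforward to compute once requests have been grouped into sessions. The sketch below assumes each session is a list of (page, seconds since session start) pairs; the sample sessions and function names are hypothetical, and session length is taken here to mean the time between the first and last request.

```python
from collections import Counter

# Hypothetical sessions: (page, seconds since the session started).
sessions = [
    [("/home", 0), ("/products", 30), ("/cart", 90)],
    [("/home", 0), ("/products", 20)],
    [("/home", 0), ("/about", 45)],
]

def page_frequency(sessions):
    """Number of requests per page across all sessions."""
    return Counter(page for session in sessions for page, _ in session)

def common_paths(sessions, length=2):
    """Count every run of `length` consecutive pages within a session."""
    paths = Counter()
    for session in sessions:
        pages = [page for page, _ in session]
        for i in range(len(pages) - length + 1):
            paths[tuple(pages[i:i + length])] += 1
    return paths

def mean_session_length(sessions):
    """Mean time in seconds between the first and last request."""
    durations = [s[-1][1] - s[0][1] for s in sessions if s]
    return sum(durations) / len(durations)

def exit_pages(sessions):
    """Count how often each page is the last one requested in a session."""
    return Counter(s[-1][0] for s in sessions if s)

print(page_frequency(sessions).most_common(1))
print(common_paths(sessions).most_common(1))
print(mean_session_length(sessions))
print(exit_pages(sessions))
```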
Web Usage Data
Web usage data can be collected on the server side, on the
client side, or at proxy servers, or obtained from corporate
databases. Most of the data come from server log
files. Every time a user requests a page from a Web site, the Web server
enters a record of the transaction in a log file. Records are
written in a format known as the common log file format
(CLF), which has been standardized by the World Wide
Web Consortium (W3C). The most useful fields of a log
record are the IP address of the host computer requesting
a page, the HTTP request method, the time of the transaction,
and, in extended variants of the format, the referrer site
visited before the current page.
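A log record of this kind can be parsed with a regular expression. The sketch below is an illustration under stated assumptions: the sample line is invented, and the pattern accepts the basic CLF layout (host, identity, user, time, request, status, size) with the referrer and user agent as optional trailing fields, since the referrer is logged only by extended variants such as the widely used combined format.

```python
import re
from datetime import datetime

# host ident user [time] "method path protocol" status size ["referrer" "agent"]
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]+))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_record(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical log line (documentation-range IP address and example domain).
line = ('192.0.2.7 - - [11/Jul/2003:11:43:00 -0500] '
        '"GET /catalog/item42.html HTTP/1.0" 200 2326 '
        '"http://www.example.com/catalog/" "Mozilla/4.08"')

record = parse_record(line)
print(record["host"], record["method"], record["path"], record["referrer"])
```

For a line in the basic format, without the two trailing quoted fields, `referrer` and `agent` are simply `None`.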
Although server log files are rich in information, data
are stored at a very detailed level, which makes them dif-
ficult for human beings to understand. In addition, the
size of log files may be extremely large, ranging into gi-
gabytes per day. Another problem with server log files is
the information loss caused by caching. To improve
performance, most Web browsers cache requested
pages on the user’s computer. As a result, when a user
returns to a previously requested page, the cached page is
displayed, leaving no trace in the server log file. Caching
can occur both at local hosts and at proxy servers.
Web usage data can also be collected by means of
cookies containing state-related information, such as user
IDs, passwords, shopping cart contents, purchase history, and
customer preferences. According to the W3C, cookies are “the
data sent by a Web server to a Web client, stored locally
by the client and sent back to the server on subsequent