The Internet Encyclopedia (Volume 3)


Braynov2 WL040/Bidgoli-Vol III-Ch-05 July 11, 2003 11:43 Char Count= 0

PERSONALIZATION AND CUSTOMIZATION TECHNOLOGIES

Products that the neighbors like are then recommended to the target user. In other words, CF is based on the idea that people who agreed on their decisions in the past are likely to agree in the future. The process of CF consists of three steps: representing products and their ratings, forming a neighborhood, and generating recommendations.

During the representation stage, a customer–product matrix is created, consisting of the ratings given by all customers to all products. The customer–product matrix is usually extremely large and sparse. It is large because most online stores offer large product sets, ranging into millions of products. It is sparse because each customer has usually purchased or evaluated only a small subset of the products. To reduce the dimensionality of the customer–product matrix, dimensionality reduction methods such as latent semantic indexing and term clustering can be applied.

The neighborhood formation stage is based on computing the similarities between customers in order to group like-minded customers into one neighborhood. The similarity between customers is usually measured by either a correlation or a cosine measure. After the proximity between customers is computed, a neighborhood is formed using clustering algorithms.

The final step of CF is to generate the top-N recommendations from the neighborhood of customers. Recommendations can be generated using the most-frequent-item technique, which looks into a neighborhood of customers and sorts all products according to their frequency. The N most frequently purchased products are returned as a recommendation. In other words, these are the N products most frequently purchased by customers with similar tastes or interests. Another recommendation technique is based on association rules. It finds all rules supported by a customer, i.e., the rules whose left-hand side consists of products the customer has purchased, and returns the products from the right-hand side of those rules.
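The three CF steps described above can be illustrated with a minimal sketch. The customer names, products, and ratings are hypothetical, and the neighborhood is formed by simply taking the k most similar customers under the cosine measure, which is a simplification swapped in here in place of the clustering algorithms mentioned in the text.

```python
from collections import Counter
from math import sqrt

# Hypothetical data: customer -> {product: rating}. A dict of dicts stores
# only the nonzero cells of the sparse customer-product matrix.
ratings = {
    "ann":  {"book": 5, "lamp": 3, "pen": 4},
    "bob":  {"book": 4, "pen": 5, "desk": 2},
    "cara": {"lamp": 5, "rug": 4},
    "dan":  {"book": 5, "pen": 4, "desk": 5, "chair": 3},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(u[p] * v[p] for p in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighborhood(target, k):
    """Return the k customers most similar to the target (assumed k-nearest rule)."""
    others = [c for c in ratings if c != target]
    others.sort(key=lambda c: cosine(ratings[target], ratings[c]), reverse=True)
    return others[:k]

def top_n(target, k=2, n=2):
    """Most-frequent-item recommendation over the target's neighborhood."""
    owned = ratings[target].keys()
    counts = Counter(
        p for c in neighborhood(target, k) for p in ratings[c] if p not in owned
    )
    return [p for p, _ in counts.most_common(n)]

print(top_n("ann"))
```

Products the target customer already owns are excluded before counting, so only new items are recommended. A production system would also apply the dimensionality reduction step, which this sketch omits.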

Content-Based Filtering

Another recommendation technique is content-based recommendation (Balabanovic & Shoham, 1997). Whereas collaborative filtering identifies customers whose tastes are similar to those of the target customer, content-based recommendation identifies items similar to those the target customer has liked or purchased in the past. Content-based recommendation has its roots in information retrieval (Baeza-Yates & Ribeiro-Neto, 1999). For example, a text document is recommended based on a comparison between the content of the document and a user profile. The comparison is usually performed using vectors of words and their relative weights. In some cases, the user is asked for feedback after the document has been shown. If the user likes the recommendation, the weights of the words extracted from the document are increased. This process is called relevance feedback.

However, content-based recommender systems have several shortcomings. First, they cannot perform in domains where there is no content associated with items, or where the content is difficult to analyze. For example, it is very difficult to apply content-based recommendation systems to product catalogs based solely on pictorial information. Second, only a very shallow analysis of very restricted content types is usually performed. To overcome these problems, a hybrid recommendation technique called content-boosted collaborative filtering has been proposed (Melville, Mooney, & Nagarajan, 2002). The technique uses a content-based predictor to enhance existing user data and then provides a recommendation using collaborative filtering.

In general, both content-based and collaborative filtering rely significantly on user input, which may be subjective, inaccurate, and prone to bias. In many domains, users' ratings may not be available or may be difficult to obtain. In addition, user profiles are usually static and may quickly become outdated.
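The comparison of word vectors and the relevance feedback step described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the profile, the sample document, raw word counts as weights, and the `rate` parameter are all hypothetical choices, not part of any cited system.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Turn text into a vector of words and their counts (assumed weighting)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def feedback(profile, doc_vector, liked, rate=0.5):
    """Relevance feedback: raise the weights of the words of a liked document."""
    if liked:
        for word, count in doc_vector.items():
            profile[word] = profile.get(word, 0.0) + rate * count
    return profile

profile = {"python": 1.0, "web": 0.5}  # hypothetical user profile
doc = vectorize("web mining with python and web logs")

before = cosine(profile, doc)       # match between profile and document
feedback(profile, doc, liked=True)  # the user liked the recommendation
after = cosine(profile, doc)        # similar documents now score higher
print(before < after)
```

After positive feedback, the profile moves toward the liked document, so documents with similar vocabulary are ranked higher in later comparisons.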

WEB USAGE ANALYSIS FOR PERSONALIZATION

Some problems of collaborative and content-based filtering can be solved by Web usage analysis. Web usage analysis studies how Web sites are used by visitors in general and by each user in particular. Web usage analysis includes statistics such as page access frequency, common traversal paths through a Web site, session length, and top exit pages. Usage information can be stored in user profiles for improving the interaction with visitors. Web usage analysis is usually performed using various data mining techniques such as association rule generation and clustering.
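The usage statistics listed above can be computed directly once requests have been grouped into sessions. The sketch below assumes hypothetical session data, measures session length in pages rather than time, and approximates traversal paths by counting consecutive page pairs.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages one visitor requested.
sessions = [
    ["/home", "/catalog", "/item/7", "/cart"],
    ["/home", "/search", "/item/7"],
    ["/home", "/catalog", "/item/7"],
    ["/search", "/item/2", "/cart"],
]

# Page access frequency: how often each page was requested overall.
page_frequency = Counter(page for s in sessions for page in s)

# Top exit pages: the last page of each session.
exit_pages = Counter(s[-1] for s in sessions)

# Session length, measured here as the number of pages per session.
mean_session_length = sum(len(s) for s in sessions) / len(sessions)

# Common traversal paths, approximated as consecutive page pairs.
path_pairs = Counter(pair for s in sessions for pair in zip(s, s[1:]))

print(page_frequency.most_common(2))
print(exit_pages.most_common(2))
print(mean_session_length)
print(path_pairs.most_common(2))
```

Longer traversal paths could be counted the same way by sliding a wider window over each session.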

Web Usage Data

Web usage data can be collected at the server side, at the client side, or at proxy servers, or obtained from corporate databases. Most of the data comes from server log files. Every time a user requests a Web page, the Web server enters a record of the transaction in a log file. Records are written in a format known as the common log file format (CLF), which has been standardized by the World Wide Web Consortium (W3C). The most useful fields of a log record are the IP address of the host computer requesting a page, the HTTP request method, the time of the transaction, and, in the extended (combined) variant of the format, the referrer site visited before the current page.

Although server log files are rich in information, data are stored at a very detailed level, which makes them difficult for human beings to understand. In addition, log files may be extremely large, ranging into gigabytes per day. Another problem with server log files is the information loss caused by caching. To improve performance, most Web browsers cache requested pages on the user's computer. As a result, when a user returns to a previously requested page, the cached page is displayed, leaving no trace in the server log file. Caching can occur both at local hosts and at proxy servers.

Web usage data can also be collected by means of cookies containing state-related information, such as user IDs, passwords, shopping cart contents, purchase history, and customer preferences. According to the W3C, cookies are “the data sent by a Web server to a Web client, stored locally by the client and sent back to the server on subsequent
