The Internet Encyclopedia (Volume 3)


Braynov2 WL040/Bidgoli-Vol III-Ch-05 July 11, 2003 11:43 Char Count= 0


together. Collective rule validation allows a human expert to reject or accept a large number of rules at once, thereby reducing validation effort.

Explicit Versus Implicit Profiling

Data for user profiling can be collected implicitly or explicitly. Explicit collection usually requires the user’s active participation, thereby allowing the user to control the information in the profile. Explicit profiling can take different forms. The user may fill out a form, take part in a survey, fill out a questionnaire, submit personal information at the time of registration, provide a ranking or rating of products, etc. This method has the advantage of letting customers tell a Web site directly what they need and how they need it.

Implicit profiling does not require the user’s input and is usually performed behind the scenes. Amazon.com, for example, keeps track of each customer’s purchasing history and recommends specific purchases. Implicit profiling usually means tracking and monitoring users’ behavior in order to identify browsing or buying patterns. In many cases, tracking is performed without users’ consent and remains invisible to them.

Implicit data could be collected on the client or on the server side. Server-side data include automatically generated data in server access logs, referrer logs, agent logs, etc. Client-side data could include cookies, mouse or keyboard tracking, etc. Other sources of customer data are transaction databases, pre-sale and after-sale support data, and demographic information. Such data could be dynamically collected by a Web site or purchased from third parties. In many cases data are stored in different formats in multiple, disparate databases.

Implicit profiling removes from the user the burden associated with providing personal information. Instead of relying on the user’s input, the system tries to collect relevant data and infer user-specific information. Although less intrusive, implicit profiling may raise several privacy concerns.

User profiles and their components can be further classified into static and dynamic, and individual and aggregated (group profiles). A profile is static when it changes seldom or never (for example, demographic information). If customer preferences tend to change over time, dynamic profiles can be used. Such profiles are periodically updated to reflect changes in consumer behavior.

OVERVIEW OF FILTERING TECHNOLOGIES

Although necessary, user profile management (creating, updating, and maintaining user profiles) is not sufficient for providing personalized services. Information in user profiles has to be analyzed in order to infer users’ needs and preferences. In this section we will briefly explain the most popular personalization techniques: rule-based filtering, collaborative filtering, and content-based filtering. All these techniques are used to predict customers’ interests and make recommendations.

Rule-Based Filtering

Association rule mining looks for items that tend to appear together in a given data set. Items could refer to different things in different contexts. They can be products bought by a customer, Web pages visited by a user, etc. To introduce association rules formally, we need the following notation. Let I denote the set of all items. A transaction T is defined as a set of items bought together (T ⊆ I). The set of all transactions is denoted by D. Then, an association rule is defined as an implication between itemsets A and B, denoted by

A ⇒ B,

where A ⊆ I, B ⊆ I, and A ∩ B = ∅. An association rule indicates that the presence of items in A in a transaction implies the presence of items in B. For example, according to the following association rule, visitors who look at Web pages X and Y also look at Web page Z:

look(Visitor, X) and look(Visitor, Y) ⇒ look(Visitor, Z).

Rules can associate items or customers. For example, the following rule associates items,

buys(Customer₁, X) and buys(Customer₁, Y) ⇒ buys(Customer₁, Z),

and the next rule associates customers,

buys(Customer₁, X) and buys(Customer₂, X) ⇒ buys(Customer₃, X).

Rule-based filtering is based on the following idea. If the behavioral pattern of a customer matches the left-hand side of a rule, then the right-hand side can be used for recommendation or prediction.

Two measures are used to indicate the strength of an association rule: support and confidence. The support of the rule A ⇒ B is the fraction of the transactions containing both A and B, i.e., |A ∪ B|/|D|, where |A ∪ B| here denotes the number of transactions in D that contain every item of A ∪ B. The confidence of the rule A ⇒ B is the fraction of the transactions containing A which also contain B, i.e., |A ∪ B|/|A|, with |A| counted in the same way.

Because a large number of association rules can be generated from large transaction databases, weak and nonsignificant associations have to be filtered out. To eliminate spurious associations, minimum support and minimum confidence thresholds can be used: all rules that do not meet both the minimum support and the minimum confidence are eliminated.

An efficient algorithm for association rule mining is frequent pattern growth (FP-growth). The algorithm uses a divide-and-conquer strategy and compresses the database representing frequent items into a frequent-pattern tree (Han et al., in press).
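
The support and confidence measures, and the way a matching left-hand side leads to a recommendation, can be illustrated with a minimal Python sketch. The toy transaction database, the two candidate rules, the threshold values, and the function names (count, support, confidence, recommend) are all assumptions made for illustration. The sketch scans the transactions directly and is not an implementation of FP-growth.

```python
# Hypothetical toy transaction database D; each transaction is a set of items.
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"Y"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(a, b, transactions):
    """Fraction of all transactions that contain both A and B."""
    return count(a | b, transactions) / len(transactions)


def confidence(a, b, transactions):
    """Fraction of the transactions containing A that also contain B."""
    n_a = count(a, transactions)
    return count(a | b, transactions) / n_a if n_a else 0.0


def recommend(basket, rules, transactions, min_support, min_confidence):
    """Collect right-hand sides of sufficiently strong rules whose
    left-hand side is matched by the customer's basket."""
    recommended = set()
    for a, b in rules:
        strong = (support(a, b, transactions) >= min_support
                  and confidence(a, b, transactions) >= min_confidence)
        if strong and a <= basket:
            recommended |= b - basket  # skip items the customer already has
    return recommended


# Hypothetical candidate rules, written as (A, B) pairs for A => B.
rules = [
    (frozenset({"X", "Y"}), frozenset({"Z"})),
    (frozenset({"Y"}), frozenset({"Z"})),
]

suggestions = recommend({"X", "Y"}, rules, transactions,
                        min_support=0.3, min_confidence=0.6)
print(suggestions)
```

In this toy data set the rule {X, Y} ⇒ {Z} has support 2/5 and confidence 2/3, so it survives the assumed thresholds, whereas {Y} ⇒ {Z} has confidence 2/4 and is eliminated. A customer whose basket contains X and Y is therefore recommended Z.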

Collaborative Filtering

Collaborative filtering (CF) was one of the earliest recommendation technologies. CF is used to make a recommendation to a user by finding a set of users, called a neighborhood, that have tastes similar to those of the target user.
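
The idea of a neighborhood can be sketched in a few lines of Python. The text above does not specify how similarity of tastes is measured; cosine similarity over co-rated items is used here purely as an assumption, and the ratings table, user names, and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave": {"A": 5, "C": 4, "D": 5},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def neighborhood(target, ratings, k):
    """Return the k users whose ratings are most similar to the target's."""
    scored = [
        (cosine_similarity(ratings[target], other), name)
        for name, other in ratings.items()
        if name != target
    ]
    scored.sort(reverse=True)  # most similar users first
    return [name for _, name in scored[:k]]


neighbors = neighborhood("alice", ratings, k=2)
print(neighbors)
```

With these assumed ratings, the two users closest to alice are dave and bob; carol, whose ratings diverge from alice’s, falls outside the neighborhood.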
