The Internet Encyclopedia (Volume 3)

together. Collective rule validation allows a human expert
to reject or accept a large number of rules at once, thereby
reducing validation effort.

Explicit Versus Implicit Profiling
Data for user profiling can be collected implicitly or ex-
plicitly. Explicit collection usually requires the user’s ac-
tive participation, thereby allowing the user to control the
information in his profile. Explicit profiling can take dif-
ferent forms. The user may fill out a form, take part in a
survey, fill out a questionnaire, submit personal informa-
tion at the time of registration, provide a ranking or rating
of products, etc. This method has the advantage of letting
the customers tell a Web site directly what they need and
how they need it.
Implicit profiling does not require the user’s input and
is usually performed behind the scenes. Amazon.com,
for example, keeps track of each customer’s purchasing
history and recommends specific purchases. Implicit
profiling usually means tracking and monitoring users’
behavior in order to identify browsing or buying patterns.
In many cases, tracking is performed without users’
consent and remains invisible to users. Implicit data
could be collected on the client
or on the server side. Server-side data include automati-
cally generated data in server access logs, referrer logs,
agent logs, etc. Client-side data could include cookies,
mouse or keyboard tracking, etc. Other sources of cus-
tomer data are transaction databases, pre-sale and after-
sale support data, and demographic information. Such
data could be dynamically collected by a Web site or
purchased from third parties. In many cases data are
stored in different formats in multiple, disparate data-
bases.
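As a rough illustration of server-side implicit collection, the following Python sketch groups the pages requested by each host in an access log. It assumes the log uses the Common Log Format, that a host address stands in for a user, and that only successful requests (status 200) count as page views; the sample lines and the function name pages_by_host are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Common Log Format:
# host ident authuser [timestamp] "method path protocol" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def pages_by_host(lines):
    """Map each host to the list of paths it requested successfully."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip malformed lines and requests that did not succeed.
        if match and match.group(4) == "200":
            visits[match.group(1)].append(match.group(3))
    return dict(visits)

# Hypothetical log lines for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +0000] "GET /books/42 HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:10 +0000] "GET /books/57 HTTP/1.0" 200 1804',
    '192.0.2.7 - - [10/Oct/2000:13:57:02 +0000] "GET /missing HTTP/1.0" 404 209',
]
profiles = pages_by_host(sample)
```

A real site would typically need a more reliable user identifier than the host address, for example a cookie or a login name.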
Implicit profiling relieves the user of the burden of
providing personal information. Instead of
relying on the user’s input, the system tries to collect rel-
evant data and infer user-specific information. Although
less intrusive, implicit profiling may raise several privacy
concerns.
User profiles and their components can be further clas-
sified into static and dynamic, and individual and aggre-
gated (group profiles). A profile is static when it changes
seldom or never (for example, demographic information).
If customer preferences tend to change over time, dy-
namic profiles can be used. Such profiles are periodically
updated to reflect changes in consumer behavior.

OVERVIEW OF FILTERING
TECHNOLOGIES
Although necessary, user profile management (creating,
updating, and maintaining user profiles) is not sufficient
for providing personalized services. Information in user
profiles has to be analyzed in order to infer users’ needs
and preferences. In this section we will briefly explain the
most popular personalization techniques: rule-based fil-
tering, collaborative filtering, and content-based filtering.
All these techniques are used to predict customers’ inter-
ests and make recommendations.

Rule-Based Filtering
Association rule mining looks for items that tend to ap-
pear together in a given data set. Items could refer to dif-
ferent things in different contexts. They can be products
bought by a customer, Web pages visited by a user, etc. To
introduce association rules formally, we need the following
notation. Let I denote the set of all items. A transaction
T is defined as a set of items bought together (T ⊆ I). The
set of all transactions is denoted by D. Then, an association
rule is defined as an implication between itemsets A
and B, denoted by

A ⇒ B,

where A ⊆ I, B ⊆ I, and A ∩ B = ∅. An association rule
indicates that the presence of items in A in a transaction
implies the presence of items in B. For example, according
to the following association rule, visitors who look at Web
pages X and Y also look at Web page Z:

look(Visitor, X) and look(Visitor, Y)
⇒ look(Visitor, Z).

Rules can associate items or customers. For example, the
following rule associates items,

buys(Customer1, X) and buys(Customer1, Y)
⇒ buys(Customer1, Z),

and the next rule associates customers,

buys(Customer1, X) and buys(Customer2, X)
⇒ buys(Customer3, X).

Rule-based filtering is based on the following idea. If
the behavioral pattern of a customer matches the left-
hand side of a rule, then the right-hand side can be used
for recommendation or prediction.
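The matching step can be sketched in a few lines of Python. This is a minimal illustration, not a production recommender: rules are assumed to be pairs of item sets (left-hand side, right-hand side), and the rule list RULES and the function recommend are hypothetical names.

```python
# Hypothetical rules: (left-hand side, right-hand side) pairs of item sets.
RULES = [
    (frozenset({"X", "Y"}), frozenset({"Z"})),
    (frozenset({"Z"}), frozenset({"W"})),
]

def recommend(observed, rules):
    """Collect right-hand-side items of every rule whose left-hand side
    is fully contained in the customer's observed items."""
    observed = set(observed)
    recommended = set()
    for lhs, rhs in rules:
        if lhs <= observed:
            # Do not recommend items the customer already has.
            recommended |= rhs - observed
    return recommended

suggestions = recommend({"X", "Y"}, RULES)
```

Here a customer who has looked at X and Y is offered Z. The sketch applies each rule once and does not chain rules, so W is not suggested until Z itself has been observed.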
Two measures are used to indicate the strength of an
association rule: support and confidence. The support of
the rule A ⇒ B is the fraction of the transactions containing
both A and B, i.e., |A ∪ B|/|D|, where, for an itemset X,
|X| denotes the number of transactions that contain every
item in X and |D| is the total number of transactions. The
confidence of the rule A ⇒ B is the fraction of the
transactions containing A which also contain B, i.e.,
|A ∪ B|/|A|. Because a large
number of association rules can be generated from large
transaction databases, weak and nonsignificant associa-
tions have to be filtered out. To eliminate spurious associ-
ations, minimum support and minimum confidence can
be used. That is, all rules that do not meet the minimum
support and minimum confidence are eliminated.
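The two measures and the threshold filter can be illustrated with a short Python sketch. The transaction database D, the candidate rules, and the threshold values below are invented for the example; count(X) is taken to be the number of transactions that contain every item of X.

```python
# Hypothetical toy transaction database D (each transaction is a set of items).
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """Fraction of all transactions that contain both A and B."""
    return count(a | b, transactions) / len(transactions)

def confidence(a, b, transactions):
    """Fraction of the transactions containing A that also contain B."""
    n_a = count(a, transactions)
    return count(a | b, transactions) / n_a if n_a else 0.0

def strong_rules(candidates, transactions, min_support, min_confidence):
    """Keep only the rules that meet both minimum thresholds."""
    return [
        (a, b) for a, b in candidates
        if support(a, b, transactions) >= min_support
        and confidence(a, b, transactions) >= min_confidence
    ]

candidates = [
    ({"bread"}, {"milk"}),    # support 0.5, confidence 2/3
    ({"butter"}, {"bread"}),  # support 0.5, confidence 1.0
    ({"milk"}, {"butter"}),   # support 0.25, confidence 1/3
]
kept = strong_rules(candidates, D, min_support=0.5, min_confidence=0.7)
```

With these thresholds only the rule butter ⇒ bread survives: bread ⇒ milk fails the confidence test, and milk ⇒ butter fails both tests.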
An efficient algorithm for association rule mining is fre-
quent pattern growth (FP-growth). The algorithm uses a
divide-and-conquer strategy and compresses the database
representing frequent items into a frequent-pattern tree
(Han et al., in press).
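The compression step can be sketched as follows. This Python fragment only builds a frequent-pattern tree from a toy database; it omits the header table, the node links, and the recursive mining of conditional trees that a full FP-growth implementation needs. The class Node, the functions build_fp_tree and count_nodes, and the alphabetical tie-breaking rule are assumptions made for the illustration.

```python
from collections import Counter

class Node:
    """One tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First pass: count in how many transactions each item occurs
    # and keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = Node(None)
    # Second pass: insert each transaction's frequent items, most frequent
    # first (ties broken alphabetically), so common prefixes share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
    return root, frequent

def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())

# Hypothetical transactions; item "d" is infrequent and gets dropped.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, frequent = build_fp_tree(transactions, min_count=2)
```

In this example the eight occurrences of frequent items across the four transactions are stored in five tree nodes, because transactions that start with the same frequent items share a path.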

Collaborative Filtering
Collaborative filtering (CF) was one of the earliest recom-
mendation technologies. CF is used to make a recommen-
dation to a user by finding a set of users, called a neighbor-
hood, that have tastes similar to those of the target user.