Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


statistical tests such as cross-validation. Finally, the bad guys can also use
machine learning. For example, if they could get hold of examples of what your
filter blocks and what it lets through, they could use this as training data to learn
how to evade it.
There are, unfortunately, many other examples of adversarial learning
situations in our world today. Closely related to junk email is search engine spam:
sites that attempt to deceive Internet search engines into placing them
prominently in lists of search results. Highly ranked pages yield direct financial
benefits to their owners because they present opportunities for advertising,
providing strong motivation for profit seekers. Then there are the computer virus
wars, in which designers of viruses and virus-protection software react to one
another’s innovations. Here the motivation tends to be general disruption and
denial of service rather than monetary gain.
Computer network security is a continually escalating battle. Protectors
harden networks, operating systems, and applications, and attackers find
vulnerabilities in all three areas. Intrusion detection systems sniff out unusual
patterns of activity that might be caused by a hacker’s reconnaissance activity.
Attackers realize this and try to obfuscate their trails, perhaps by working
indirectly or by spreading their activities over a long time or, conversely, by
striking very quickly. Data mining is being applied to this problem in an attempt to
discover semantic connections among attacker traces in computer network data
that intrusion detection systems miss. This is a large-scale problem: audit logs
used to monitor computer network security can amount to gigabytes a day even
in medium-sized organizations.
Many automated threat detection systems are based on matching current data
to known attack types. The U.S. Federal Aviation Administration developed the
Computer Assisted Passenger Pre-Screening System (CAPPS), which screens
airline passengers on the basis of their flight records and flags individuals for
additional checked baggage screening. Although the exact details are
unpublished, CAPPS is, for example, thought to assign higher threat scores to cash
payments. However, this approach can only spot known or anticipated threats.
Researchers are using unsupervised approaches such as anomaly and outlier
detection in an attempt to detect suspicious activity. As well as flagging
potential threats, anomaly detection systems can be applied to the detection of illegal
activities such as financial fraud and money laundering.
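
As a rough illustration of the unsupervised outlier detection mentioned above, the following minimal sketch flags values that lie far from the median of a sample. It is not the method used by any system named in the text. The function name mad_outliers, the threshold of 3.5, and the transaction amounts are all hypothetical, chosen only for the example. The score used is the modified z-score based on the median absolute deviation (MAD), one common robust choice among many.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation of the sample. The threshold is an assumption
    for this sketch, not a value taken from the text.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so no score can be formed.
        return []
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data are roughly normally distributed.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [42.0, 38.5, 45.1, 40.2, 39.9, 41.7, 950.0, 43.3]
suspicious = mad_outliers(amounts)
print(suspicious)
```

No labeled examples of fraud are needed here: the unusual transaction stands out only because it differs from the bulk of the data, which is why this style of approach is not limited to known or anticipated threats in the way that matching against known attack types is.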
Data mining is being used today to sift through huge volumes of data in the
name of homeland defense. Heterogeneous information such as financial
transactions, health-care records, and network traffic is being mined to create
profiles, construct social network models, and detect terrorist communications.
This activity raises serious privacy concerns and has resulted in the
development of privacy-preserving data mining techniques. These algorithms try
to discern patterns in the data without accessing the original data directly,


8.4 ADVERSARIAL SITUATIONS 357
