The classification goal is to compute the most probable class $c_i$ given any message $m_j$. Therefore, using the previously computed values of $\Pr[m_j \mid c_i]$ and $\Pr[c_i]$, we obtain the following conditional probability (applying Bayes' Theorem):
$$
\Pr[c_i \mid m_j] = \frac{\Pr[m_j \mid c_i] \cdot \Pr[c_i]}{\sum_{i=1}^{C} \Pr[m_j \mid c_i] \cdot \Pr[c_i]}
\qquad (2.3)
$$
For each message, equation (2.3) delivers the posterior probabilities $\Pr[c_i \mid m_j]\ \forall i$ (one for each message category). The category with the highest posterior probability is assigned to the message.
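As a numerical illustration, the short R sketch below applies equation (2.3) to a single message; the prior and likelihood values are hypothetical stand-ins for quantities that would be estimated from a training corpus:

```r
# Toy illustration of equation (2.3) for one message m_j.
# The priors Pr[c_i] and likelihoods Pr[m_j | c_i] below are
# hypothetical values, as if estimated from a training corpus.
prior      <- c(buy = 0.3, hold = 0.5, sell = 0.2)
likelihood <- c(buy = 0.010, hold = 0.002, sell = 0.001)

# The numerator of (2.3) for each class, normalized by the sum
# over all classes, gives the posterior Pr[c_i | m_j].
posterior <- (likelihood * prior) / sum(likelihood * prior)
posterior

# Assign the message to the highest-posterior category.
names(which.max(posterior))
```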
The Bayes Classifier requires no optimization and is computable in deterministic time. It is widely used in practice, and free off-the-shelf programs are available for running it on large datasets. One that is very widely used in finance applications is the Bow classifier, developed by Andrew McCallum when he was at Carnegie Mellon University. This is a very fast classifier that requires almost no additional programming by the user. The user only has to set up the training dataset in a simple directory structure: each text message is a separate file, and the training corpus requires a separate subdirectory for each category of text. Bow offers various versions of the Bayes Classifier (see McCallum, 1996). The simple (naive) Bayes Classifier described above is also available in R in the e1071 package, where the function is called naiveBayes; e1071 is the machine-learning library in R. There are also several other, more sophisticated classification techniques, such as k-means and k-nearest neighbors (kNN).
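As an illustration of the R route, here is a minimal sketch of the naiveBayes workflow, assuming a toy data frame whose term-count columns and category labels are invented for the example; a real application would build these features from the training corpus:

```r
library(e1071)

# Hypothetical training data: each row is a message, the term.*
# columns hold word counts, and category is the known label.
train <- data.frame(
  term.gain = c(2, 0, 3, 1),
  term.loss = c(0, 3, 1, 2),
  category  = factor(c("buy", "sell", "buy", "sell"))
)

# Fit the naive Bayes classifier and classify a new message.
model  <- naiveBayes(category ~ ., data = train)
newmsg <- data.frame(term.gain = 2, term.loss = 0)
predict(model, newmsg)                 # most probable category
predict(model, newmsg, type = "raw")   # posterior probabilities
```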
News analytics begin with classification, and the Bayes Classifier is the workhorse of any news analytics system. Prior to applying the classifier, it is important for the user to exercise judgment in deciding which categories the news messages will be classified into. These categories might be a simple flat list, or they may even be a hierarchical set (see Koller and Sahami, 1997).
2.3.4 Support vector machines
A support vector machine, or SVM, is a classification technique that is similar in spirit to cluster analysis but is applicable to very high-dimensional spaces. The idea is best described by thinking of every text message as a vector in a high-dimensional space, where the number of dimensions might be, for example, the number of words in a dictionary. Bodies of text in the same category plot in the same region of the space. Given a training corpus, the SVM finds hyperplanes in the space that best separate the text of one category from another.
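The same e1071 package in R also provides an svm function; the sketch below is a toy illustration in which two invented word-count features stand in for the full document-term space:

```r
library(e1071)

# Hypothetical training data: two word-count features stand in
# for a space with one dimension per dictionary word.
train <- data.frame(
  term.gain = c(3, 4, 0, 1),
  term.loss = c(0, 1, 3, 4),
  category  = factor(c("buy", "buy", "sell", "sell"))
)

# Fit a linear SVM: find the hyperplane that best separates the
# two categories in the feature space.
model <- svm(category ~ ., data = train, kernel = "linear")

# A new message is classified by the side of the hyperplane on
# which its vector falls.
predict(model, data.frame(term.gain = 2, term.loss = 0))
```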
For the seminal development of this method, see Vapnik and Lerner (1963); Vapnik
and Chervonenkis (1964); Vapnik (1995); and Smola and Schölkopf (1998). I provide a
brief summary of the method based on these works.
Consider a training dataset given by the binary relation
$$
\{(x_1, y_1), \ldots, (x_n, y_n)\} \subset X \times Y
$$
The set $X \subseteq \mathbb{R}^d$ is the input space and the set $Y \subseteq \mathbb{R}^m$ is a set of categories. We define a function
$$
f: X \to Y
$$