The approach of the Bayes classifier is to use a set of pre-classified messages to infer the category of new messages. It learns from past experience. These classifiers are extremely efficient, especially when the number of categories is small (e.g., in the classification of email into spam vs. non-spam). Here is a brief mathematical exposition of Bayes classification.
Say we have hundreds of text messages (these are not instant messages!) that we wish to classify rapidly into a number of categories. The total number of categories or classes is denoted $C$, and each category is denoted $c_i, \; i = 1 \ldots C$. Each text message is denoted $m_j, \; j = 1 \ldots M$, where $M$ is the total number of messages. We denote $M_i$ as the total number of messages per class $i$, and $\sum_{i=1}^{C} M_i = M$. Words in the messages are denoted as $w$ and are indexed by $k$, and the total number of words is $T$.
Let $n(m, w) \equiv n(m_j, w_k)$ be the total number of times word $w_k$ appears in message $m_j$. Notation is kept simple by suppressing subscripts as far as possible; the reader will be able to infer these from the context. We maintain a count of the number of times each word appears in every message in the training dataset. This leads naturally to the variable $n(m)$, the total number of words in message $m$ including duplicates. This is a simple sum, $n(m_j) = \sum_{k=1}^{T} n(m_j, w_k)$.
We also keep track of the frequency with which a word appears in a category. Hence, $n(c, w)$ is the number of times word $w$ appears in all $m \in c$. This is

$$n(c_i, w_k) = \sum_{m_j \in c_i} n(m_j, w_k) \qquad (2.1)$$
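As a concrete illustration, here is a minimal sketch in Python of how these counts might be accumulated. The toy corpus and the names `n_mw`, `n_cw`, and `n_c` are hypothetical stand-ins for $n(m_j, w_k)$, $n(c_i, w_k)$, and $n(c_i)$; a real implementation would tokenize more carefully.

```python
from collections import Counter, defaultdict

# Hypothetical pre-classified training corpus: (message, class) pairs.
training = [
    ("buy this stock now",                "spam"),
    ("earnings call scheduled today",     "ham"),
    ("free money buy now",                "spam"),
    ("quarterly earnings beat estimates", "ham"),
]

# n(m_j, w_k): per-message word counts (one Counter per message).
n_mw = [Counter(text.split()) for text, _ in training]

# n(c_i, w_k): per-class word counts, summed over all messages in the
# class -- equation (2.1). n(c_i) is the total word count in class c_i.
n_cw = defaultdict(Counter)
for text, label in training:
    n_cw[label].update(text.split())
n_c = {label: sum(wc.values()) for label, wc in n_cw.items()}

print(n_cw["spam"]["buy"])  # n(c_spam, "buy") -> 2
```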

This defines a corresponding probability: $\theta(c_i, w_k)$ is the probability with which word $w$ appears in all messages $m$ in class $c$:

$$\theta(c_i, w_k) = \frac{\displaystyle\sum_{m_j \in c_i} n(m_j, w_k)}{\displaystyle\sum_{m_j \in c_i} \sum_k n(m_j, w_k)} = \frac{n(c_i, w_k)}{n(c_i)} \qquad (2.2)$$

Every word must have some non-zero probability of occurrence, no matter how small; i.e., $\theta(c_i, w_k) \neq 0, \; \forall c_i, w_k$. Hence, an adjustment is made to equation (2.2) via Laplace's formula:

$$\theta(c_i, w_k) = \frac{n(c_i, w_k) + 1}{n(c_i) + T}$$

This probability $\theta(c_i, w_k)$ is unbiased and efficient. If $n(c_i, w_k) = 0$ and $n(c_i) = 0, \; \forall k$, then every word is equiprobable (i.e., $1/T$).
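Continuing the sketch above, the Laplace-adjusted probabilities $\theta(c_i, w_k)$ could be computed as follows, where `T` is the number of distinct words in the training vocabulary:

```python
# Vocabulary: the T distinct words observed across the training corpus.
vocab = set()
for text, _ in training:
    vocab.update(text.split())
T = len(vocab)

# theta(c_i, w_k) = (n(c_i, w_k) + 1) / (n(c_i) + T): Laplace smoothing
# guarantees a non-zero probability for every (class, word) pair.
theta = {
    label: {w: (n_cw[label][w] + 1) / (n_c[label] + T) for w in vocab}
    for label in n_cw
}
```

Note that with all counts equal to zero, each entry reduces to $1/T$, matching the equiprobable case described above.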
We now have the required variables to compute the conditional probability of a text message $j$ in category $i$ (i.e., $\Pr[m_j \mid c_i]$):


$$\Pr[m_j \mid c_i] = \binom{n(m_j)}{\{n(m_j, w_k)\}} \prod_{k=1}^{T} \theta(c_i, w_k)^{n(m_j, w_k)} = \frac{n(m_j)!}{n(m_j, w_1)! \, n(m_j, w_2)! \cdots n(m_j, w_T)!} \prod_{k=1}^{T} \theta(c_i, w_k)^{n(m_j, w_k)}$$

$\Pr[c_i]$ is the proportion of messages in the prior training corpus pre-classified into class $c_i$. (Warning: careful computer implementation of the multinomial probability above is required to avoid rounding error.)
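One standard way to avoid that rounding error, continuing the sketch above, is to work in log space. For ranking classes, the multinomial coefficient can be dropped because it is identical across classes for a given message, and the product of word probabilities becomes a sum of logarithms. A hypothetical scoring function:

```python
import math
from collections import Counter

def log_score(text, label):
    """Unnormalized log Pr[c_i | m_j]: log prior plus
    sum_k n(m_j, w_k) * log theta(c_i, w_k). The multinomial
    coefficient is omitted since it is the same for every class."""
    prior = sum(1 for _, lab in training if lab == label) / len(training)
    score = math.log(prior)  # log Pr[c_i]
    for w, count in Counter(text.split()).items():
        if w in vocab:  # skip words never seen in training
            score += count * math.log(theta[label][w])
    return score

# Classify a new (hypothetical) message by the highest-scoring class.
msg = "buy stock now"
print(max(n_cw, key=lambda label: log_score(msg, label)))  # -> spam
```

Summing logarithms keeps the computation numerically stable even for long messages, where the raw product of many small probabilities would underflow double precision.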

