For a given topic, say foreign exchange news, the scoring procedure has the following
parameters:
• A list of keywords/key phrases and real-valued weights: $(W_1, \alpha_1), \ldots, (W_k, \alpha_k)$.
• A rolling window size, $\ell$ (typically about 5–10 minutes).
• A calibration rolling window size, $L$ (typically about 90 days).
The keyword list and the news from the last $\ell$ minutes are used to create a raw score, and this score is normalized (calibrated) using statistics about the news over the last $L$ days, as described below.
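As a concrete picture of the configuration, the three parameters can be grouped into one object. The sketch below is illustrative only: the class name `ScoringParams`, the default values (taken from the typical ranges above), and the example foreign-exchange keywords and weights are all hypothetical choices, not values from the text.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical container for the topic-scoring parameters."""
    weights: dict[str, float]                           # keyword/key phrase W_i -> weight alpha_i
    window: timedelta = timedelta(minutes=10)           # rolling window ell (typically 5-10 minutes)
    calibration_window: timedelta = timedelta(days=90)  # calibration window L (typically ~90 days)

    def __post_init__(self):
        # Basic sanity checks: at least one keyword, and 0 < ell < L.
        if not self.weights:
            raise ValueError("at least one keyword is required")
        if self.window <= timedelta(0) or self.calibration_window <= self.window:
            raise ValueError("need 0 < ell < L")


# Hypothetical keyword list for a foreign exchange topic.
fx_params = ScoringParams(weights={"dollar": 1.0, "euro": 1.0, "exchange rate": 2.0})
```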
3.4.1 Assigning scores to news
The score at a given point in time, $t$, is assigned as follows. Let $(w_1, \ldots, w_k)$ be the vector of keyword frequencies in the time interval $[t-\ell, t)$ (i.e., $w_i$ is the number of times word/phrase $W_i$ has appeared in the last $\ell$ minutes). The raw score at time $t$ is then defined to be:
$$
s_t = \sum_{i=1}^{k} \alpha_i w_i. \tag{3.4}
$$
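A minimal sketch of Eq. (3.4), assuming the news in $[t-\ell, t)$ is available as a list of story texts and that a keyword "appears" whenever it matches case-insensitively on word boundaries. The tokenization rule and the function names `keyword_counts` and `raw_score` are hypothetical; the text does not specify how occurrences are counted.

```python
import re


def keyword_counts(texts, keywords):
    """Count occurrences w_i of each keyword/phrase W_i across the texts in the window."""
    counts = {}
    for kw in keywords:
        # Word-boundary, case-insensitive match; any whitespace may separate phrase words.
        pattern = r"\b" + r"\s+".join(map(re.escape, kw.split())) + r"\b"
        counts[kw] = sum(len(re.findall(pattern, text, flags=re.IGNORECASE)) for text in texts)
    return counts


def raw_score(texts, weights):
    """Raw score s_t = sum_i alpha_i * w_i (Eq. 3.4); weights maps W_i -> alpha_i."""
    counts = keyword_counts(texts, weights)
    return sum(alpha * counts[kw] for kw, alpha in weights.items())


window_texts = ["The dollar fell as the exchange rate moved.", "Dollar strength persisted."]
print(raw_score(window_texts, {"dollar": 1.0, "exchange rate": 2.0}))  # 1.0*2 + 2.0*1 = 4.0
```

Here "dollar" occurs twice and "exchange rate" once, so the raw score is $1.0 \cdot 2 + 2.0 \cdot 1 = 4.0$.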
In this form, the raw score will tend to be high when news volume is high, so we calibrate (normalize) the score using the calibration rolling window. We maintain a record of the scores assigned over the last $L$ days, together with the news volume (measured in words per $\ell$ minutes) at the time each score was issued. If we denote by $n[t-\ell, t)$ the number of words observed in the time interval $[t-\ell, t)$, then the normalized score is defined by comparing the raw score with the distribution of scores in the calibration window that had the same news volume $n[t-\ell, t)$.
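The record described above can be kept in a double-ended queue that drops entries once they fall outside $[t-L, t)$. This is a sketch under stated assumptions: the class name `CalibrationWindow` is hypothetical, timestamps are plain numbers in the same unit as $L$, records arrive in nondecreasing time order, and volumes are matched by exact equality as the text describes (grouping nearby volumes into buckets would be a different design, not the one stated).

```python
from collections import deque


class CalibrationWindow:
    """Hypothetical rolling record of (time, news volume, raw score) over the last L units."""

    def __init__(self, length):
        self.length = length    # calibration window L, in the same units as the timestamps
        self.records = deque()  # (t', n[t' - ell, t'), s_t') tuples, oldest first

    def add(self, t, volume, score):
        # Assumes records are added in nondecreasing order of t.
        self.records.append((t, volume, score))
        self._evict(t)

    def _evict(self, now):
        # Drop records with t' < now - L, i.e. outside [now - L, now).
        while self.records and self.records[0][0] < now - self.length:
            self.records.popleft()

    def scores_at_volume(self, now, volume):
        """Past raw scores in [now - L, now) whose news volume equals `volume`."""
        self._evict(now)
        return [s for (t, n, s) in self.records if n == volume and t < now]


w = CalibrationWindow(length=90)
w.add(0, 10, 1.0)
w.add(50, 10, 2.0)
w.add(100, 10, 3.0)                # evicts the record at t' = 0, since 0 < 100 - 90
print(w.scores_at_volume(100, 10))  # [2.0]; the record at t' = 100 is not in [10, 100)
```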
Specifically, the normalized score is the fraction of scores, among those in the calibration window that had the same news volume, that are less than the current score. Formally:
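$$
S_t = \frac{\left|\{\, t' \in [t-L,\, t) : n[t'-\ell,\, t') = n[t-\ell,\, t) \text{ and } s_{t'} < s_t \,\}\right|}{\left|\{\, t' \in [t-L,\, t) : n[t'-\ell,\, t') = n[t-\ell,\, t) \,\}\right|}. \tag{3.5}
$$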
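Eq. (3.5) translates directly into a ratio of two counts over the history. The sketch below assumes the history is a list of hypothetical `(t', n, s)` tuples giving each past time, its news volume $n[t'-\ell, t')$, and its raw score $s_{t'}$. Returning `None` when no past time shares the current volume is an assumption of this sketch, since the ratio is then undefined.

```python
def normalized_score(history, t, n_t, s_t, L):
    """S_t from Eq. (3.5).

    history: iterable of (t_prime, n_prime, s_prime) tuples.
    n_t:     current news volume n[t - ell, t).
    s_t:     current raw score.
    Returns the fraction of same-volume times t' in [t - L, t) with s_t' < s_t,
    or None when no such time exists (the ratio is undefined).
    """
    same_volume = [s for (tp, n, s) in history if t - L <= tp < t and n == n_t]
    if not same_volume:
        return None
    return sum(1 for s in same_volume if s < s_t) / len(same_volume)


history = [(1, 100, 2.0), (2, 100, 5.0), (3, 100, 7.0), (4, 200, 1.0), (5, 100, 3.0)]
print(normalized_score(history, t=10, n_t=100, s_t=4.0, L=90))  # 0.5
```

The entry at $t'=4$ has a different volume and is ignored; of the four remaining raw scores (2, 5, 7, 3), two are below the current score of 4, so $S_t = 2/4 = 0.5$.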
Thus, a score of $S_t = 0.92$ can be interpreted as "92% of the time, when the news volume is at the current level, the raw score is less than it currently is."
3.4.2 A natural extension to alerts
The scoring procedure described above is very flexible and, in particular, extends naturally to incorporate Thomson Reuters topic codes into the scoring. Specifically, if instead of counting word frequencies we count the fraction of news alerts in the last $\ell$ minutes that have been tagged with various topic codes, then we can assign scores in exactly the same way. The only difference is that we measure news volume by the number of alerts that appear rather than by the number of words.
Formally, we have the following parameters:
• A list of topic codes and real-valued weights: $(W_1, \alpha_1), \ldots, (W_k, \alpha_k)$.
• A rolling window size, $\ell$.
• A calibration rolling window size, $L$.
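For the alert variant, only the inputs to the raw score change: $w_i$ becomes the fraction of alerts in the window tagged with topic code $W_i$, and the volume used for calibration is the alert count. A minimal sketch, assuming each alert is represented by the set of topic codes it carries; the function name and the codes `"A"`, `"B"`, `"C"` are hypothetical placeholders (not actual Thomson Reuters codes), and returning a zero score for an empty window is an assumption.

```python
def alert_raw_score(alerts, weights):
    """Raw score for alerts: sum_i alpha_i * w_i, where w_i is the fraction of
    alerts in the window tagged with topic code W_i.

    alerts:  list of sets of topic codes, one set per alert in [t - ell, t).
    weights: mapping topic code W_i -> weight alpha_i.
    Returns (score, volume), where volume is the number of alerts; it plays the
    role of n[t - ell, t) when calibrating with Eq. (3.5).
    """
    volume = len(alerts)
    if volume == 0:
        return 0.0, 0  # assumption: an empty window scores zero
    score = 0.0
    for code, alpha in weights.items():
        tagged = sum(1 for codes in alerts if code in codes)
        score += alpha * tagged / volume
    return score, volume


alerts = [{"A", "B"}, {"A"}, {"C"}, set()]
print(alert_raw_score(alerts, {"A": 1.0, "C": 2.0}))  # (1.0, 4)
```

With four alerts, code `"A"` appears on half of them and code `"C"` on a quarter, giving $1.0 \cdot 0.5 + 2.0 \cdot 0.25 = 1.0$. The normalized score is then computed from Eq. (3.5) unchanged, comparing against past scores issued at the same alert count.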
78 Quantifying news: Alternative metrics