Principles and Practice of Pharmaceutical Medicine

analysts and healthcare professionals with the ability
to assess the validity of the results generated by
a data mining algorithm. Additionally, data mining
is not data dredging, which is a pejorative term used
to imply the repeated evaluation of a data set,
usually involving multiple comparisons with no
prior defined method, to find some ‘statistically
significant’ event. Given the statistical problems
associated with conducting multiple comparisons,
such a ‘statistically significant’ event may merely
be a random finding that only gets noted due to the
multiple comparisons, or data dredging.
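
The multiple-comparisons problem described above can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the chapter: it assumes every comparison is independent and tested at the same significance level (here 0.05), and the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one false-positive result.

    Assumes n_comparisons independent tests, each run at significance
    level alpha, with no true effect present in any of them.
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    for m in (1, 20, 100):
        rate = familywise_error_rate(0.05, m)
        print(f"{m:>3} comparisons: P(at least one false positive) = {rate:.3f}")
```

Under these assumptions, 20 unplanned comparisons already carry roughly a 64% chance of producing at least one 'statistically significant' result by chance alone.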

40.2 Methods

Before any data mining algorithms or models are
used on a database, it is important to first make sure
that the data have been collected appropriately and
that they have been organized and checked for
accuracy. Subsequently, there is a choice from
among multiple data mining methods that can be
used. Among these are the Multi-Item Gamma
Poisson Shrinker (MGPS) algorithm, which generates
an Empirical Bayesian Geometric Mean (EBGM) score,
the Proportional Reporting Ratio (PRR) method and
the Bayesian Neural Network approach (DuMouchel,
1999; Evans et al., 2001; Bate et al., 1998). Both
the MGPS and PRR methods will generate similar
drug–event combinations for further investigation
when the observed number of cases with the
drug–event combination is greater than 20 or the
expected number of cases with the drug–event
combination is <1. EBGM is a statistical measure
of disproportionality, comparing the observed and
expected reporting frequency within a database.
The determination of the expected reporting
frequency assumes complete independence of cases
associated with either a drug or an event. Thus, in
a hypothetical database of 100 cases, if Drug Z
represented 20 cases in the database and there
were 10 cases of rhabdomyolysis, the expected
reporting frequency would be 20/100 (probability
of Drug Z) × 10/100 (probability of rhabdomyolysis)
× 100 cases (total database size) = 2 expected
cases. If the observed number of drug–event cases
was 8, then the relative reporting ratio (RR) would
be 8/2 (N/E) = 4 and the EBGM would be about 4,
depending on the amount of ‘shrinkage’ that occurs
based on the model (see Figure 40.1). The larger
the number of adverse event (AE) reports for a
particular drug (for a drug that has

N: The observed number of cases with the combination of items.

E: The expected number of cases with the combination, calculated as:

$$E = \frac{\text{Observed \# cases with DRUG}}{\text{Total \# cases}} \times \frac{\text{Observed \# cases with EVENT}}{\text{Total \# cases}} \times \text{Total \# cases}$$

RR: Relative reporting ratio (the same as N/E). Observed number of cases with the combination divided by the expected number of cases with the combination. This may be viewed as a sampling estimate of the true value of observed/expected for the particular combination of drug and event.

EBGM: Empirical Bayesian Geometric Mean. A more stable estimate than RR; the so-called ‘shrinkage’ estimate.

EB05: A value such that there is less than a 5% probability that the true value of observed/expected lies below it.

EB95: A value such that there is less than a 5% probability that the true value of observed/expected lies above it.

90% CI: The interval from EB05 to EB95 may be considered to be the ‘90% confidence interval’.

Figure 40.1 Empirical Bayesian Geometric Mean (EBGM) terms
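
As a rough illustration of the terms in Figure 40.1, the following sketch reproduces the hypothetical Drug Z and rhabdomyolysis example (20 drug cases, 10 event cases, 100 total cases, 8 observed drug–event cases). The function names are hypothetical and chosen only for this example. The sketch computes E and RR only; it does not attempt the EBGM, EB05 or EB95, which depend on the shrinkage model fitted by MGPS.

```python
def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Expected number of drug-event cases (E), assuming independence.

    Algebraically the same as
    (n_drug / n_total) * (n_event / n_total) * n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_drug * n_event / n_total


def relative_reporting_ratio(n_observed: int, n_drug: int,
                             n_event: int, n_total: int) -> float:
    """Relative reporting ratio RR = N / E."""
    expected = expected_count(n_drug, n_event, n_total)
    if expected == 0:
        raise ValueError("expected count is zero; RR is undefined")
    return n_observed / expected


if __name__ == "__main__":
    # Hypothetical database from the text: 100 cases in total.
    e = expected_count(n_drug=20, n_event=10, n_total=100)
    rr = relative_reporting_ratio(n_observed=8, n_drug=20,
                                  n_event=10, n_total=100)
    print(f"E = {e:g}, RR = {rr:g}")
```

For the figures in the text this gives E = 2 and RR = 4. RR is the raw, unshrunk ratio, so it can be unstable when the counts are small.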
