Genetic_Programming_Theory_and_Practice_XIII

(C. Jardin) #1

Multiclass Classification Through Multidimensional Clustering 221


However, the focus of this short literature review is on another common
approach, which consists in evolving a discriminant function. In this case the two
main approaches are (1) range selection methods and (2) binary decomposition
methods. Range selection methods are applicable to GP classifiers that output
numerical values. The method works by declaringc 1 thresholds forc-class
classification problems. To select optimal thresholds, several mechanisms have been
proposed, including static thresholds selection (Tackett 1993 ; Zhang and Ciesielski
1999 ), dynamic thresholds (Zhang and Smart 2004 ;Lietal. 2007 ) and slotted
thresholds (Zhang and Smart 2004 ).
In binary decomposition methods, one classifier is trained to recognize samples
belonging to a particular class and reject all other samples. This results inc
classifiers for ac-class classification problem. A well-known drawback of this
approach is related to the fact that the multiple classifiers may result in conflicts,
whose number usually grows up proportionally to the number of classes. Hence,
this approach produces an increased classification error as the number of classes
gets larger. Binary decomposition methods have been explored in Kishore et al.
( 2000 ), Silva and Tseng ( 2008 ), Lin et al. ( 2008 ). The two approaches for multiclass
classification, constructing a single classification function orcbinary classifiers, are
compared in Teredesai and Govindaraju ( 2004 ), by considering a hand-written digit
recognition problem. As reported in Espejo et al. ( 2010 ), when a single function
is evolved, able to discriminate all the classes, the function directly outputs the
numeric value of the predicted class, since each class is an integer digit. In both
cases, the fitness function is based on classification accuracy.
In Muni et al. ( 2004 ) the authors proposed a GP-based approach to multiclass
classification in which each individual is a multitree structure made ofctrees, where
cis the number of classes. Each of thesectrees (T 1 ;;Tc) encodes a threshold
function for a particular class. The system considers that a data instancexbelonging
to classiis correctly classified ifTi.x/ 0 andTj.x/<0, for allj¤i. The fitness
function is computed as the classification accuracy. A similar system evolving a
multiple-threshold discriminant function is described in Winkler et al. ( 2007 ), where
a fitness function based on the sum of squared errors is employed.
One of the most recent contributions of GP for multiclass classification is found
in Jabeen and Baig ( 2013 ). In this work, the authors propose a two-stage strategy for
multiclass classification problems, which is an improvement of a traditional binary
decomposition method.
Finally, we briefly address two previous works that present a similar goal to our
own, highlighting the main differences to the present contribution.
Lin et al. ( 2007 ) proposed a layered multipopulation approach, where each
layer hasdpopulations, and each population produces a single transformation
k.x/WRp!R, and classification is performed based on a threshold. While each
population is evaluated independently, all of them are combined to generate new
feature vectors of dimensiond, which are given as input to a new layer, and only the
final layer has a single population withdD 1. For multiclass problems an Euclidean
distance classifier was used and results show the method improves the search
efficiency and reduces training time. However, the approach does not improve upon

Free download pdf