Asymmetric Strategies for Piece-wise Discriminant Constructive Generators

 

Embed or link this publication

Description

Abstract. This short paper presents a new strategy for the constructive generation of discriminant non-linear functions, whose decision rule consists on piece-wise functions. The approach focuses on the generation of discriminants one at a time, with t

Popular Pages


p. 1

asymmetric strategies for piece-wise discriminant constructive generators orestes g manzanilla-salazar universidad simón bolívar cesma dep procesos y sistemas apartado 89000 sartenejas s/n caracas 1080-a venezuela orestes@cesma.usb.ve abstract this short paper presents a new strategy for the constructive generation of discriminant non-linear functions whose decision rule consists on piece-wise functions the approach focuses on the generation of discriminants one at a time with the goal of maximizing the recognition ability on individuals from only one set while assuring complete classification of the other set on each step the well classified individuals are discarded from the training set this paper is intended to share the findings and promote the research in fields with diverse approaches to the binary discriminant problem keywords discriminant analysis pattern recognition supervised learning binary classifier machine learning 1 introduction the complexity of data available for modern problem solving brings with it the need for tools to find knowledge in it we have now the ability to use discriminant functions capable of recognizing or classifying between data from two sets [18 tools from the different disciplines can be extremely similar and they can also have issues already solved in other disciplines the use of different terminology and different theoretical basis has made it difficult to break the barrier between fields thus we find in fields like artificial neural networks ann and optimization different names for similar concepts like step size and learning rate bringing down this isolation would help researchers from each field to benefit from the knowledge of other field s strengths [18 the concern of our strategy arises when the pattern cannot be discriminated by a single linear function the linearly non-separable problem can be approached minimizing the function s prediction errors either by considering the misclassified data as noise or by the construction of a non-linear discriminant [15 in both cases errors in classifying data from the two sets are taken into account in the same way a

[close]

p. 2

2 orestes g manzanilla-salazar strategy we call in this paper symmetric we define ours as asymmetric because errors on each side of the discriminant are treated in radically different ways one of the motivations of the research which led to the asymmetric strategy is the fact that in most methods the complexity of the structure of the data has to be known to some degree in order to make decisions about the non-linear discriminant specifically anns require the user to make a priori decisions about the pruning and tuning methods structure size and activation functions of the network [7 the svms precision depends on wisely chosen non-linear transformation of the feature space [17 the lack of knowledge about the data or expertise in the decisions that need to be done may cause low performance of these methods the strategy proposed is constructive in a way that minimizes the need to decide about the non-linear transformations or the size or complexity of the non-linear function as well as the use of pruning strategies as a result less user expertise is needed to get good results the rest of the paper is organized as follows next section presents the asymmetrical strategy in section 3 we comment on optimization-based implementations and point out the need for benchmark metrics not only for classifier s precision but for the decision making needed in the implementation processes in section 4 are given final remarks and comments about our future work 2 the asymmetric constructive strategy the basic algorithm is described as follows let p and n be the two sets of data from the training set t to be discriminated every cycle finds a separating hypersurface such that one side contains only data from one type s which is taken as classified q and thus discarded from the future cycles the other side contains data from the two sets that remains r to be classified in future cycles see figure 1 asymmetric constructive algorithm begin set r t cycle while p r Ø and n r Ø step 1 choose either s p r or s n r step 2 find a surface to classify data from s step 3 q {all individuals classified from s step 4 r r q the hypersurface is chosen to minimize only errors from the set to be classified while assuring that all data from the other set is contained in the non-classifying side obviously this side may consist of a mixture of data from the two sets at each cycle the algorithm has to decide between two possible hypersurfaces the most useful criteria are [14 · select the one with the maximum percentage of classification if complete classification is the priority · select the one with the fewer errors in the validating set if generalization ability is the priority.

[close]

p. 3

asymmetric strategies for piece-wise discriminant generators 3 the error minimization to generate the hypersurface should be a positive monotone and if possible convex function of 1 an entropy measure information gain [16 2 the distances from the misclassified individuals to the right side of the hypersurface [6 [9 [10 or even 3 the number of misclassified individuals [1 [4 [14 [12 terms related to the complexity of the hypersurface [5 can be included in the optimization model the minimization model and tools may come from the different machine learning disciplines fig 1 given three individuals collinearly distributed and linearly non-separable 1a the strategy generates first a half-space with only one element of p 1b using a linear hypersurface in a second cycle the remaining data is entirely classified 1c the result is a piece-wise nonlinear function 1d the advantage of this strategy becomes critical in data-sets which lack of enough features to linearly separate the two sets or with multi-modal sets in those cases symmetric strategies may result in hypersurfaces defining regions that do not represent the structure of the data unless wisely chosen non-linear transformations of the feature space are used see figure 2 which is the kind of undesirable decisions for our type of problem fig 2 a symmetrical minimization of errors leads to a linear hypersuface of 0 errors but with no effective classification 2a as the half-spaces have no correspondence to the structure of the data 2b it becomes necessary to add complexity either to the kernel or to the hypersurface 2c in order for the model to resemble the structure of the data 2d piecewise functions see figure 1d have been constructed via anns [8 and classification trees [3 but solutions like 2b have to be worked around in arbitrary ways [11 3 performance of the asymmetric strategy excellent results are reported in optimization approaches by manzanilla-salazar [13 on small ill-posed synthetic data creating two hyperplanes at each cycle each classifying one of the two given sets this implementation occasionally generated

[close]

p. 4

4 orestes g manzanilla-salazar redundant hyperplanes the method s precision was better than the msm and msmt algorithms [11 in all databases manzanilla-salazar and garcía-palomares [14 also report excellent results using novel mixed integer quadratic and linear programming models generating only one hypersurface at a time and including analysis of the effects of maximizing the margin before completing a cycle to augment the generalization ability of the discriminant we found issues using benchmark metrics when comparing the asymmetric strategy s performance with other methods clearly the precision results of the classifiers built with the asymmetric approach may be emulated by properly designed and tunned classification trees and anns for example also runtimes could be argued to be dependant on the language and the coding efficiency the benchmark metrics are clear when the objective is to compare the precision of two discriminant functions but we lack of metrics for the costs of the number and complexity of the decision making needed for the implementation of a method which is the essential advantage of the asymmetric approach this brings the need for a qualitative comparison a simple way to express the qualitative comparison is the following the asymmetric strategy leads to lesser or equal decision making and tunning costs than those of symmetrical strategies while reaching similar precision results 4 remarks and future work though our research stems from the use of operations research tools in data mining [14 [13 the aim of this paper is to present strategies involved in the discriminant building process rather than the mathematical approach the arguments employed are being used in more complex optimization models like general nonlinear models [2 and extensions to multi-category and multiobjective problems we are also performing comparative tests with classical kernels at the present time we recognize the need for the cost of decision making involved in the discriminant function development and expect to continue this line of research constructing a method to quantify this cost minimizing this cost may allow users with less expertise in data mining techniques discover knowledge in data sets acknowledgment the author thanks the support given to him by cesma and procesos y sistemas department at usb references 1 bennett k p bredensteinrer e j a parametric optimization method for machine learning informs j comput 9 311 318 1997 2 bennet k p parrado-hernández e eds special topic on machine learning and optimization journal of machine learning research 7 1265 1281 2006 3 breiman l friedman j olshen r stone c classification and regression trees wadsworth inc pacific grove ca 1984

[close]

p. 5

asymmetric strategies for piece-wise discriminant generators 5 4 chen c mangasarian o l hybrid misclassification minimization adv comp math 5 127 136 1996 5 cristianini n shawe-taylor j an introduction to support vector machines cambridge university press cambridge 2000 6 glover f improved linear programming models for discriminant analysis decision sci 21 771 785 1990 7 hertz j krogh a palmer r g introduction to the theory of neural computation the advanced book program addison-wesley redwood city ca 1991 8 lippmann r p an introduction to computing with neural nets ieee assp magazine 4 22 1987 9 mangasarian o l linear and nonlinear separation of patterns by linear programming oper res 13 444 452 1965 10.mangasarian o.l multi-surface method of pattern separation ieee trans inform theory it-14 801 807 1968 11.mangasarian o l mathematical programming in neural networks computer sciences department technical report 1129 university of wisconsin madison wisconsin 1992 12.mangasarian o l misclassification minimization j global optim 5 309 323 1968 13.manzanilla-salazar o g algoritmo para diseñar y entrenar redes neurales de clasificación con programación matemática master thesis systems engineering universidad simón bolívar 2005 14.manzanilla-salazar o g garcía-palomares u m optimization models to train a binary classifier a multi surface approach in print 2009 15.müller k r mika s rätsch g tsuda k schölpf b an introduction to kernel-based learning algorithms ieee transactions on neural networks vol 12 no 2 pp 181-202 2001 16.quinlan j r c4.5 programs for machine learning morgan kaufmann san mateo ca 1993 17.vapnik v n the nature of statistical learning theory springer-verlag new york 1995 18.wolpert d.h the status of supervised learning science circa 1994-the search for a consensus in wolpert d.h ed the mathematics of generalizationproceedings of the sfi/cnls workshop on formal approaches to supervised learning santa fe institute proceedings vol xx pp 1 10 the advanced book program addison-wesley usa-canada 1995

[close]

Other Publications

Enfoques basados en programación lineal y entera mixta para la clasificación de patrones

Enfoques basados en programación lineal y entera mixta para la clasificación de patrones


Tags: supervised learning, data mining, linear programming, neural networks, multisurface method, support vector machines, pattern classification, machine learning

Comments

no comments yet

YOUBLISHER
About
What Others Say
Sitemap
Impressum

PUBLISHERS
Login
Signup
Tutorials
FAQ
Support

BUSINESS
Overview
Advertising
Support

DEVELOPERS
API

LEGAL
Report a Copyright Violation
Copyright FAQ
Terms of Use
Privacy Policy