Some Remarks about the Usage of Asymmetric Correlation Measurements for the Induction of Decision Trees

Decision trees are used very successfully to identify and classify objects in many domains, such as marketing (e.g. Decker, Temme (2001)) or medicine. Other classification procedures include logistic regression, logit or probit analysis, linear or quadratic discriminant analysis, the nearest neighbour procedure, and some kernel density estimators. The common aim of all these procedures is to generate classification rules that describe the correlation between some independent exogenous variables (attributes) and at least one endogenous variable, the so-called class membership variable. If the exogenous attributes are exclusively metrically scaled, the procedures often try to aggregate them so that the resulting new quantity describes the class membership as well as possible. The accuracy of this identification procedure is often measured by variance-based measures.
The regression-based procedures use the least squares approach and serve especially for the classification of binary membership variables. If the exogenous attributes are predominantly nominally scaled, the procedures divide the objects so that the resulting partitions are as homogeneous as possible. Homogeneity itself is measured by deviation measures such as the entropy measure, or by generalized variance-based measures such as the Gini index. Only the CHAID algorithm by Kass (1980), a special decision tree procedure, uses a correlation measure, the χ² correlation measure, to generate classification rules that describe the correlation between the involved attributes and the class membership. So although the proper task of classification procedures is to identify and explain the correlation between at least one membership variable and, in general, several exogenous attributes, only one algorithm actually uses a correlation measure to do so. Furthermore, it is noteworthy that this correlation measure is symmetric in nature although the classification task is asymmetric: at least one exogenous attribute should explain at least one endogenous variable, and not vice versa. Thus, the possibility of classifying objects in the manner of decision trees using asymmetric correlation measures should be analyzed. It will be shown that some well-known decision tree algorithms like ID3, C4.5 or CART can be understood as special versions of a generalized decision tree based on asymmetric correlation measures. In contrast to these procedures, however, the measure to be proposed also offers the chance to do some inferential statistics.
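The contrast between a symmetric and an asymmetric measure can be made concrete with a small sketch. The abstract does not specify the measure the paper proposes, so the code below is only an illustration under assumptions: it uses two textbook asymmetric measures, Goodman and Kruskal's tau (the proportional reduction in Gini impurity) and Theil's uncertainty coefficient (the proportional reduction in entropy), and compares them with the Pearson χ² statistic. The contingency table and all function names are hypothetical.

```python
from math import log2


def chi_square(table):
    """Pearson chi-squared statistic of a contingency table (list of rows)."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def gini(counts):
    """Gini impurity of a vector of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def proportional_reduction(table, impurity):
    """Share of the column variable's impurity removed by knowing the row variable.

    Rows are the explaining variable, columns the explained variable.
    With impurity=gini this is Goodman and Kruskal's tau,
    with impurity=entropy it is Theil's uncertainty coefficient.
    """
    n = sum(map(sum, table))
    total = impurity([sum(c) for c in zip(*table)])
    within = sum(sum(r) / n * impurity(r) for r in table if sum(r) > 0)
    return (total - within) / total


def transpose(table):
    """Swap the roles of the explaining and the explained variable."""
    return [list(c) for c in zip(*table)]


# Hypothetical data: rows = three levels of an attribute, columns = two classes.
table = [[30, 0], [0, 30], [15, 15]]

print(chi_square(table), chi_square(transpose(table)))          # same value both ways
print(proportional_reduction(table, gini),                      # attribute explains class
      proportional_reduction(transpose(table), gini))           # class explains attribute
print(proportional_reduction(table, entropy),
      proportional_reduction(transpose(table), entropy))
```

For this table the χ² statistic is the same whichever variable is treated as the class, whereas tau is 2/3 when the attribute explains the class and 1/3 in the reverse direction. The numerators of the two asymmetric measures are the Gini gain used by CART and the information gain used by ID3, which is one way to see why such algorithms relate to asymmetric measures.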

Metadata
Author: Andreas Hilbert
URN: urn:nbn:de:bvb:384-opus4-2364
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/291
Series (Serial Number): Arbeitspapiere zur Mathematischen Wirtschaftsforschung (180)
Type: Working Paper
Language: English
Publishing Institution: Universität Augsburg
Release Date: 2006/07/31
GND-Keyword: Klassifikation (classification); Entscheidungsbaum (decision tree); Korrelationsanalyse (correlation analysis)
Institutes: Wirtschaftswissenschaftliche Fakultät
Wirtschaftswissenschaftliche Fakultät / Institut für Statistik und mathematische Wirtschaftstheorie
Dewey Decimal Classification: 3 Social sciences / 31 Statistics / 310 Collections of general statistics