Comparing linear discriminant analysis and supervised learning algorithms for binary classification - a method comparison study

  • In psychology, linear discriminant analysis (LDA) is the method of choice for two-group classification tasks based on questionnaire data. In this study, we present a comparison of LDA with several supervised learning algorithms. In particular, we examine to what extent the predictive performance of LDA relies on the multivariate normality assumption. As nonparametric alternatives, the linear support vector machine (SVM), classification and regression tree (CART), random forest (RF), probabilistic neural network (PNN), and the ensemble k conditional nearest neighbor (EkCNN) algorithms are applied. Predictive performance is determined using measures of overall performance, discrimination, and calibration, and is compared in two reference data sets as well as in a simulation study. The reference data are Likert-type data, and comprise 5 and 10 predictor variables, respectively. Simulations are based on the reference data and are done for a balanced and an unbalanced scenario in each case.In psychology, linear discriminant analysis (LDA) is the method of choice for two-group classification tasks based on questionnaire data. In this study, we present a comparison of LDA with several supervised learning algorithms. In particular, we examine to what extent the predictive performance of LDA relies on the multivariate normality assumption. As nonparametric alternatives, the linear support vector machine (SVM), classification and regression tree (CART), random forest (RF), probabilistic neural network (PNN), and the ensemble k conditional nearest neighbor (EkCNN) algorithms are applied. Predictive performance is determined using measures of overall performance, discrimination, and calibration, and is compared in two reference data sets as well as in a simulation study. The reference data are Likert-type data, and comprise 5 and 10 predictor variables, respectively. Simulations are based on the reference data and are done for a balanced and an unbalanced scenario in each case. In order to compare the algorithms' performance, data are simulated from multivariate distributions with differing degrees of nonnormality. Results differ depending on the specific performance measure. The main finding is that LDA is always outperformed by RF in the bimodal data with respect to overall performance. Discriminative ability of the RF algorithm is often higher compared to LDA, but its model calibration is usually worse. Still LDA mostly ranges second in cases it is outperformed by another algorithm, or the differences are only marginal. In consequence, we still recommend LDA for this type of application.show moreshow less

Download full text files

Export metadata

Statistics

Number of document requests

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Ricarda GrafGND, Marina Zeldovich, Sarah FriedrichORCiDGND
URN:urn:nbn:de:bvb:384-opus4-1003527
Frontdoor URLhttps://opus.bibliothek.uni-augsburg.de/opus4/100352
ISSN:0323-3847OPAC
ISSN:1521-4036OPAC
Parent Title (English):Biometrical Journal
Publisher:Wiley
Place of publication:Weinheim
Type:Article
Language:English
Year of first Publication:2024
Publishing Institution:Universität Augsburg
Release Date:2022/12/19
Tag:Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability
Volume:66
Issue:1
First Page:2200098
DOI:https://doi.org/10.1002/bimj.202200098
Institutes:Mathematisch-Naturwissenschaftlich-Technische Fakultät
Mathematisch-Naturwissenschaftlich-Technische Fakultät / Institut für Mathematik
Mathematisch-Naturwissenschaftlich-Technische Fakultät / Institut für Mathematik / Lehrstuhl für Mathematical Statistics and Artificial Intelligence in Medicine
Dewey Decimal Classification:5 Naturwissenschaften und Mathematik / 51 Mathematik / 510 Mathematik
Licence (German):CC-BY-NC 4.0: Creative Commons: Namensnennung - Nicht kommerziell (mit Print on Demand)