TY - CONF
T1 - To improve classification accuracy of Alzheimer’s Disease diagnosis by handling missing data
AU - Liu, Shuo
AU - Ye, Qinyong
AU - Coyle, Damien
AU - Cao, Yi
AU - Yang, Hongqin
AU - Ding, Xuemei
N1 - TMED 10 ; Conference date: 11-09-2019
PY - 2019/9/3
Y1 - 2019/9/3
N2 - Background: An increasing number of senior people suffer from Alzheimer’s disease (AD), and accurate, early diagnosis is important for treatment and for improving quality of life. Computer-aided systems are widely used to classify dementia. However, clinical data usually contain missing values, which may adversely affect classification accuracy. To obtain more reliable classifiers, this work handled missing data using four computational algorithms and evaluated whether classification accuracy improved with imputed data. Materials & Methods: The initial dataset was collected from a local hospital and contained 185 healthy control samples and 187 AD samples. We randomly replaced 20%, 40% and 60% of the complete data in each feature with missing values. Mode substitution, K-nearest neighbours (KNN), multiple imputation (MI) and random forest (RF) algorithms were used to impute the missing data. The RF classification approach and the J48 algorithm were applied to the original complete data and to the data completed with imputed values. Results: The RF algorithm performed best at imputing missing values. R squared declined as the number of missing values increased: as the proportion of missing values in each feature rose from 20% to 40%, R squared fell from 0.897 to 0.892. Additionally, overall classification accuracy improved with imputed data. The RF classification method, with a classification accuracy of 89.52%, an AUC of 0.954 and a Kappa value of 0.790, performed better than J48 (88.63%, 0.944 and 0.718, respectively) for AD classification. Conclusion: Imputing missing data properly can provide more samples for refining classifiers for AD diagnosis, which in turn improves classification accuracy. Future work will examine more imputation methods and classifiers.
AB - Background: An increasing number of senior people suffer from Alzheimer’s disease (AD), and accurate, early diagnosis is important for treatment and for improving quality of life. Computer-aided systems are widely used to classify dementia. However, clinical data usually contain missing values, which may adversely affect classification accuracy. To obtain more reliable classifiers, this work handled missing data using four computational algorithms and evaluated whether classification accuracy improved with imputed data. Materials & Methods: The initial dataset was collected from a local hospital and contained 185 healthy control samples and 187 AD samples. We randomly replaced 20%, 40% and 60% of the complete data in each feature with missing values. Mode substitution, K-nearest neighbours (KNN), multiple imputation (MI) and random forest (RF) algorithms were used to impute the missing data. The RF classification approach and the J48 algorithm were applied to the original complete data and to the data completed with imputed values. Results: The RF algorithm performed best at imputing missing values. R squared declined as the number of missing values increased: as the proportion of missing values in each feature rose from 20% to 40%, R squared fell from 0.897 to 0.892. Additionally, overall classification accuracy improved with imputed data. The RF classification method, with a classification accuracy of 89.52%, an AUC of 0.954 and a Kappa value of 0.790, performed better than J48 (88.63%, 0.944 and 0.718, respectively) for AD classification. Conclusion: Imputing missing data properly can provide more samples for refining classifiers for AD diagnosis, which in turn improves classification accuracy. Future work will examine more imputation methods and classifiers.
M3 - Paper
ER -