Original Article
A study of aortic dissection screening method based on multiple machine learning models
Abstract
Background: The main purpose of this study was to develop an early screening method for aortic dissection (AD) based on machine learning. Because AD is rare and its symptoms are complex, many doctors have no clinical experience with it. As a result, many patients are never suspected of having AD, which leads to a high rate of misdiagnosis. Here, we report a preliminary study of the feasibility of rapid and accurate AD screening with machine learning methods.
Methods: The dataset consisted of examination data provided by Xiangya Hospital, Central South University, China, and included a total of 60,000 samples from both AD patients and non-AD patients. Each sample has 76 features drawn from routine examinations and other easily accessible information. Because affected patients are far outnumbered by non-diseased ones, the data are imbalanced, so multiple machine learning models were used: AdaBoost, SmoteBagging, EasyEnsemble, and CalibratedAdaMEC. These models apply different techniques, namely ensemble learning, undersampling, oversampling, and cost-sensitive learning, to address the data imbalance problem.
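The undersampling idea behind ensemble methods such as EasyEnsemble can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `balanced_subsets`, the label encoding (1 = AD, 0 = non-AD), and the toy class ratio are all hypothetical. Each subset keeps every minority sample and pairs it with an equally sized random draw from the majority class; one base classifier would then be trained per subset and the predictions combined.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Return index lists that each hold all minority samples plus an
    equally sized random draw (without replacement) from the majority.

    Assumes labels are 1 (minority, e.g. AD) or 0 (majority) and that the
    majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(majority, len(minority))
        subsets.append(minority + sampled_majority)
    return subsets


# Toy labels with a hypothetical 5:95 class ratio (not the study's data).
toy_labels = [1] * 5 + [0] * 95
toy_subsets = balanced_subsets(toy_labels, n_subsets=3)
for subset in toy_subsets:
    n_positive = sum(toy_labels[i] for i in subset)
    print(len(subset), n_positive)
```

Every subset is balanced by construction, so a base learner trained on it is not dominated by the majority class, while drawing several subsets lets the ensemble see more of the majority data than a single undersample would.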
Results: AdaBoost performed poorly, with an average recall of 16.1% and a specificity of 99.8%. SmoteBagging performed statistically significantly better on this problem, with an average recall of 78.1% and a specificity of 79.2%. EasyEnsemble reached 77.8% and 79.3% for recall and specificity, respectively. CalibratedAdaMEC’s recall and specificity were 75.8% and 76%, respectively.
Conclusions: Except for AdaBoost, the models evaluated in this paper achieved a misdiagnosis rate lower than 25%. These methods use only routine examination data. This suggests that machine learning methods can help build a fast, cheap, worthwhile, and effective early screening approach for AD.
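For readers less familiar with the two metrics reported in the Results, the sketch below shows how recall and specificity are computed from binary labels. It is an illustration only: the function name `recall_specificity` and the toy labels are hypothetical and are not taken from the study's data. The second call shows why a model that almost always predicts "no AD" on imbalanced data can score a very high specificity while its recall stays low.

```python
def recall_specificity(y_true, y_pred):
    """Compute recall (sensitivity) and specificity for binary labels.

    Labels are assumed to be 1 (positive, e.g. AD) or 0 (negative).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn)        # share of true positives that are caught
    specificity = tn / (tn + fp)   # share of true negatives that are cleared
    return recall, specificity


# Hypothetical toy labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
recall, specificity = recall_specificity(y_true, y_pred)
print(recall, specificity)

# A classifier that never flags anyone: perfect specificity, zero recall.
always_negative = [0] * len(y_true)
print(recall_specificity(y_true, always_negative))
```

In a screening setting, 1 − recall is the share of true cases that are missed, which is why recall matters more than overall accuracy when positives are rare.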