Predicting malignancy of pulmonary ground-glass nodules and their invasiveness by random forest
Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide (1). The U.S. National Lung Screening Trial (NLST) demonstrated a 20% reduction in lung cancer mortality with low-dose computed tomography (LDCT) compared with chest radiography (2). However, in the NLST, 96.4% of participants with pulmonary nodules had benign nodules. The risk of major complications from invasive diagnostic procedures was 4.5 per 10,000 persons screened, while 25% of surgically resected nodules were benign (2). An accurate and practical non-invasive tool that can predict the malignancy of lung nodules would thus greatly reduce costs and surgical complications and avoid unnecessary invasive diagnostic procedures or surgery.
With the wide application of CT screening, the detection of ground-glass nodules (GGNs) has become increasingly common, especially among pulmonary nodules smaller than 2 cm (2-5). On CT, a nodule can present as a pure GGN, which has only a ground-glass component, or as a mixed GGN, which has both ground-glass and solid components (6). Manual interpretation of nodule characteristics such as contour, shape, and margin is a widely accepted method for predicting the malignancy of lung nodules. However, automated interpretation of the radiological characteristics of GGNs by deep learning techniques may increase efficiency, improve reproducibility, and improve diagnostic accuracy.
In 2011, a new lung adenocarcinoma classification system was proposed by the International Association for the Study of Lung Cancer (IASLC)/American Thoracic Society (ATS)/European Respiratory Society (ERS) (7). According to this classification, adenocarcinomas are classified as atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma with lepidic, acinar, papillary, micropapillary or solid growth patterns (7). The presence of invasive components, especially with a solid or micropapillary growth pattern, correlates with poor prognosis in lung cancer (8,9). On CT scans, less aggressive subtypes such as AIS, MIA and lepidic adenocarcinoma frequently present as pure GGNs (9). In contrast, the presence of aggressive subtypes, such as solid and micropapillary growth patterns, greatly affects disease recurrence and overall survival in patients with GGNs (10,11). At present, sublobar resection is advocated for AIS/MIA, while lobectomy with lymph node dissection is the standard treatment for invasive adenocarcinoma. Consequently, pre-operative planning in patients with pulmonary GGNs could be greatly improved if deep learning techniques could differentiate the subtypes of adenocarcinoma based on their radiologic characteristics.
Herein, we focus on creating a machine learning model to quantify radiological imaging features of GGNs and thereby predict their malignancy and invasiveness. We first built a model to predict the benignity or malignancy of pulmonary GGNs. For those nodules identified as malignant, another binary classification model was built to predict the invasiveness of the nodule.
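The two-step design described above can be sketched as a simple cascade, in which only nodules called malignant by the first model are passed to the second. This is a minimal illustration, not the authors' implementation: the function names, the returned labels, and the toy threshold rules standing in for trained models are all hypothetical.

```python
from typing import Callable, Sequence

# A classifier here is any callable that maps a feature vector to 1 (positive) or 0.
Classifier = Callable[[Sequence[float]], int]

def cascade_predict(features: Sequence[float],
                    malignancy_model: Classifier,
                    invasiveness_model: Classifier) -> str:
    """Run stage 1; only nodules called malignant reach stage 2."""
    if malignancy_model(features) == 0:
        return "benign"
    if invasiveness_model(features) == 1:
        return "malignant: invasive"
    return "malignant: not invasive"

# Hypothetical stand-in rules, NOT clinical criteria.
# Assumed feature layout: [largest diameter, 1.0 if mixed density else 0.0]
def toy_malignancy_model(features: Sequence[float]) -> int:
    return int(features[0] >= 8.0)

def toy_invasiveness_model(features: Sequence[float]) -> int:
    return int(features[1] == 1.0)

for nodule in ([5.0, 0.0], [12.0, 0.0], [18.0, 1.0]):
    print(nodule, "->", cascade_predict(nodule, toy_malignancy_model, toy_invasiveness_model))
```

One consequence of this design is that errors from the first stage propagate: a malignant nodule misclassified as benign is never evaluated for invasiveness.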
Methods
Data
We retrospectively collected CT images of pulmonary GGNs from 1,177 patients who underwent either sublobar resection or lobectomy from 2015 to 2017 at Shanghai Chest Hospital. CT scans were performed using a 64-detector row CT scanner (Brilliance 64; Philips, Eindhoven, The Netherlands). Thin-section helical CT scans (1.0 mm collimation, 0.4-second gantry rotation time, 120 kVp, 349 mA) were obtained from the lung apices to the level of the mid portion of both kidneys. The image data were reconstructed with a slice thickness of 1.0 mm using soft tissue and lung algorithms. All CT data were interfaced with our picture archiving and communication system (Kingstar Winning, Shanghai, China).
The data description and collection were conducted by three radiologists at Shanghai Chest Hospital. The images were evaluated with the following settings: lung window center, 520 HU, and lung window width, 1,450 HU; mediastinal window center, 40 HU, and mediastinal window width, 350 HU. For each case, the demographic data were documented. The following nodule image features were evaluated by consensus of the three radiologists:
- Lobulation (yes/no);
- Pleural retraction (yes/no);
- Peritumoral vascularity (yes/no);
- Air bronchogram (yes/no);
- Calcification (yes/no);
- Spiculation (yes/no);
- Bubble lucency (yes/no);
- Nodule density (mixed/pure);
- Lung window largest diameter;
- Lung window vertical diameter;
- Mediastinal window largest diameter;
- Mediastinal window vertical diameter;
- Average CT value.
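To show how the features listed above could be passed to a classifier, the following sketch encodes them as a numeric vector. The class name `NoduleFeatures`, the field names, the 0/1 encoding of the yes/no items, and the example values are assumptions made for illustration; the paper does not describe its data schema or units.

```python
from dataclasses import dataclass, astuple

@dataclass
class NoduleFeatures:
    """Hypothetical container for the radiologist-assessed features listed above."""
    lobulation: bool
    pleural_retraction: bool
    peritumoral_vascularity: bool
    air_bronchogram: bool
    calcification: bool
    spiculation: bool
    bubble_lucency: bool
    mixed_density: bool                 # True = mixed GGN, False = pure GGN
    lung_largest_diameter: float
    lung_vertical_diameter: float
    mediastinal_largest_diameter: float
    mediastinal_vertical_diameter: float
    average_ct_value: float

    def to_vector(self) -> list[float]:
        """Encode yes/no items as 1.0/0.0 and keep measurements as floats."""
        return [float(value) for value in astuple(self)]

# Illustrative values only, not patient data.
example = NoduleFeatures(
    lobulation=True, pleural_retraction=False, peritumoral_vascularity=True,
    air_bronchogram=False, calcification=False, spiculation=True,
    bubble_lucency=False, mixed_density=True,
    lung_largest_diameter=14.0, lung_vertical_diameter=11.0,
    mediastinal_largest_diameter=6.0, mediastinal_vertical_diameter=4.0,
    average_ct_value=-480.0,
)
print(example.to_vector())
```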
Malignancy and invasiveness predictions
We used random forest along with logistic regression, decision tree, support vector machine (SVM), k-nearest neighbors (KNN) and AdaBoost to predict the malignancy of the pulmonary GGNs and the invasiveness of malignant GGNs. Random forest is a classification method that consists of multiple decision trees. It mitigates the tendency of individual decision trees to overfit the training set. Each tree in the forest was grown by using randomly selected input variables, or combinations of variables, at each node.
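The random forest idea described above (bootstrap samples, a random subset of input variables at each node, and a majority vote over trees) can be sketched from scratch in a few dozen lines. This is a simplified illustration on synthetic data, not the implementation used in the study, which the paper does not specify. The function names, hyperparameters (`n_trees`, `max_depth`), and the data generator `make_toy_data` are all assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, n_candidates, rng):
    """Among a random subset of features, find the threshold with the lowest weighted Gini."""
    best = None
    for f in rng.sample(range(len(X[0])), n_candidates):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    return best

def grow_tree(X, y, n_candidates, rng, depth=0, max_depth=4):
    """A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, n_candidates, rng)
    if split is None:  # simplification: stop if no candidate feature separates the rows
        return majority(y)
    _, f, t = split
    left_X, left_y = zip(*[(r, l) for r, l in zip(X, y) if r[f] <= t])
    right_X, right_y = zip(*[(r, l) for r, l in zip(X, y) if r[f] > t])
    return (f, t,
            grow_tree(left_X, left_y, n_candidates, rng, depth + 1, max_depth),
            grow_tree(right_X, right_y, n_candidates, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=15, seed=0):
    rng = random.Random(seed)
    n_candidates = max(1, int(len(X[0]) ** 0.5))  # common heuristic: sqrt(number of features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample, drawn with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_candidates, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return majority([predict_tree(tree, row) for tree in forest])

def make_toy_data(n=150, seed=1):
    """Synthetic rows [diameter, mean CT value, spiculation, lobulation]; NOT patient data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        malignant = rng.random() < 0.5
        X.append([rng.gauss(20, 4) if malignant else rng.gauss(10, 3),
                  rng.gauss(-450, 80) if malignant else rng.gauss(-650, 80),
                  float(rng.random() < (0.6 if malignant else 0.1)),
                  float(rng.random() < (0.5 if malignant else 0.2))])
        y.append(int(malignant))
    return X, y

X, y = make_toy_data()
forest = fit_forest(X, y)
train_accuracy = sum(predict_forest(forest, row) == label for row, label in zip(X, y)) / len(y)
print(f"training accuracy on synthetic data: {train_accuracy:.2f}")
```

The accuracy printed here is measured on the training rows of a synthetic dataset, so it says nothing about performance on real nodules; a held-out test set or cross-validation would be needed for that.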
Results
Patient characteristics
The demographics and clinicopathologic characteristics of the cohort are shown in Table 1. Upon histological examination, 115 (9.8%) cases were benign, 474 (40.2%) were AAH/AIS/MIA, 146 (12.4%) were lepidic predominant lung adenocarcinoma, 345 (29.3%) were acinar/papillary adenocarcinoma, 73 (6.2%) were micropapillary/solid adenocarcinoma and 24 (2.0%) were invasive mucinous adenocarcinoma (IMA).
Malignancy predictions
Random forest obtained 95.1% accuracy and 99.1% sensitivity. AdaBoost and k-nearest neighbors (KNN) both achieved 99.4% sensitivity. Logistic regression had 80.0% specificity. Table 2 shows the detailed results of predictions using the different classification algorithms.
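For reference, accuracy, sensitivity and specificity can be computed from true and predicted labels as in the sketch below. The function name `binary_metrics` and the toy labels are illustrative. The second example assumes that all 1,062 non-benign cases in the cohort are counted as malignant, which the text does not state explicitly; it shows why specificity matters when benign cases are rare.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # fraction of malignant nodules detected
        "specificity": tn / (tn + fp),   # fraction of benign nodules correctly cleared
    }

# Toy labels: 1 = malignant, 0 = benign.
toy = binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0])
print(toy)

# A trivial rule that calls every nodule malignant, applied to a cohort with
# 115 benign and 1,062 non-benign cases (class assignment is an assumption).
y_true = [0] * 115 + [1] * 1062
always_malignant = binary_metrics(y_true, [1] * len(y_true))
print({name: round(value, 3) for name, value in always_malignant.items()})
```

Under that assumption, the trivial rule reaches about 90% accuracy and 100% sensitivity with 0% specificity, so accuracy and sensitivity alone cannot distinguish a useful model from one that labels everything malignant.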
Figure 1 shows the receiver operating characteristic (ROC) curves of the malignancy prediction models. As can be seen in the figure, random forest [area under the curve (AUC) =96%] outperformed the other models [AdaBoost (AUC =95%), logistic regression (AUC =95%), SVM (AUC =93%), decision tree (AUC =83%) and KNN (AUC =76%)].
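The AUC reported above has a direct interpretation: it is the probability that a randomly chosen malignant nodule receives a higher predicted score than a randomly chosen benign one, with ties counted as one half. A minimal sketch of that calculation follows; the function name `roc_auc` and the toy scores are illustrative, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = malignant, 0 = benign; scores are predicted probabilities.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```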
Invasiveness predictions
Random forest achieved the best overall accuracy, whereas logistic regression had the highest sensitivity and AdaBoosting had the highest specificity. Table 3 shows a detailed comparison of the performance of each algorithm in predicting the invasiveness of the malignant GGNs.
Algorithmic performance for invasiveness is shown in the ROC curves in Figure 2. Random forest achieved the highest AUC at 91%. AdaBoost, logistic regression and decision tree achieved 90%, 89% and 87%, respectively.
Discussion
With the wide application of computed tomography screening for lung cancer worldwide, more GGNs are being found. The positive relationship of lesion size and CT features such as contour, shape, and margin to the likelihood of malignancy in solid pulmonary nodules has been clearly demonstrated (12). Historically, these features were thought to be less helpful in differentiating GGNs. However, our study demonstrates that applying machine learning to the evaluation of the CT characteristics of GGNs can help predict malignancy and invasiveness with high sensitivity and specificity.
Our model for the classification of pulmonary GGNs achieved 99.1% sensitivity and 95.1% accuracy, which would facilitate the identification of malignant nodules in a screening setting. Automated interpretation of radiological image traits for lung nodules has potential benefits such as increasing efficiency and reproducibility and improving prognosis through early detection and treatment. For malignancy predictions, the ensemble model random forest achieved the best performance among the classifiers. This is likely because the bagging strategy of random forest generalizes well and can incorporate different types of features. The majority voting method of random forest can effectively reduce misclassifications. In comparison with the bagging approach of random forest, the boosting strategy of AdaBoost is more prone to overfitting, which may explain its lower accuracy and specificity.
AAH, AIS and MIA refer to pre-invasive lesions in the lung. The disease-free survival rate is 100% in patients with AAH, AIS or MIA lesions following sublobar resection (10). However, lobectomy with systematic lymph node dissection or sampling is the standard surgical procedure for invasive adenocarcinoma (10). Considering that the invasiveness of adenocarcinomas is difficult to determine accurately by intraoperative frozen section, pre-treatment differentiation of the invasiveness of the nodule is helpful for pre-operative planning. As with malignancy prediction, the robustness of the bagging strategy of random forest worked well for invasiveness prediction, resulting in higher accuracy, sensitivity and AUC.
To take full advantage of machine learning algorithms, it is useful to combine the strengths of multiple classifiers and minimize misclassification errors. Although random forest already benefits from bagging over many decision trees, considering only a single type of classifier is not enough. In the future, we will try more advanced ensemble approaches, such as ensembles of different base models or stacking, to optimize prediction accuracy.
One limitation of this study is the use of radiologic features instead of the imaging information taken directly from the CT images. Future research will use the imaging dataset directly to develop deep learning algorithms for the same purpose. This has been shown to be quite successful in thyroid nodules but has not yet been tried in lung cancers (13,14).
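The effect of majority voting can be illustrated with a simple binomial calculation. Under the idealized assumption that each tree is independently correct with the same probability `p` (real trees in a forest are correlated, so this is an upper-bound style illustration rather than a description of the study's models), the probability that the majority is correct grows with the number of trees whenever `p` exceeds 0.5. The function name and the value `p = 0.7` are illustrative.

```python
from math import comb

def majority_vote_accuracy(p, n_voters):
    """Probability that more than half of n independent voters are correct."""
    k_min = n_voters // 2 + 1
    return sum(comb(n_voters, k) * p ** k * (1 - p) ** (n_voters - k)
               for k in range(k_min, n_voters + 1))

# Odd ensemble sizes avoid ties.
for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
```

With three voters, for example, the majority is right when at least two of them are: 3 × 0.7² × 0.3 + 0.7³ = 0.784, which is already higher than the 0.7 of a single voter.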
In conclusion, our study is the first attempt to differentiate benign and malignant pulmonary GGNs and to predict invasiveness of malignant nodules using machine-learning models. Further study is warranted so that artificial intelligence could be incorporated into clinical practice to improve management outcomes.
Acknowledgements
Funding: This work is supported by Shanghai Science and Technology Commission Foundation project (No. 14411950800) (W Gao).
Footnote
Conflicts of Interest: The authors have no conflicts of interest to declare.
Ethical Statement: All the data in this study were collected retrospectively; therefore, the requirement for ethical approval and patients' informed consent was waived by the institutional review board of Shanghai Chest Hospital.
References
- Torre LA, Bray F, Siegel RL, et al. Global cancer statistics, 2012. CA Cancer J Clin 2015;65:87-108.
- National Lung Screening Trial Research Team, Aberle DR, Adams AM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409.
- Lee HY, Lee KS. Ground-glass opacity nodules: histopathology, imaging evaluation, and clinical implications. J Thorac Imaging 2011;26:106-18.
- Takahashi M, Shigematsu Y, Ohta M, et al. Tumor invasiveness as defined by the newly proposed IASLC/ATS/ERS classification has prognostic significance for pathologic stage IA lung adenocarcinoma and can be predicted by radiologic parameters. J Thorac Cardiovasc Surg 2014;147:54-9.
- Kovalchik SA, Tammemagi M, Berg CD, et al. Targeting of low-dose CT screening according to the risk of lung-cancer death. N Engl J Med 2013;369:245-54.
- Yankelevitz DF, Yip R, Smith JP, et al. CT screening for lung cancer: nonsolid nodules in baseline and annual repeat rounds. Radiology 2015;277:555-64.
- Travis WD, Brambilla E, Noguchi M, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 2011;6:244-85.
- Hung JJ, Yeh YC, Jeng WJ, et al. Predictive value of the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification of lung adenocarcinoma in tumor recurrence and patient survival. J Clin Oncol 2014;32:2357-64.
- Miao Y, Zhang J, Zou J, et al. Correlation in histological subtypes with high resolution computed tomography signatures of early stage lung adenocarcinoma. Transl Lung Cancer Res 2017;6:14-22.
- Dembitzer FR, Flores RM, Parides MK, et al. Impact of histologic subtyping on outcome in lobar vs sublobar resections for lung cancer: a pilot study. Chest 2014;146:175-81.
- Hung JJ, Yeh YC, Jeng WJ, et al. Predictive value of the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification of lung adenocarcinoma in tumor recurrence and patient survival. J Clin Oncol 2014;32:2357-64.
- Asamura H, Hishida T, Suzuki K, et al. Radiographically determined noninvasive adenocarcinoma of the lung: survival outcomes of Japan Clinical Oncology Group 0201. J Thorac Cardiovasc Surg 2013;146:24-30.
- Mei XY, Dong XM, Trafalis T, et al. Convolutional Auto-Encoders Based Feature Extraction for Thyroid Nodule Ultrasound Image. 17th International Symposium on Mathematical and Computational Biology, November 2017.
- Mei XY, Dong XM, Deyer T, et al. Thyroid Nodule Benignity Predict by Deep Feature Extraction. 2017 IEEE 17th International Conference Bioinformatics and Bioengineering (BIBE), 2017:24145.