Development and validation of a multimodal radiomics-serum biomarker model for diagnosing solid pulmonary nodules via machine learning
Original Article

Kai Wang#, Xu-Feng Deng#, Zhi Zheng, Zi-Qi Huang, Shuang-Qing Liao, Si-Jin Liu, Zhuo-Xin Dai, Ji-Gang Dai, Quan-Xing Liu

Department of Thoracic Surgery, The Second Affiliated Hospital, Army Medical University, Chongqing, China

Contributions: (I) Conception and design: K Wang, XF Deng; (II) Administrative support: JG Dai, QX Liu; (III) Provision of study materials or patients: QX Liu, XF Deng; (IV) Collection and assembly of data: Z Zheng, ZQ Huang, SQ Liao, SJ Liu, ZX Dai; (V) Data analysis and interpretation: K Wang, XF Deng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Ji-Gang Dai, MD, PhD; Quan-Xing Liu, MD, PhD. Department of Thoracic Surgery, The Second Affiliated Hospital, Army Medical University, 83 Xinqiaozheng Street, Chongqing 400037, China. Email: daijigang@tmmu.edu.cn; quanxing9999@tmmu.edu.cn.

Background: Globally, lung cancer is the most frequently diagnosed malignancy, for which solid pulmonary nodules (SPNs) are a common radiographic finding. Given the high false-positive rates of computed tomography (CT) screening, we aimed to develop a multimodal diagnostic model combining CT radiomics features and serum biomarkers via machine learning.

Methods: This retrospective study included patients receiving both preoperative CT screening and serum biomarker testing. All pulmonary nodules (PNs) were divided into training and validation sets randomly at a ratio of 7:3. We developed a multimodal diagnosis model based on the CT radiomics and protein biomarkers of SPNs in the training cohort. The CT radiomics features were derived from the integration of traditional radiomics analysis methods and three-dimensional (3D) deep learning techniques. The accuracy of this multimodal diagnosis model for the prediction of SPNs was verified in the validation set. Model performances were evaluated in terms of the area under the curve (AUC), accuracy, positive predictive value (PPV), negative predictive value (NPV), decision curve analysis (DCA), and calibration curve.

Results: Between February 2016 and December 2020, imaging data of 638 eligible PNs from CT scans of 633 different patients were collected. The multimodal model had satisfactory accuracy in differentiating benign and malignant SPNs in the training set [AUC =0.944; 95% confidence interval (CI): 0.924–0.964]. In the validation set, the multimodal model yielded an AUC of 0.926 (95% CI: 0.889–0.964), an accuracy of 0.885, an NPV of 0.812, and a PPV of 0.927. The multimodal model also significantly outperformed the single-modality diagnostic models, including the traditional radiomics CT model (AUC =0.843; 95% CI: 0.780–0.906), the serum biomarker model (AUC =0.783; 95% CI: 0.718–0.847), and the 3D deep learning model (AUC =0.820; 95% CI: 0.754–0.885) (all P values <0.01).

Conclusions: This study developed a novel multimodal model that demonstrated superior performance in classifying SPNs. It may thus enhance the diagnosis of benign and malignant lesions and provide support for clinical decision-making.

Keywords: Multimodal diagnosis; solid pulmonary nodules (SPNs); radiomics; serum biomarkers; machine learning


Submitted Jun 16, 2025. Accepted for publication Sep 17, 2025. Published online Nov 24, 2025.

doi: 10.21037/jtd-2025-1214


Highlight box

Key findings

• The study developed a multimodal model integrating computed tomography (CT) radiomics features [through a combination of traditional features and three-dimensional (3D) deep learning], serum biomarkers, and clinical data for diagnosing solid pulmonary nodules (SPNs). The model achieved high accuracy in both training [area under the curve (AUC) =0.944] and validation (AUC =0.926) sets, significantly outperforming the single-modality models (validation AUC =0.783–0.843). Age was the only clinical factor predictive of malignancy.

What is known and what is new?

• CT radiomics and serum biomarkers [e.g., cytokeratin 19 fragment 21-1 (CYFRA21-1) and carcinoembryonic antigen] can be applied individually to aid in the diagnosis of patients with lung cancer. Meanwhile, deep learning (e.g., DenseNet) has shown promise in medical image analysis.

• This study developed a multimodal diagnosis model integrating CT radiomics features and serum protein biomarkers to accurately differentiate benign and malignant SPNs. The model, constructed through a combination of traditional radiomics analysis and 3D deep learning techniques, demonstrated significantly superior diagnostic performance over the single-modality models.

What is the implication, and what should change now?

• This multimodal approach can reduce false positives in low-dose CT screening, preventing unnecessary invasive procedures. Clinical nomograms incorporating age, radiomics, biomarkers, and deep learning scores can provide interpretable diagnostic guidance.

• External validation of the proposed model in multicenter studies remains to be conducted, and prospective cohorts are needed to confirm the generalizability of our findings. The model can be integrated into clinical workflows (e.g., low-dose CT screening programs) via interoperable artificial intelligence tools.

• A comparison of the proposed model and traditional diagnostic pathways in terms of cost-effectiveness should also be conducted.


Introduction

Primary bronchogenic carcinoma (commonly known as “lung cancer”), a malignancy with elevated prevalence and mortality, constitutes a major global health burden. Lung cancer was projected to account for the greatest portion of cancer deaths in 2022 in males and females, both in China and the United States (1). The majority of patients with early lung cancer have no obvious symptoms and only present clinically with symptoms indicative of an advanced stage. At diagnosis, nearly half (48%) of patients with lung cancer have distant metastases, with a markedly low 5-year survival rate of 8%. In contrast, patients with the cancer confined to the primary site have a significantly higher 5-year survival rate of ≥60% (2). The critical role of early detection in reducing lung cancer mortality and increasing survival prospects is well established. Following the National Lung Screening Trial’s report of a 20% mortality reduction with low-dose computed tomography (LDCT), this screening modality has been extensively adopted for identifying lung cancer at earlier, more treatable stages (3). However, the high sensitivity of LDCT screening comes at the cost of low specificity, leading to the misidentification of benign lung nodules as malignant ones. Indeed, although CT imaging can assess malignancy risk through features such as size, lobar location, density, and margin characteristics, this approach nonetheless results in approximately 20% of nodules being incorrectly classified as malignant (false positives) (4). Hence, developing novel, noninvasive diagnostics that combine high sensitivity and specificity for the differentiation of malignant and benign nodules remains a pressing clinical need.

Routine clinical nodule assessment tools, exemplified by the Mayo Clinic and Veterans Affairs (VA) models, rely on the combination of imaging parameters with risk factors (5). However, the value of these assessment tools, particularly in terms of their sensitivity, varies significantly with nodule size and location. Suspicious lesions detected by LDCT often require invasive diagnostic confirmation via bronchoscopy, transthoracic needle aspiration (TTNA), or surgery, all of which are procedures associated with complications such as bleeding, infection, pneumothorax, and even death. Given these risks, serum biomarkers present a compelling, noninvasive alternative, offering a safer, more economical, and logistically simpler approach for both cancer diagnosis and ongoing monitoring. Cytokeratin 19 fragment 21-1 (CYFRA21-1) was first identified as a biomarker for lung cancer in 1993 (6). The value of cancer antigen 125 (CA125) for predicting the risk of recurrence and prognosis in non-small cell lung cancer (NSCLC) was studied and confirmed as early as 1994 (7). A retrospective study indicated that pro-gastrin-releasing peptide precursor (PROGRP) is a more effective diagnostic marker for small-cell lung cancer (SCLC) than is neuron-specific enolase (NSE) (8). However, this does not mean that NSE can be replaced. Another study indicated that the combination of PROGRP and NSE improves diagnostic efficacy as compared to the application of either biomarker alone (9). Moreover, CA125, carcinoembryonic antigen (CEA), NSE, and squamous cell carcinoma (SCC) antigen have been shown to be diagnostic biomarkers of lung cancer when used in combination with CYFRA21-1 and can also be applied for the prognostic prediction of patients with lung cancer (10-12). The development of serum biomarkers as diagnostic markers has reached a relatively mature stage. However, there remain several shortcomings, such as the lack of standardized testing methods and unsatisfactory sensitivity and specificity.

CT radiomics features are being increasingly recognized as a valuable noninvasive diagnostic tool. Artificial intelligence (AI) algorithms can significantly contribute to the diagnosis, automated detection, and segmentation of malignant tumors. Research suggests that CT imaging radiomics in combination with AI algorithms appears capable of distinguishing benign from malignant pulmonary nodules (PNs) (13,14). Deep learning models based on neural networks have been applied to diagnosing various diseases; for example, magnetic resonance imaging has been used in Alzheimer’s disease (15), while histopathological images have been used in breast cancer (16). A deep learning model based on histological images was developed to predict gene mutations in patients with lung cancer (17). Research has established tumor heterogeneity as a critical prognostic factor. Quantitative heterogeneity metrics derived from 18F-fluorodeoxyglucose positron emission tomography-CT, such as the coefficient of variation (CoV), have quantifiable prognostic value. Specifically, lower CoV values in primary NSCLC tumors correlate significantly with worse overall survival, suggesting that homogeneous glycolytic phenotypes may reflect aggressive disease biology (18). Beyond first-order features, advanced texture analysis incorporating shape sphericity, gray-level run-length nonuniformity (GLRLM_RLNU), total lesion glycolysis (TLG), and metabolic tumor volume (MTV), particularly when enhanced by machine learning algorithms such as decision trees, facilitates the accurate prediction of distant metastases in patients with NSCLC [accuracy: 74.4%; area under the curve (AUC): 0.63] (19). Collectively, these findings support the capacity of radiomics to capture subradiological tumor characteristics essential for risk stratification. Radiomics, an evolving discipline within medical imaging, uses quantitative features extracted from images to facilitate the automated categorization of medical scans into established diagnostic groups. Therefore, we hypothesized that combining radiomics of CT imaging with serum biomarkers can further improve the accuracy of diagnosing benign and malignant PNs.

Our study aimed to improve the accuracy in distinguishing between benign and malignant PNs via the development of a diagnostic model that integrates CT-based radiomics [combining traditional radiomics with three-dimensional (3D) deep learning techniques] and serum biomarkers. The workflow of this study is illustrated in Figure 1. We present this article in accordance with the TRIPOD reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1214/rc).

Figure 1 Workflow of the study. AUC, area under the curve; DL-score, deep learning score; ROC, receiver operating characteristic; ViT, vision transformer.

Methods

Study design and participants

In this retrospective study, serum samples and CT scans from the Thoracic Surgery Department of The Second Affiliated Hospital of Army Military Medical University were analyzed. From February 17, 2016, to December 10, 2020, 650 patients with malignant and benign PNs were included, specifically those patients who screened positive for PNs <3 cm in diameter on CT and who subsequently underwent surgical resection. The other inclusion criteria were as follows: (I) PNs characterized as part-solid nodules or solid nodules; (II) surgical tissue specimens with confirmed pathological diagnosis; (III) a PN diameter ≤3 cm; (IV) age ≥18 years; and (V) complete clinical records and clear diagnosis. Meanwhile, the exclusion criteria were as follows: (I) other malignancies occurring within the past 3 years; (II) pulmonary metastatic tumors originating from other systems; and (III) metastasis to mediastinal lymph nodes. Finally, we included 633 patients with solid PNs (SPNs) at initial diagnosis who underwent surgical resection, comprising 638 SPNs. The CT images of the 638 PNs from 633 patients were randomly divided into a training set and a validation set in a 7:3 ratio. The data collection process is shown in Figure 2.
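The random 7:3 allocation described above can be illustrated with a minimal sketch. The function name `split_nodules`, the seed, and the use of integer nodule identifiers are assumptions made for illustration; the source does not report how the randomization was implemented.

```python
import random


def split_nodules(nodule_ids, train_fraction=0.7, seed=2025):
    """Randomly split nodule identifiers into training and validation sets.

    The seed is a hypothetical value, used only to make the sketch reproducible.
    """
    ids = list(nodule_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of 70% of the nodules
    return ids[:n_train], ids[n_train:]


# 638 nodules, identified here simply by their index
train_ids, val_ids = split_nodules(range(638))
print(len(train_ids), len(val_ids))  # 446 192
```

Because five patients contributed more than one nodule, a patient-level split would be needed to guarantee that nodules from the same patient fall in the same set; whether this was done is not stated.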

Figure 2 Workflow of data collection. SPN, solid pulmonary nodule.

CT image acquisition, lesion segmentation, and radiomics feature extraction

Standardized, unenhanced thoracic CT imaging was conducted for the entire cohort with a Definition AS+ (128-slice) multidetector CT scanner (Siemens Healthineers, Erlangen, Germany). The acquisition settings were as follows: a 120 kV tube voltage, a tube current modulated automatically between 35 and 90 mAs, a pitch of 0.9, a field of view of 180 mm × 180 mm, a 512×512 reconstruction matrix, a B30 or I30 reconstruction kernel, and a 1.2 mm slice thickness/increment for reconstructions. Digital Imaging and Communications in Medicine (DICOM)-formatted images were then transferred for radiomic feature extraction.

For nodule segmentation, all primary tumors were manually segmented using 3D Slicer and reviewed by a junior radiologist with more than 5 years of experience in chest CT interpretation. The segmentation was then confirmed by a senior radiologist with 20 years of chest CT experience. For each accurately delineated nodule, quantitative imaging biomarkers were automatically extracted from the tumor region. This process used the computational algorithms implemented in PyRadiomics (https://pyradiomics.readthedocs.io/en/latest/), an open-source Python library designed specifically for extracting radiomic features from clinical medical images (20). In addition to the standard PyRadiomics features, we applied Laplacian of Gaussian (LoG) filtering with sigma values of 1, 2, and 3; wavelet filtering with eight decompositions (LLH, LHL, LHH, HLL, HLH, HHL, HHH, and LLL); and local binary pattern in 3D (LBP3D), exponential, square, square root, logarithm, and gradient transformations. The segmentation and feature extraction process is shown in Figure 3.

Figure 3 Workflow of segmentation and feature extraction. GLCM, gray level co-occurrence matrix; GLDM, gray level dependence matrix; GLRLM, gray level run-length matrix; GLSZM, gray level size zone matrix; NGTDM, neighboring gray tone difference matrix; ROI, region of interest.

Serum protein biomarker testing

All patients underwent lung cancer–associated serum protein biomarker testing before surgery at the laboratory of The Second Affiliated Hospital of Army Military Medical University. The serum levels of eight clinically relevant and accessible protein cancer biomarkers were measured, including CA125, CA15-3, CEA, CYFRA21-1, NSE, PROGRP, SCC, and serum ferritin (SF). Measurements for CA125, CA15-3, CEA, CYFRA21-1, PROGRP, SCC, and SF were performed on the chemiluminescent immunoassay (CLIA) platform (Abbott Laboratories, Chicago, IL, USA) in accordance with the manufacturer’s standard operating procedures (SOPs). Specific reagent kits utilized included those for CA125 II, CA 15-3, and CEA, along with ARCHITECT CYFRA 21-1, ARCHITECT ProGRP, ARCHITECT SCC, and ferritin (Abbott Laboratories). NSE was measured with an electrochemiluminescence assay (ECLIA) kit (Roche Diagnostics GmbH, Rotkreuz, Switzerland) according to the manufacturer’s SOP.

Radiomics feature selection

Intraclass correlation coefficient (ICC)

Feature robustness was assessed to identify characteristics resilient to region of interest (ROI) segmentation variations. Both test-retest and interrater analyses were employed to evaluate the robustness. For the test-retest analysis, 50 randomly selected patients from the discovery dataset underwent duplicate tumor subregion segmentations by a single rater. The interrater analysis included a separate cohort of 35 patients, with two raters independently segmenting the ROIs for each case. ICC values were calculated for features derived from these multiple segmentations. Features exhibiting an ICC ≥0.85, reflecting high consistency across repeated measurements by the same rater and across different raters, were considered stable.
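As an illustration of the stability criterion, the sketch below computes an ICC for one feature measured on repeated segmentations and applies the 0.85 cutoff. The ICC form is an assumption: the source does not state which variant was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is shown. The feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per nodule and one column per segmentation."""
    n = len(ratings)       # number of nodules
    k = len(ratings[0])    # number of segmentations per nodule
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values of one feature from two independent segmentations
feature_values = [[10.1, 10.3], [12.0, 11.8], [8.5, 8.6], [15.2, 15.0], [9.9, 10.2]]
icc = icc_2_1(feature_values)
is_stable = icc >= 0.85  # stability threshold stated in the text
print(round(icc, 3), is_stable)
```

In practice the same function would be applied to every extracted feature, for both the test-retest and the interrater segmentations, retaining only features that pass the threshold in both analyses.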

Spearman correlation

For highly repeatable features, pairwise correlations were additionally quantified with the Spearman rank correlation coefficient. To enhance feature set diversity, a recursive elimination strategy was rigorously applied. This process iteratively removed the most redundant feature from any pair exhibiting a correlation coefficient greater than 0.9, ensuring only one was retained for each highly correlated pair.
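The redundancy filter can be sketched as follows. The rule used here to decide which member of a correlated pair is dropped (the feature with the largest mean absolute correlation with the remaining features) is an assumption, as the source does not specify it; the feature names and values are hypothetical, and features are assumed to be non-constant.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Remove features one at a time until no pair exceeds the threshold."""
    kept = list(features)
    while True:
        rho = {
            (a, b): abs(spearman(features[a], features[b]))
            for i, a in enumerate(kept)
            for b in kept[i + 1:]
        }
        flagged = sorted({name for pair, r in rho.items() if r > threshold for name in pair})
        if not flagged:
            return kept

        def mean_corr(name):
            vals = [r for pair, r in rho.items() if name in pair]
            return sum(vals) / len(vals)

        kept.remove(max(flagged, key=mean_corr))


# Hypothetical feature table: name -> values across six nodules
features = {
    "volume": [1, 2, 3, 4, 5, 6],
    "surface": [2, 4, 5, 9, 10, 13],  # monotonic in volume, hence redundant
    "entropy": [3, 1, 6, 2, 5, 4],
}
kept = drop_redundant(features)
print(kept)
```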

Least absolute shrinkage and selection operator (LASSO) regression and the radiomics signature

The discovery dataset underwent signature construction via Cox regression incorporating the LASSO. LASSO applies a regularization penalty (λ) to shrink regression coefficients toward zero, assigning coefficients of features deemed nonpredictive a value of zero. Model tuning employed 10-fold cross-validation with the minimum error criterion being used to identify the optimal λ value (Figure 4). Features retaining nonzero coefficients post-LASSO selection were integrated into a multivariate Cox model to formulate the final radiomics signature. All analytical procedures were conducted via the “scikit-learn” package in Python (Python Software Foundation, DE, USA). The feature weight coefficients are shown in Figure 4C.
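For orientation, the L1-penalized objective and the cross-validated choice of λ can be written as below. This is the least-squares form minimized by the Lasso estimator in scikit-learn (where λ is called alpha) and is shown as an assumption; the source does not state the exact loss to which the penalty was applied. Here X is the n × p matrix of standardized features, y the outcome vector, and MSE_k(λ) the error on the k-th held-out fold.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\lambda^{*} = \arg\min_{\lambda} \; \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_k(\lambda), \quad K = 10
```

Features whose coefficient in \hat{\beta}(\lambda^{*}) is exactly zero are discarded; the remainder form the selected set.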

Figure 4 The process and results of feature selection in traditional radiomics via LASSO regression. (A) Tuning parameter selection via 10-fold cross-validation. (B) Feature coefficient trajectories across regularization strengths (λ). (C) The coefficient values associated with the finally selected nonzero features. LASSO, least absolute shrinkage and selection operator; MSE, mean squared error.

Radiomics model construction

Following feature selection via LASSO regression, seven distinct machine learning algorithms [adaptive boosting (AdaBoost), gradient boosting, K-nearest neighbors (KNN), logistic regression (LR), multilayer perceptron (MLP), naive Bayes, and support vector machine (SVM)] were evaluated to establish the optimal classification model. A 10-fold cross-validation strategy was employed to derive the final radiomics signature. Model selection was based on the AUC, diagnostic accuracy, positive predictive value (PPV), and negative predictive value (NPV). The output of the optimal model was designated as the radiomics signature score (Rad-score).
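The four selection metrics can be computed from predicted scores as in the following sketch. The 0.5 decision threshold and the toy labels and scores are assumptions for illustration only; they are not data from this study.

```python
def diagnostic_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, PPV, and NPV at a given decision threshold (1 = malignant)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "ppv": tp / (tp + fp),  # fraction of predicted malignant that are malignant
        "npv": tn / (tn + fn),  # fraction of predicted benign that are benign
    }


def auc(y_true, y_score):
    """AUC as the probability that a malignant case outscores a benign one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores for eight nodules
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]
metrics = diagnostic_metrics(y_true, y_score)
print(metrics)                 # accuracy, PPV, and NPV are each 0.75
print(auc(y_true, y_score))    # 0.9375
```

Unlike the AUC, the accuracy, PPV, and NPV depend on the chosen threshold, and the PPV and NPV also depend on the prevalence of malignancy in the cohort.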

Serum biomarker model construction

Eight lung cancer-associated serum biomarkers were selected as serum biomarker features for the malignant and benign groups and analyzed for intergroup differences. Feature selection was performed via 10-fold cross-validated LASSO regression. The selected features were then input into the seven aforementioned machine learning algorithms (AdaBoost, gradient boosting, KNN, LR, MLP, naive Bayes, SVM) to develop a serum biomarker score. The optimal model was similarly selected via calculation of the AUC, diagnostic accuracy, PPV, and NPV. The output of the optimal model was designated as the serum biomarker score (Bio-score).

Deep learning model construction

Spatial feature extraction with DenseNet-169

The model included a DenseNet-169 backbone pretrained on ImageNet for spatial feature extraction from volumetric CT inputs (21). Adapted to 3D medical imaging, it took 64×64×48-voxel ROIs centered on pathological regions as input. The architecture leverages dense connectivity across four blocks (layer configuration: 6, 12, 32, 32), where each layer concatenates feature maps from all preceding layers to maximize gradient flow and feature reuse. Transition layers with 1×1 convolutions (channel compression ratio θ =0.5) and 2×2 average pooling reduce feature redundancy between blocks. All convolutions use 3×3×3 volumetric kernels and are followed by batch normalization and rectified linear unit activation. The initial convolutional layer processes inputs with 64 kernels, progressively expanding to 512 channels in deeper blocks.

Spatiotemporal modeling via vision transformer (ViT)

DenseNet-derived 3D features were restructured into spatiotemporal tokens for processing by a modified ViT (22). In this process, each 64×64 slice is partitioned into 16×16-pixel patches (yielding 16 spatial tokens/frame), while 48 temporal slices are grouped into 24 tokenized units of 2 consecutive frames, generating 384 input tokens. The ViT module projects tokens into 1,024-dimensional embeddings with embedded dropout (0.1), which are processed through 6 transformer encoder layers. Each layer implements multihead self-attention (8 heads) followed by an MLP with 2,048 hidden units, with Gaussian error linear unit activation and a 0.1 dropout. The model’s performance was evaluated with metrics including the area under the receiver operating characteristic (ROC) curve, diagnostic accuracy, PPV, and NPV. The probabilistic malignancy output generated by this integrated framework was formally designated as the deep learning score (DL-score) for subsequent analyses.
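The token count stated above follows directly from the patch and grouping sizes, as the short arithmetic sketch below shows. The function and parameter names are hypothetical; only the dimensions come from the text.

```python
def vit_token_count(height=64, width=64, depth=48, patch=16, frames_per_unit=2):
    """Count spatial tokens per frame, temporal units, and total input tokens."""
    spatial = (height // patch) * (width // patch)  # 4 x 4 patches per slice
    temporal = depth // frames_per_unit             # pairs of consecutive slices
    return spatial, temporal, spatial * temporal


spatial, temporal, total = vit_token_count()
print(spatial, temporal, total)  # 16 24 384
```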

Construction of the nomogram

Using the training dataset, we developed a radiomics nomogram. This tool graphically illustrates how the radiomics signature enhances diagnostic prediction beyond what is achievable with established clinical risk factors alone. This nomogram integrated the CT-derived radiomics signature with serum biomarkers via multivariable LR. The output of this integrated model constituted the multi-score: a composite diagnostic index consisting of the Rad-score (CT radiomics features), DL-score (deep learning features), Bio-score (serum biomarkers), and key clinical risk factors. Model calibration, reflecting the agreement between nomogram-predicted malignancy risk and actual outcomes, was evaluated with calibration curves and the Hosmer-Lemeshow test. Discriminatory performance was quantified via AUC, which was calculated independently for both the training and validation sets. Statistical comparison of AUC values was conducted via the DeLong test. Finally, the clinical applicability of the nomogram was evaluated through decision curve analysis (DCA), in which the net benefit derived from the integrated model across various probability thresholds for the training and validation datasets was determined.
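The net benefit underlying DCA can be sketched with the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The labels and predicted probabilities below are hypothetical, and the function names are introduced only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every case with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the baseline strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = malignant) and predicted malignancy probabilities
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)  # 0.25 0.0; the treat-none strategy is 0 by definition
```

A decision curve repeats this calculation over a grid of thresholds; a model is considered useful over the range where its curve lies above both the treat-all and treat-none baselines.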

Statistical analysis

Statistical analyses were conducted with Python 3.9.7 and R 4.2.2 (The R Foundation for Statistical Computing) software, with a significance threshold of P<0.05. For comparative analysis between malignant and benign solid lung nodules, continuous variables underwent the Student t test or the Mann-Whitney test as appropriate, and categorical variables were evaluated with the Chi-squared or the Fisher exact test, depending on their distributional properties.

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of The Second Affiliated Hospital of Army Military Medical University (project ID: 2024-YD139-01). The requirement for informed consent was waived due to the retrospective nature of the analysis.


Results

Clinical characteristics of study participants

A total of 633 patients positive for nodules on CT were enrolled from the Thoracic Surgery Department of The Second Affiliated Hospital of Army Military Medical University from February 17, 2016, to December 10, 2020. All enrolled patients had undergone serum protein testing and CT screening prior to surgery. Pathological evaluation of PN samples was performed on surgically resected tissue sections in accordance with the 2015 World Health Organization (WHO) Histological Classification of Lung Cancer (23). Five patients had more than one PN, so a total of 638 PN images were included in the analysis. All PNs were randomly divided into a training set (n=446) and a test set (n=192) at a ratio of 7:3.

The clinical dataset included information on patient age, gender, body mass index, and pathological diagnosis. The stratification of patient cohorts and their corresponding clinical characteristics are presented in Table 1. Only age was significantly different between patients with benign and malignant diseases in both the training and validation sets.

Table 1

Baseline characteristics of the patient cohort

| Characteristic | Training cohort (n=446): benign (n=183) | Training cohort (n=446): malignant (n=263) | P | Test cohort (n=192): benign (n=65) | Test cohort (n=192): malignant (n=127) | P |
|---|---|---|---|---|---|---|
| Patient age (years) | 49.99±10.19 | 58.13±9.43 | <0.001 | 50.62±8.72 | 56.37±9.40 | <0.001 |
| Gender | | | | | | |
| Female | 75 (40.98) | 99 (37.64) | | 31 (47.69) | 64 (50.39) | |
| Male | 108 (59.02) | 164 (62.36) | | 34 (52.31) | 63 (49.61) | |
| BMI, kg/m² | 23.35±3.05 | 24.11±2.92 | | 23.89±2.76 | 23.77±3.08 | |
| Pathology | | | | | | |
| Adenocarcinoma | | 219 (83.27) | | | 107 (84.25) | |
| Squamous cell carcinoma | | 19 (7.22) | | | 8 (6.30) | |
| Inflammation | 80 (43.72) | | | 30 (46.15) | | |
| Tuberculosis | 38 (20.77) | | | 17 (26.15) | | |
| Hamartoma | 23 (12.57) | | | 9 (13.85) | | |
| Other | 42 (22.95) | 25 (9.51) | | 9 (13.85) | 12 (9.45) | |

Data are presented as number (%) or mean ± SD. BMI, body mass index; SD, standard deviation.

Radiomics feature selection and radiomics model establishment

From the unenhanced CT images, a comprehensive set of 1,834 radiomic features was computed. These features were categorized into seven distinct groups: (I) first-order statistics, (II) morphological characteristics, (III) gray-level co-occurrence matrix (GLCM) features, (IV) gray-level run-length matrix (GLRLM) features, (V) gray-level size-zone matrix (GLSZM) features, (VI) neighboring gray-tone difference matrix (NGTDM) features, and (VII) gray-level dependence matrix (GLDM) features. Following extraction, predictive models were developed with the selected features. Applying the LASSO algorithm for feature selection yielded 44 features with nonzero coefficients, which were subsequently integrated into the Rad-score calculation. The coefficients and mean squared error (MSE) derived from the 10-fold cross-validation are visualized in Figure 4A,4B. The coefficient magnitudes for the final set of selected nonzero features are displayed in Figure 4C.

We constructed and evaluated seven distinct models for predicting the malignancy of PNs (Table S1). Comparative analysis indicated that the SVM model had the best performance. It yielded the highest AUCs for distinguishing benign from malignant lung nodules: 0.868 [95% confidence interval (CI): 0.831–0.904] in the training set and 0.843 (95% CI: 0.775–0.904) in the validation set. In the validation set, the model achieved an accuracy of 0.818, an NPV of 0.714, and a PPV of 0.877. The output of the SVM diagnosis model was calculated as the Rad-score of each patient.

Serum biomarker feature selection and diagnostic model establishment

Six variables identified as statistically significant in the LASSO regression analysis were included in the biomarker score. These variables included CYFRA21-1, PROGRP, CA15-3, CEA, CA125, and NSE levels. We constructed and evaluated seven distinct models to identify the optimal classifier for distinguishing benign from malignant lung nodules. The details of these models are provided in Table S2. Among them, the gradient boosting model outperformed all others, obtaining the highest AUC values for benign and malignant nodule prediction in both the training (0.795; 95% CI: 0.750–0.832) and test sets (0.783; 95% CI: 0.715–0.841). In the validation set, the model achieved an accuracy of 0.693, an NPV of 0.526, and a PPV of 0.936. The output of the gradient boosting diagnosis model was calculated as the Bio-score of each patient.

Development of the 3D deep learning model

Based on the DenseNet169 deep learning model, we developed a 3D deep learning model capable of differentiating between benign and malignant PNs, achieving AUCs of 0.801 (95% CI: 0.760–0.843) and 0.820 (95% CI: 0.754–0.885) in the training cohort and validation cohort, respectively. In the validation set, the model achieved an accuracy of 0.750, an NPV of 0.589, and a PPV of 0.907. The output of the DenseNet169 diagnostic model was calculated as the DL-score of each patient.

Integrated diagnostic model combining clinical features, serum biomarkers, radiomics, and DL-score for PN malignancy prediction

Following the evaluation of the predictive performance of each modality alone, we subsequently developed an integrated model incorporating clinical parameters, serum biomarker score, Rad-score, and DL-score to assess their combined efficacy in discriminating malignant from benign PNs. The formula for the integrated model scoring system (multi-score) of the LR-constructed nomogram was as follows: Multi-score =7.5735 × Rad-score + 3.6913 × DL-score + 9.5334 × Bio-score + 0.0372 × age − 14.1298. These components were integrated into a multimodal model that had good discriminative ability, with an AUC of 0.944 (95% CI: 0.924–0.964) in the training set and 0.926 (95% CI: 0.889–0.964) in the validation set. In the validation set, the integrated model achieved an accuracy of 0.885, an NPV of 0.812, and a PPV of 0.927. The output of the nomogram for the diagnostic model was calculated as the multi-score for each patient.
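The scoring formula above can be evaluated as in the following sketch. Two points are assumptions rather than statements from the source: that the multi-score is the linear predictor (logit) of the LR model, so that a malignancy probability is obtained with the logistic function, and that the three component scores are probabilities between 0 and 1. The input values are hypothetical.

```python
import math


def multi_score(rad_score, dl_score, bio_score, age):
    """Linear multi-score with the coefficients reported in the text."""
    return (
        7.5735 * rad_score
        + 3.6913 * dl_score
        + 9.5334 * bio_score
        + 0.0372 * age
        - 14.1298
    )


def malignancy_probability(score):
    """Logistic transform; assumes the multi-score is the logit of the LR model."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical patient: all three component scores equal to 0.5, age 60 years
score = multi_score(rad_score=0.5, dl_score=0.5, bio_score=0.5, age=60)
probability = malignancy_probability(score)
print(round(score, 4), round(probability, 3))
```

Under these assumptions, the Bio-score and Rad-score carry the largest coefficients, so a unit change in either moves the multi-score more than the same change in the DL-score.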

In order to compare the Bio-score, Rad-score, DL-score, and Multi-score, the DeLong test was used. Statistical analysis revealed a highly significant difference in AUC between the Multi-score model and the other three models (P<0.002). Furthermore, the nomogram model demonstrated superior discriminatory performance for distinguishing malignant from benign nodules as compared to the individual models. Corresponding AUC values for both the training and validation cohorts are presented in Figure 5.

Figure 5 ROC in (A) training and (B) validation sets. AUC, area under the curve; DL-score, deep learning score; ROC, receiver operating characteristic.

Nomogram calibration curves in both the training and validation sets indicated a strong concordance between model predictions and actual outcomes of PN classification (benign vs. malignant). Hosmer-Lemeshow tests for the nomogram yielded P values of 0.372 in the training set and 0.545 in the validation set, indicating no significant lack of fit in either dataset. Calibration curves for all four evaluated models are presented in Figure 6A,6B for the respective sets.

Figure 6 Calibration curves in the (A) training and (B) validation sets. DL-score, deep learning score.

In this study, we also evaluated each model through DCA. The DCA for the DL-score, Rad-score, Bio-score, and multi-score is presented in Figure 7. For the validation set, the multi-score provided a greater net benefit for patients (improvement range: 0.07–0.95) than did the baseline scenarios lacking a predictive model (i.e., treating all patients or treating none). Compared with the Bio-score, Rad-score, and DL-score, the multi-score signature demonstrated superior performance. Its preoperative application in distinguishing between benign and malignant PNs provided enhanced predictive accuracy and clinical utility. The clinically applicable nomogram, presented in Figure 8, includes a total score that can be used to estimate the probability of malignancy for PNs.

Figure 7 Decision curves in the (A) training and (B) validation sets. DL-score, deep learning score.
Figure 8 The integrated nomogram incorporating age, Rad-score, DL-score, and Bio-score. DL-score, deep learning score.
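
To illustrate how a nomogram of this kind converts predictor values into points and a total score into a probability, the following sketch assumes an underlying logistic regression. The intercept, coefficients, and predictor ranges are hypothetical placeholders chosen purely for illustration; they are not the fitted values behind Figure 8.

```python
import math

# Hypothetical values for illustration only; NOT the fitted model in Figure 8.
INTERCEPT = -4.5
COEF = {"age": 0.03, "rad_score": 3.0, "dl_score": 1.5, "bio_score": 1.2}
RANGE = {"age": (20.0, 90.0), "rad_score": (0.0, 1.0),
         "dl_score": (0.0, 1.0), "bio_score": (0.0, 1.0)}


def _reference(name):
    """Predictor value that earns zero points (lowest-risk end of its axis)."""
    lo, hi = RANGE[name]
    return lo if COEF[name] >= 0 else hi


# The predictor with the largest possible contribution defines the 100-point axis.
MAX_SPAN = max(abs(COEF[k]) * (RANGE[k][1] - RANGE[k][0]) for k in COEF)


def probability(x):
    """Predicted probability of malignancy from the logistic model."""
    lp = INTERCEPT + sum(COEF[k] * x[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-lp))


def points(x):
    """Points per predictor, as read off the individual nomogram axes."""
    return {k: 100.0 * COEF[k] * (x[k] - _reference(k)) / MAX_SPAN for k in COEF}


def probability_from_total(total_points):
    """Map a total score back to a probability (the bottom nomogram axis)."""
    base = INTERCEPT + sum(COEF[k] * _reference(k) for k in COEF)
    lp = base + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical patient.
patient = {"age": 60.0, "rad_score": 0.8, "dl_score": 0.5, "bio_score": 0.4}
pts = points(patient)
print(pts, sum(pts.values()), probability_from_total(sum(pts.values())))
```

The total-score route and the direct logistic calculation give the same probability, which is what allows the nomogram to be used at the bedside without software.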

Discussion

In this study, we constructed an integrative diagnostic model by combining serum biomarkers and CT image features for the diagnosis of PNs. This integrated diagnostic model had greater diagnostic accuracy than did the serum biomarker model and the CT feature model. To our knowledge, this study is the first of its kind to develop a framework that synergistically combines radiomics patterns from medical imaging with serum biomarkers to establish a multimodal diagnostic protocol for PN characterization. Our findings support the value of clinical-radiological correlation in complementing machine learning-driven pulmonary diagnostics. Our work aligns with the growing consensus that multimodal integration is essential to overcoming the inherent limitations of unimodal diagnostics in oncology (24,25).

In the construction of the serum biomarker diagnostic model, we found that among the 8 commonly used clinical biomarkers, 6 demonstrated relatively high diagnostic efficacy. Notably, an inverse association was observed between NSE level and malignant diagnosis, which can be attributed to NSE’s established biological specificity as a biomarker preferentially expressed in SCLC. This apparent paradox stems from our study’s exclusive enrollment of patients with NSCLC, a subtype for which NSE has limited diagnostic relevance. The histopathological divergence between SCLC and NSCLC fundamentally underlies this biomarker’s differential expression patterns. An increasing number of studies are shifting their focus away from these traditional clinical biomarkers and toward multiomics technologies in order to develop novel biomarkers. For instance, plasma metabolomics and extracellular vesicle proteomics are being examined for their value in early lung cancer screening (26,27). In future research, we aim to integrate a greater diversity of biomarker data to develop other integrated models.

In the construction of a traditional radiomics and machine learning CT image-based diagnostic model, we found that the selected 44 radiomics features were useful in distinguishing between benign and malignant nodules, with an AUC of 0.843. These results are consistent with those of other studies (28,29). However, there are obvious limitations in the diagnosis of PNs based on CT imaging features alone. Research has confirmed that when a patient undergoes two CT scans on the same day, the observed volume discrepancies can reach approximately ±25%. Such significant variations have the potential to destabilize CT feature quantification (28,30). Beyond the simple binary assessment of malignancy, distinguishing specific tumor types is critical to patient management. While subsolid nodules carry an elevated malignancy risk as compared to their solid counterparts, malignant subsolid tumors typically exhibit indolent characteristics, including reduced growth rates and a diminished metastatic propensity (31). However, it is impossible to distinguish these biological features solely via CT image features. In AI diagnostic models based on CT imaging, the diagnostic accuracy of solid nodules is generally lower than that of subsolid nodules (32). It should be noted that most of the PNs in our study were solid nodules.

We found that the performance of our final integrated diagnostic model was superior to that of the CT image and serum biomarker diagnostic models. Interestingly, in the construction of the combined model, we found that the weight of the traditional radiomics image features was significantly larger than that of features from the other two dimensions (i.e., serum biomarkers and 3D deep learning). Traditional radiomics features and serum biomarkers have greater interpretability than do deep learning models, whereas 3D deep learning can effectively incorporate the influence of the perilesional tumor microenvironment. In addition, it has been reported that age is a key risk factor for lung adenocarcinoma (33). Our LR results also indicated age to be significantly associated with malignant PNs, whereas height, weight, and body mass index (BMI) exhibited no significant association. Thus, we added age as an additional variable beyond the three models when constructing the final version of the nomogram. The synergistic integration of three models leveraged their respective strengths, enabling the acquisition of more comprehensive information, through which an optimized diagnostic performance could be obtained.

Our multimodal model, which integrates traditional radiomics [PyRadiomics-derived two-dimensional (2D)/3D features], 3D deep learning, and serum biomarkers, may help to inform clinical decision-making in the diagnosis of SPNs. The model achieved a validation AUC of 0.926, surpassing that of the single-modality approaches (AUC ≤0.843). This high diagnostic accuracy could substantially reduce false-positive rates in LDCT screening programs, minimizing unnecessary invasive procedures (e.g., TTNA or surgery) and associated complications (pneumothorax and hemorrhage). Compared with previous investigations, this study incorporated a broader spectrum of radiomics features and clinical biomarkers within a larger cohort, which may have contributed to the observed superior performance (our multi-score AUC =0.926 vs. previous models AUC ≤0.844) (29,34). The interpretable nomogram (Figure 8) further enables clinicians to quantify malignancy probability preoperatively using routinely available data (age + CT + serum tests). Future implementation via interoperable AI tools could optimize resource allocation in lung cancer screening pathways, although multicenter validation and cost-effectiveness studies remain to be conducted.

Although this study demonstrated the value of integrating features from different modalities for enhanced diagnostic performance, several limitations should be acknowledged. First, the sample size was relatively small due to challenges in patient recruitment and the stringent inclusion criteria inherent to retrospective medical research. However, such sample sizes are pragmatically acceptable in early-stage medical AI studies, as methodological innovation often precedes large-scale validation. Future multicenter collaborations should prioritize external validation to address any concerns regarding generalizability.


Conclusions

We constructed and validated a novel integrated model combining CT images and serum biomarkers capable of differentiating malignant from benign PNs. This multimodal approach outperformed models relying solely on either imaging or serum markers. Future large-scale studies could establish it as a robust tool for early-stage lung cancer screening.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1214/rc

Data Sharing Statement: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1214/dss

Peer Review File: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1214/prf

Funding: This work was supported by Chongqing Science and Health Joint Medical Science and Technology Innovation and Key Project (2025GGXM001 to Q.X.L.); New Chongqing Youth Innovation Talent Project (grant/award number: CSTB2024NSCQ-QCXMX0031 to Q.X.L.); and Noncommunicable Chronic Diseases-National Science and Technology Major Project (2024ZD0529400 & 2024ZD0529406 to J.G.D.).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1214/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Affiliated Hospital of Army Military Medical University (Project ID: 2024-YD139-01). Individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Xia C, Dong X, Li H, et al. Cancer statistics in China and United States, 2022: profiles, trends, and determinants. Chin Med J (Engl) 2022;135:584-90.
  2. Leiter A, Veluswamy RR, Wisnivesky JP. The global burden of lung cancer: current status and future trends. Nat Rev Clin Oncol 2023;20:624-39.
  3. Gomi T, Hara H, Watanabe Y, et al. Improved digital chest tomosynthesis image quality by use of a projection-based dual-energy virtual monochromatic convolutional neural network with super resolution. PLoS One 2020;15:e0244745.
  4. Fehlmann T, Kahraman M, Ludwig N, et al. Evaluating the Use of Circulating MicroRNA Profiles for Lung Cancer Detection in Symptomatic Patients. JAMA Oncol 2020;6:714-23.
  5. Nair VS, Sundaram V, Desai M, et al. Accuracy of Models to Identify Lung Nodule Cancer Risk in the National Lung Screening Trial. Am J Respir Crit Care Med 2018;197:1220-3.
  6. Pujol JL, Grenier J, Daurès JP, et al. Serum fragment of cytokeratin subunit 19 measured by CYFRA 21-1 immunoradiometric assay as a marker of lung cancer. Cancer Res 1993;53:61-6.
  7. Diez M, Torres A, Pollán M, et al. Prognostic significance of serum CA 125 antigen assay in patients with non-small cell lung cancer. Cancer 1994;73:1368-76.
  8. Takada M, Kusunoki Y, Masuda N, et al. Pro-gastrin-releasing peptide (31-98) as a tumour marker of small-cell lung cancer: comparative evaluation with neuron-specific enolase. Br J Cancer 1996;73:1227-32.
  9. Shibayama T, Ueoka H, Nishii K, et al. Complementary roles of pro-gastrin-releasing peptide (ProGRP) and neuron specific enolase (NSE) in diagnosis and prognosis of small-cell lung cancer (SCLC). Lung Cancer 2001;32:61-9.
  10. Zhang ZH, Han YW, Liang H, et al. Prognostic value of serum CYFRA21-1 and CEA for non-small-cell lung cancer. Cancer Med 2015;4:1633-8.
  11. Qu T, Zhang J, Xu N, et al. Diagnostic value analysis of combined detection of Trx, CYFRA21-1 and SCCA in lung cancer. Oncol Lett 2019;17:4293-8.
  12. Yuan J, Sun Y, Wang K, et al. Development and validation of reassigned CEA, CYFRA21-1 and NSE-based models for lung cancer diagnosis and prognosis prediction. BMC Cancer 2022;22:686.
  13. Ren Y, Tsai MY, Chen L, et al. A manifold learning regularization approach to enhance 3D CT image-based lung nodule classification. Int J Comput Assist Radiol Surg 2020;15:287-95.
  14. Chen J, Zeng H, Zhang C, et al. Lung cancer diagnosis using deep attention-based multiple instance learning and radiomics. Med Phys 2022;49:3134-43.
  15. Momeni F, Shahbazi-Gahrouei D, Mahmoudi T, et al. Transfer Learning and Neural Network-Based Approach on Structural MRI Data for Prediction and Classification of Alzheimer's Disease. Diagnostics (Basel) 2025;15:360.
  16. Fang W, Tang S, Yan D, et al. Breast cancer pathology image recognition based on convolutional neural network. PLoS One 2025;20:e0311728.
  17. Zhao Y, Xiong S, Ren Q, et al. Deep learning using histological images for gene mutation prediction in lung cancer: a multicentre retrospective study. Lancet Oncol 2025;26:136-46.
  18. Pellegrino S, Fonti R, Hakkak Moghadam Torbati A, et al. Heterogeneity of Glycolytic Phenotype Determined by (18)F-FDG PET/CT Using Coefficient of Variation in Patients with Advanced Non-Small Cell Lung Cancer. Diagnostics (Basel) 2023;13:2448.
  19. Hakkak Moghadam Torbati A, Pellegrino S, Fonti R, et al. Machine Learning and Texture Analysis of [18F]FDG PET/CT Images for the Prediction of Distant Metastases in Non-Small-Cell Lung Cancer Patients. Biomedicines 2024;12:472.
  20. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-7.
  21. Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks. arXiv:1608.06993v5 [Preprint]. 2016. Available online: https://arxiv.org/abs/1608.06993
  22. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929v2 [Preprint]. 2020. Available online: https://arxiv.org/abs/2010.11929
  23. Travis WD, Brambilla E, Nicholson AG, et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances Since the 2004 Classification. J Thorac Oncol 2015;10:1243-60.
  24. Ye G, Wu G, Li K, et al. Development and Validation of a Deep Learning Radiomics Model to Predict High-Risk Pathologic Pulmonary Nodules Using Preoperative Computed Tomography. Acad Radiol 2024;31:1686-97.
  25. Lin CY, Guo SM, Lien JJ, et al. Combined model integrating deep learning, radiomics, and clinical data to classify lung nodules at chest CT. Radiol Med 2024;129:56-69.
  26. Zhang L, Zheng J, Bux RA, et al. Clinical Validation of Plasma Metabolite Markers for Early Lung Cancer Detection. Int J Mol Sci 2025;26:4519.
  27. Han L, Song Y, Tong L, et al. Extracellular Vesicle Protein Panel Enables Early Lung Cancer Detection in a Large Clinical Cohort. J Extracell Vesicles 2025;14:e70129.
  28. Dhara AK, Mukhopadhyay S, Dutta A, et al. A Combination of Shape and Texture Features for Classification of Pulmonary Nodules in Lung CT Images. J Digit Imaging 2016;29:466-75.
  29. Qu BQ, Wang Y, Pan YP, et al. The scoring system combined with radiomics and imaging features in predicting the malignant potential of incidental indeterminate small (<20 mm) solid pulmonary nodules. BMC Med Imaging 2024;24:234.
  30. Gietema HA, Schaefer-Prokop CM, Mali WP, et al. Pulmonary nodules: Interscan variability of semiautomated volume measurements with multisection CT-- influence of inspiration level, nodule size, and segmentation performance. Radiology 2007;245:888-94.
  31. Ziyad SR, Radha V, Vayyapuri T. Overview of Computer Aided Detection and Computer Aided Diagnosis Systems for Lung Nodule Detection in Computed Tomography. Curr Med Imaging Rev 2020;16:16-26.
  32. Nikanjam M, Kato S, Kurzrock R. Liquid biopsy: current technology and clinical applications. J Hematol Oncol 2022;15:131.
  33. Wu G, Woodruff HC, Sanduleanu S, et al. Preoperative CT-based radiomics combined with intraoperative frozen section is predictive of invasive adenocarcinoma in pulmonary nodules: a multicenter study. Eur Radiol 2020;30:2680-91.
  34. Hou X, Wu M, Chen J, et al. Establishment and verification of a prediction model based on clinical characteristics and computed tomography radiomics parameters for distinguishing benign and malignant pulmonary nodules. J Thorac Dis 2024;16:1984-95.
Cite this article as: Wang K, Deng XF, Zheng Z, Huang ZQ, Liao SQ, Liu SJ, Dai ZX, Dai JG, Liu QX. Development and validation of a multimodal radiomics-serum biomarker model for diagnosing solid pulmonary nodules via machine learning. J Thorac Dis 2025;17(11):9721-9734. doi: 10.21037/jtd-2025-1214
