Construction of a radiogenomics predictive model for KRAS mutation status in patients with non-small cell lung cancer

Yunfei Li; Jiawei Li; Yiren Wang; Youhua Wang; Delong Huang; Zhongjian Wen; Yiheng Hu; Sheng Lin; Ping Zhou; Haowen Pang

doi:10.21037/jtd-2024-2003

Original Article

Yunfei Li^1#, Jiawei Li^1#, Yiren Wang^2,3#, Youhua Wang^4, Delong Huang^5, Zhongjian Wen^2,3, Yiheng Hu^6,7, Sheng Lin^1*, Ping Zhou^7*, Haowen Pang^1*

¹Department of Oncology, The Affiliated Hospital of Southwest Medical University, Luzhou, China; ²School of Nursing, Southwest Medical University, Luzhou, China; ³Wound Healing Basic Research and Clinical Application Key Laboratory of Luzhou, School of Nursing, Southwest Medical University, Luzhou, China; ⁴Gulin County People’s Hospital, Luzhou, China; ⁵School of Clinical Medicine, Southwest Medical University, Luzhou, China; ⁶Department of Medical Imaging, Southwest Medical University, Luzhou, China; ⁷Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, China

Contributions: (I) Conception and design: H Pang, P Zhou, S Lin, Y Li; (II) Administrative support: H Pang, P Zhou, S Lin; (III) Provision of study materials or patients: Y Li, J Li, Youhua Wang, D Huang; (IV) Collection and assembly of data: Y Li, J Li, Z Wen, Y Hu; (V) Data analysis and interpretation: Y Li, J Li, Yiren Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work as co-first authors.

^*These authors contributed equally to this work.

Correspondence to: Sheng Lin, MD, PhD. Department of Oncology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, China. Email: lslinsheng@163.com; Ping Zhou, PhD. Department of Radiology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, China. Email: zhouping11@swmu.edu.cn; Haowen Pang, PhD. Department of Oncology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, China. Email: haowenpang@foxmail.com.

Background: Non-small cell lung cancer (NSCLC) represents a significant portion of lung cancer cases globally, with Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations being a critical factor in its pathogenesis. Predicting KRAS mutation status is crucial for guiding targeted therapies and improving patient outcomes. This study aimed to develop and validate a differential evolution optimized artificial neural network (DE-ANN) model that integrates positron emission tomography/computed tomography (PET/CT) radiomics and genomics data for predicting KRAS mutation status in NSCLC patients, showcasing the potential of multi-omics integration in precision oncology.

Methods: PET/CT radiomics features and genomics data were obtained from public databases. Least absolute shrinkage and selection operator (LASSO) regression and support vector machine-recursive feature elimination (SVM-RFE) were used to identify key predictive features. The DE-ANN model was optimized using a differential evolution algorithm and validated internally using bootstrap resampling to assess its predictive performance.

Results: The DE-ANN model demonstrated superior predictive accuracy with an area under the curve (AUC) of 0.909 [95% confidence interval (CI): 0.882–0.937], outperforming traditional artificial neural network (ANN) models (AUC =0.819, 95% CI: 0.778–0.860). Key features identified included significant radiomics signatures and gene markers, with the model showing enhanced convergence rates and robust internal validation outcomes. The model’s calibration and decision curve analyses further confirmed its clinical applicability and potential for improving personalized treatment strategies in NSCLC.

Conclusions: The DE-ANN model represents a significant advancement in the predictive modeling of KRAS mutation status in NSCLC, leveraging the synergy between radiomics and genomic data. Its high predictive accuracy and methodological robustness highlight the model’s potential as a tool in precision oncology, warranting further external validation and exploration in other cancer types.

Keywords: Non-small cell lung cancer (NSCLC); Kirsten rat sarcoma viral oncogene homolog mutation (KRAS mutation); radiomics; genomics; precision medicine

Submitted Nov 19, 2024. Accepted for publication Mar 31, 2025. Published online Jun 26, 2025.

Highlight box

Key findings

• Developed a differential evolution optimized artificial neural network (DE‑ANN) that integrates positron emission tomography/computed tomography (PET/CT) radiomics (5 CT features, 2 PET features) and a 3‑gene signature to predict Kirsten rat sarcoma viral oncogene homolog (KRAS) mutation status in non-small cell lung cancer (NSCLC). The model achieved an area under the curve (AUC) of 0.909, sensitivity 0.886, specificity 0.815, and accuracy 0.879, outperforming a standard ANN (AUC =0.819).

• Demonstrated promising calibration (Hosmer-Lemeshow P=0.81; C‑index =0.909) and yielded higher net clinical benefit across a decision-curve range of 8–67%.

What is known and what is new?

• KRAS mutations drive NSCLC progression and guide targeted therapy, but current noninvasive PET or CT radiomics models achieve only moderate prediction performance and do not incorporate genomic data.

• This is the first predictive model to fuse PET/CT radiomics with gene expression via a DE‑ANN framework, using bootstrap resampling for robust internal validation and differential evolution to optimize network weights for favorable convergence and performance.

What is the implication, and what should change now?

• The multi‑omics DE‑ANN offers a noninvasive tool to stratify NSCLC patients for KRAS-targeted therapies and may reduce dependence on invasive biopsies. Next steps should include external validation in prospective multicenter cohorts and expansion to additional omics to confirm generalizability and support clinical adoption.

Introduction

Non-small cell lung cancer (NSCLC) accounts for most lung cancer cases and is one of the most common cancers worldwide and a leading cause of cancer-related deaths (1). Owing to the generally asymptomatic early stages of NSCLC, most patients are diagnosed at an advanced stage, leading to a poor prognosis. Therefore, early diagnosis and effective treatment strategies are crucial to improve patient outcomes (2).

Mutations in the Kirsten rat sarcoma viral oncogene homolog (KRAS) gene play a pivotal role in the development and progression of NSCLC. KRAS is a GTPase located on the cell membrane that is involved in the transmission of extracellular signals to the nucleus, thereby influencing cell proliferation, differentiation, and death (3). Mutations in KRAS are among the most common oncogenic mutations in NSCLC, particularly in adenocarcinoma subtypes (4). The presence of KRAS mutations is not only closely related to the prognosis of patients but also significantly affects their response to certain targeted therapies and chemotherapy. Despite the widely recognized importance of KRAS mutations, tools that enable their noninvasive detection are still limited. Currently, the detection of KRAS mutations relies primarily on molecular analysis of tissue biopsy samples, including polymerase chain reaction and next-generation sequencing (5,6). However, these methods typically require tumor tissue samples, which poses challenges in clinical settings with patients who are inoperable or for whom sufficient biopsy samples cannot be obtained. Further, traditional biopsy methods are invasive, carry risks and complications, and may not fully reflect the genetic landscape of the tumor because of heterogeneity (7). Heterogeneity means that samples obtained from different regions of the tumor or at different times may exhibit different genetic characteristics, affecting the accuracy and effectiveness of treatment decisions (8). Therefore, there is a need for a noninvasive, efficient, and comprehensive method for detecting KRAS mutations that captures the genetic heterogeneity of tumors.

Radiogenomics is an emerging field that combines radiomics features with genomic data to reveal the relationships between imaging biomarkers, genetic variations, expression patterns, and disease progression (9). Radiogenomics analysis using positron emission tomography/computed tomography (PET/CT) offers a new avenue for exploring the links between the molecular characteristics and imaging manifestations of NSCLCs, potentially revealing unknown biological information, guiding treatment decisions, and predicting patient treatment response and prognosis (10).

This study aimed to construct a predictive model based on PET/CT radiogenomics to accurately predict KRAS mutation status in patients with NSCLC. By integrating and analyzing PET/CT imaging features and KRAS mutation information, we aimed to develop a reliable predictive tool that offers more personalized treatment options and precise disease management, thereby improving clinical outcomes, optimizing treatment strategies, and ultimately enhancing patient survival rates and quality of life. We present this article in accordance with the TRIPOD reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-2003/rc).

Methods

Data source

This study used the NSCLC-radiogenomics dataset from The Cancer Imaging Archive (TCIA) public database, which included PET/CT imaging data and region-of-interest segmentation data for 211 patients along with their clinical baseline information (https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/), and the GSE103584 dataset from the Gene Expression Omnibus public database, which contains RNA sequencing (RNA-Seq) data for 130 patients (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103584) (11,12). After excluding samples with unmatched RNA-Seq and PET/CT data, samples with missing regions of interest (ROIs), and samples with unknown KRAS mutation status, a total of 99 samples were included in this study [mutant: n=24 (24.2%), wildtype: n=75 (75.8%)]. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Data preprocessing and differentially expressed gene (DEG) screening

The 3D Slicer software (version 4.2.0) was used to process and extract radiomics features from the TCIA image data. To minimize the impact of image acquisition heterogeneity, before extracting radiomics features from the tumor lesions, the images were normalized by pixel size and resampled, returning normalized voxels of 1 mm × 1 mm × 1 mm.

Missing values in the GSE103584 data were imputed using the impute package, followed by renormalization using the normalizeBetweenArrays function in the limma package. Probes corresponding to multiple molecules were removed, and when several probes corresponded to the same molecule, only the probe with the highest signal value was retained for annotation. Adjusted P<0.05 and |log2 fold change (FC)| >1.5 were defined as thresholds for screening DEGs (13).
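The two screening thresholds can be illustrated with a minimal Python sketch. This is not the R/limma pipeline used in the study; the `GeneStat` record, the `screen_degs` helper, and the gene names and values are hypothetical and serve only to show how the adjusted P value and |log2 FC| cutoffs act together.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str       # hypothetical gene identifier
    log2_fc: float  # log2 fold change (mutant vs. wildtype)
    adj_p: float    # multiple-testing adjusted P value


def screen_degs(stats, p_cutoff=0.05, lfc_cutoff=1.5):
    """Return (upregulated, downregulated) genes passing both thresholds."""
    up, down = [], []
    for s in stats:
        if s.adj_p < p_cutoff and abs(s.log2_fc) > lfc_cutoff:
            (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Made-up values for illustration only (not taken from GSE103584).
toy = [
    GeneStat("GENE_A", 2.10, 0.001),   # passes both thresholds, upregulated
    GeneStat("GENE_B", -1.80, 0.020),  # passes both thresholds, downregulated
    GeneStat("GENE_C", 3.00, 0.200),   # large change but not significant
    GeneStat("GENE_D", 0.90, 0.001),   # significant but change too small
]
up_genes, down_genes = screen_degs(toy)
```

A gene is retained only when both conditions hold, so GENE_C and GENE_D are discarded in this toy input.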

Functional enrichment analysis of DEGs

To further ascertain the potential functions of the DEGs, gene function enrichment analysis was conducted. Gene Ontology (GO), which is widely used to annotate gene function in terms of molecular functions, biological processes, and cellular components, was employed. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was used to analyze gene functions and related high-level genomic functional information (14). The clusterProfiler package in R was used to perform the GO and KEGG enrichment analyses of the DEGs.

Radiomics and gene signature screening

The intraclass correlation coefficient (ICC) was used to assess the reliability of the extracted radiomics features, and features with an ICC ≥0.75 were retained for further analysis (15). A U-test was then conducted on each radiomics and DEG feature as a preliminary selection step, with the P value threshold set to 0.05 (16). Subsequently, pairwise Spearman correlation analysis (two-sided, P<0.05) was applied to the remaining features, and if the absolute correlation coefficient |r| between two features was 0.9–1.0, one of the two was excluded (17). Then, for each feature category (PET radiomics, CT radiomics, and DEG features), least absolute shrinkage and selection operator (LASSO) regression and support vector machine-recursive feature elimination (SVM-RFE) algorithms were used for feature selection. Both algorithms underwent 5-fold cross-validation and a grid search to find the optimal hyperparameters. The feature sets obtained from the intersection of the two algorithms were named the PET-radiomics signature, CT-radiomics signature, and gene signature.
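The redundancy filter and the final intersection step can be sketched in plain Python. The greedy rule below (keep a feature only if its |r| with every already-kept feature is below 0.9) is one possible way to exclude one member of each highly correlated pair and is an assumption, as are the feature names, the toy values, and the example LASSO and SVM-RFE outputs.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Greedily keep features whose |r| with all kept features is < threshold."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (one value per patient).
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 7, 8, 11, 13],  # monotone in f1, so |r| = 1 and it is dropped
    "f3": [3, 1, 6, 2, 5, 4],    # weakly correlated with f1, so it is kept
}
retained = drop_redundant(features)

# Hypothetical outputs of the two selectors; the signature is their intersection.
lasso_selected = {"f1", "f3", "f9"}
svm_rfe_selected = {"f3", "f7", "f9"}
signature = sorted(lasso_selected & svm_rfe_selected)
```

Taking the intersection is conservative: a feature enters the signature only if both selectors agree on it.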

Construction and evaluation of predictive model

In this study, we used a differential evolution (DE) algorithm to optimize artificial neural networks (ANNs) for constructing predictive models. The DE algorithm is a simple and effective global optimization algorithm primarily used for optimization problems with continuous parameters (18). It manipulates the differences between individuals within a population to generate new individuals, and employs selection operations to progressively improve the quality of the population (the set of all candidate solutions or individuals in a given iteration of the algorithm) (19). The primary characteristics of the DE algorithm are its simple structure, ease of implementation, and robust global search capability.

We defined the population size NP, where each individual represents the collection of all weights and biases in the ANN. Each individual in the population was randomly initialized to ensure that the weights and biases were within a reasonable range. Then, DE was conducted in three steps:

Mutation: in the first step of DE, a mutation operation is performed for each individual $x_{i, g}$ in the population. The algorithm randomly selects three distinct individuals: $x_{r 1, g}$ , $x_{r 2, g}$ , and $x_{r 3, g}$ . These individuals are used to construct a new mutant vector $v_{i, g + 1}$ , calculated as follows:
$v_{i, g + 1} = x_{r 1, g} + F \cdot (x_{r 2, g} - x_{r 3, g})$ [1]

Here, F is the differential weight that controls the magnitude of the mutation, thereby influencing the exploration range of the new solution.
Crossover: the purpose of the crossover step is to increase the diversity of the population (20). The information from the mutant vector $v_{i, g + 1}$ and the original individual (target vector) $x_{i, g}$ are combined to generate the trial vector $u_{i, g + 1}$ . This combination ensures that the new solution not only inherits existing beneficial characteristics but also acquires new features from the mutation. The crossover process introduces additional diversity into the population, which is crucial to prevent the algorithm from prematurely converging to a local optimum. For each dimension j of the vector, the crossover operation is executed according to the following rule:
$u_{j i, g + 1} = \begin{cases} v_{j i, g + 1} & \text{if } \mathrm{rand}[0, 1) \leq CR \text{ or } j = \mathrm{rand}_{j} \\ x_{j i, g} & \text{otherwise} \end{cases}$ [2]

Here, CR is the crossover probability that controls the likelihood that elements from the mutant vector are retained in the trial vector. This process ensures that the trial vector inherits characteristics from both the target and mutant vectors, thereby enhancing the genetic diversity of the population.
Selection: this step decides whether the trial vector generated through crossover replaces the existing individual (21). This decision is based on comparing the performances of the trial vector $u_{i, g + 1}$ and target vector $x_{i, g}$ under the fitness function. The trial vector replaces the original individual in the next generation only if its performance is at least as good. This process ensures that the population gradually evolves towards better solutions while retaining superior characteristics. The algorithm updates the population according to the following rule:
$x_{i, g + 1} = \begin{cases} u_{i, g + 1} & \text{if } f (u_{i, g + 1}) \leq f (x_{i, g}) \\ x_{i, g} & \text{otherwise} \end{cases}$ [3]

Here, f (·) denotes the fitness function, which in this study is the loss to be minimized. If the fitness value of the trial vector is less than or equal to that of the target vector, the trial vector replaces the target vector in the next generation; otherwise, the target vector is retained.

Through this series of steps, the DE algorithm updates the population with each generation iteration, progressively converging towards the optimal solution to the problem. When this algorithm is applied to optimize neural networks, these steps directly affect the weights and biases of the network, aiding in determining the best network parameter configuration to enhance the performance of the model in a specific task.
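The mutation, crossover, and selection rules in Eqs. [1–3] can be put together in a short, self-contained Python sketch. It minimizes a toy sphere function rather than an ANN loss, and the values of NP, F, CR, the initialization range, and the generation budget are illustrative assumptions, not the settings used in this study. Excluding the target index i when drawing r1, r2, r3 follows the common DE convention and is also an assumption here.

```python
import random


def differential_evolution(loss, dim, np_size=20, f_weight=0.5, cr=0.9,
                           g_max=200, seed=0):
    """Minimal DE sketch that minimizes `loss` over `dim` parameters."""
    rng = random.Random(seed)
    # Random initialization within an assumed range of [-1, 1].
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(np_size)]
    fit = [loss(x) for x in pop]
    for _ in range(g_max):
        new_pop, new_fit = list(pop), list(fit)
        for i in range(np_size):
            # Mutation (Eq. [1]): three distinct individuals, none equal to i.
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            v = [pop[r1][j] + f_weight * (pop[r2][j] - pop[r3][j])
                 for j in range(dim)]
            # Crossover (Eq. [2]): the random index guarantees that at least
            # one component of the trial vector comes from the mutant vector.
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                 for j in range(dim)]
            # Selection (Eq. [3]): accept the trial vector if its loss is
            # less than or equal to that of the target vector.
            fu = loss(u)
            if fu <= fit[i]:
                new_pop[i], new_fit[i] = u, fu
        pop, fit = new_pop, new_fit
    best = min(range(np_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_loss = differential_evolution(sphere, dim=3)
```

To optimize a network in this way, `loss` would decode the flat parameter vector into weights and biases, run a forward pass, and return the training loss; that decoding step is omitted from this sketch.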

When optimizing an ANN with DE, the algorithm aims to minimize the selected loss function. Given the binary prediction nature of this study, we opted for cross-entropy loss as the loss function of the model. The formula for cross-entropy loss is as follows:

$L_{\text{cross-entropy}} (y, \hat{y}) = - \sum_{i = 1}^{C} y_{i} \log ({\hat{y}}_{i})$ [4]

Here, $y$ is the true label vector, $\hat{y}$ is the prediction vector of the model, and $C$ is the number of categories.
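As a small worked example of Eq. [4] for the two-class case (C = 2), the sketch below evaluates the loss for a one-hot label and two hypothetical prediction vectors. The `eps` clipping is an added numerical safeguard against log(0) and is not part of the formula.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot label for a mutant sample, ordered as [wildtype, mutant].
label = [0, 1]

# Hypothetical network outputs.
loss_confident = cross_entropy(label, [0.1, 0.9])  # equals -log(0.9)
loss_wrong = cross_entropy(label, [0.8, 0.2])      # equals -log(0.2)
```

Only the probability assigned to the true class contributes, so a confident correct prediction yields a loss near zero while a confident wrong prediction is penalized heavily.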

In this study, we set the maximum number of iterations G_max to 1,000, and the algorithm was terminated once this limit was reached. This condition prevents the algorithm from running indefinitely and serves as a method for evaluating its performance. Additionally, the fitness improvement threshold was set to 20, and if the population’s best fitness showed no significant improvement over several consecutive iterations, the algorithm was assumed to have converged and the iteration was stopped. This condition helps capture the convergence status of the algorithm and avoids excessive iterations. Once the DE algorithm terminated, we selected the individual with the best fitness (i.e., the lowest loss) from the final population, thereby identifying the optimized neural network parameters (weights and biases) and completing the training of the neural network.

For model evaluation, owing to sample size limitations, this study employed the bootstrap resampling method with 500 resamplings for the internal validation of the training dataset. Bootstrap resampling is a powerful statistical tool that estimates the distribution of a statistic by repeatedly sampling (with replacement) the original data, and is particularly useful for small datasets (22).
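The bootstrap idea can be illustrated with a simplified Python sketch that resamples the predictions of a fixed model 500 times and reports a percentile interval for the AUC. The percentile method, the synthetic scores, and the choice to resample predictions rather than refit the model on each resample are all assumptions made for illustration; the study’s exact procedure may differ.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from n_boot resamples."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic scores for 24 mutant and 75 wildtype samples (illustration only).
rng = random.Random(42)
labels = [1] * 24 + [0] * 75
scores = [rng.gauss(0.7, 0.15) if y == 1 else rng.gauss(0.4, 0.15)
          for y in labels]

point_auc = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
```

Because every resample draws from the same 99 samples, this quantifies sampling variability within the dataset and does not replace validation on an independent cohort.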

To comprehensively assess the predictive performance of the differential evolution optimized artificial neural network (DE-ANN) model in binary classification tasks, the area under the curve (AUC), sensitivity, specificity, accuracy, calibration curves, and decision curve analysis were used to thoroughly evaluate the reliability of the model. These measures provide a holistic view of the performance of the model, ensuring not only its accuracy in predictions, but also its reliability and applicability in practical scenarios.
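For reference, the threshold-based metrics and the net benefit used in decision curve analysis can be computed as in the following sketch. The net benefit formula, TP/n − (FP/n) × pt/(1 − pt) for threshold probability pt, is the standard definition; the labels and predicted probabilities are made-up values, and the 0.5 cutoff is an illustrative assumption.

```python
def confusion(labels, probs, threshold):
    """Count TP, FP, TN, FN when predicting 'mutant' for probs >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(labels, probs, threshold=0.5):
    tp, fp, tn, fn = confusion(labels, probs, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


def net_benefit(labels, probs, threshold):
    """Net benefit of intervening when the predicted probability >= threshold."""
    tp, fp, _, _ = confusion(labels, probs, threshold)
    n = len(labels)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the 'intervene for all' reference strategy."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = mutant) and predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3]

metrics = classification_metrics(toy_labels, toy_probs)
nb_model = net_benefit(toy_labels, toy_probs, 0.5)
nb_all = net_benefit_treat_all(toy_labels, 0.5)
```

A decision curve repeats the net benefit calculation over a grid of threshold probabilities and compares the model against the "all" and "none" (net benefit of zero) strategies.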

Statistical analysis

This study was conducted on a deep learning workstation with an i7-13700K 3.4 GHz CPU, RTX4060Ti 16-GB GPU, and 64 GB RAM. The machine learning task was performed using R (version 4.2.1), and the deep learning task was performed using Python (version 3.9) on Jupyter Notebook (version 6.5.4). The AUC of the receiver operating characteristic curve was used to evaluate the model performance. The calibration curves were tested using the Hosmer-Lemeshow goodness-of-fit test, with P<0.05 considered statistically significant.

Results

Radiomics feature extraction and DEG identification

In this study, 851 CT-radiomics features and 851 PET-radiomics features were extracted, with all extracted features demonstrating an ICC range of 0.78–0.96, indicating good inter-group reproducibility. Through the analysis of genomic data based on the set threshold, 101 DEGs were identified, of which 92 were upregulated and 11 were downregulated (Figure 1, left). Visualization of dimensionality reduction using the uniform manifold approximation and projection (UMAP) algorithm indicated that KRAS wildtype and mutant samples were difficult to distinguish, presenting a significant challenge for predictions (Figure 1, right).

Figure 1 Visualization of differentially expressed genes and UMAP dimensionality reduction. (Left) Volcano plot. (Right) UMAP visualization plot, where group 1 represents the KRAS wildtype and group 2 represents the mutant type. KRAS, Kirsten rat sarcoma viral oncogene homolog; UMAP, uniform manifold approximation and projection.

GO and KEGG analysis

The enrichment results for the DEGs in biological process, cellular component, molecular function, and KEGG are presented in Figure 2. GO/KEGG annotations are available in Table S1. In terms of biological process, the DEGs were primarily enriched in nucleosome organization, nucleosome assembly, and protein-DNA complex subunit organization. For cellular component, DEGs were predominantly enriched in the nucleosome organization. In terms of molecular function, the DEGs were mainly enriched in protein heterodimerization activity, protein-DNA complexes, and DNA packaging complexes. KEGG analysis revealed significant enrichment in systemic lupus erythematosus, alcoholism, and neutrophil extracellular trap formation.

Figure 2 GO/KEGG analysis of differentially expressed genes. BP, biological process; CC, cellular component; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function.

Radiomics and gene signature screening based on machine learning

After eliminating redundant features using the U-test, 725 and 749 features remained for the PET- and CT-radiomics features, respectively. Subsequent Spearman analysis revealed 242 and 186 features with a correlation coefficient |r|=0.9–1.0. After removing one feature from each correlated pair, 604 and 656 features were retained for PET and CT radiomics, respectively. These features were then subjected to further selection using LASSO regression and SVM-RFE. In this study, LASSO regression was performed using 5-fold cross-validation. The lambda.min with the smallest mean squared error was selected as the penalty parameter for LASSO regression. The LASSO regression model determined using this parameter identified 11 CT-radiomics features, 6 important genes, and 3 PET-radiomics features. The minimum error rate was determined with SVM-RFE through 5-fold cross-validation to obtain the selected features, identifying 13 CT-radiomics features, 8 important genes, and 7 PET-radiomics features. Finally, by considering the intersection of the features selected by both algorithms, 5 CT-radiomics signatures, 3 gene signatures, and 2 PET-radiomics signatures were ultimately determined (Figure 3). These features were used to construct the predictive model.

Figure 3 Feature screening by LASSO regression and SVM-RFE. (A,D,G) CT, gene, and PET screening by LASSO regression, respectively. (B,E,H) CT, gene, and PET screening by SVM-RFE, respectively. (C,F,I) Intersections of the LASSO regression and SVM-RFE for CT, gene, and PET, respectively. CT, computed tomography; CV, cross validation; LASSO, least absolute shrinkage and selection operator; PET, positron emission tomography; SVM-RFE, support vector machine-recursive feature elimination.

Predictive model construction and evaluation

This study involved three types of input data; hence, the ANN model was set with three hidden layers. Moreover, based on the binary prediction task (mutant and wildtype), the output layer was set to two nodes. A visualization of the neural network model is shown in Figure 4. The proposed model reached the no-improvement-in-error-rate threshold after 217 iterations with an error rate of 0.0983. In addition, the overall convergence rate of the proposed model was higher than that of a standard ANN (Figure 5). The proposed and standard models were internally validated using 500 bootstrap resamplings. The results showed that the proposed model [AUC =0.909; 95% confidence interval (CI): 0.882–0.937] (Figure 6A) had an absolute improvement in AUC of approximately 0.09 over the standard model (AUC =0.819, 95% CI: 0.778–0.860) (Figure 6B). We also compared the specificity, sensitivity, and accuracy of the two models (Table 1).

To further validate the predictive performance of the model, calibration curves were used for evaluation, and the Hosmer-Lemeshow goodness-of-fit test and C-index were used to quantitatively assess model calibration. The proposed model had a goodness-of-fit test result of P=0.81, indicating no significant difference between the predicted and observed values, and hence, a good model fit. Moreover, the C-index of the proposed model was 0.909 (95% CI: 0.882–0.937), demonstrating a high discriminative ability (Figure 6C). The C-index of the standard model was 0.819 (95% CI: 0.777–0.861), with a goodness-of-fit test result of P<0.05, indicating a significant difference between the predicted and observed values and only average discriminative ability (Figure 6D). Compared to the standard model, the proposed model exhibited better predictive performance. Finally, decision curves were utilized to assess the net clinical benefits of the model. The results showed that the clinical net benefit of intervention based on the predicted probabilities was higher than that of not intervening (none) or intervening for all (all) in the 8–67% probability range (Figure 6E). The clinical net benefit range for the standard model was 7–32%, which is narrower than that of the model proposed in this study (Figure 6F).

Figure 4 Visualization of differential-evolution optimized artificial neural network. CT, computed tomography; PET, positron emission tomography.

Figure 5 Loss curve of the neural network. (A) Differential evolution artificial neural network. (B) Standard artificial neural network.

Figure 6 Evaluation of model performance. (A,C,E) AUC, calibration curve, and decision analysis curve of the differential evolution artificial neural network, respectively. (B,D,F) AUC, calibration curve, and decision analysis curve of the standard artificial neural network, respectively. ANN, artificial neural network; AUC, area under the curve; CI, confidence interval; DE-ANN, differential evolution optimized artificial neural network; FPR, false positive rate; TPR, true positive rate.

Table 1

Additional model evaluation metrics

Model	Sensitivity	Specificity	Accuracy
ANN	0.717	0.824	0.727
DE-ANN	0.886	0.815	0.879

ANN, artificial neural network; DE-ANN, differential evolution optimized artificial neural network.

Discussion

This study used a DE-ANN model to accurately predict KRAS mutation status in patients with NSCLC. By integrating PET/CT-radiomics features with gene expression data, our predictive model demonstrated superior performance compared to traditional ANN models.

In the feature selection phase, key radiomics features and genetic markers were identified through LASSO regression and SVM-RFE algorithms, providing a promising technique for enhancing the predictive accuracy of the model. Notably, the discovery of these key features may reveal the pathobiological processes driven by KRAS mutations in NSCLC, offering new perspectives for understanding tumor heterogeneity and progression. Among the selected genetic biomarkers, a previous study has reported that MARCKSL1 enhances the resistance of human lung adenocarcinoma cells to docetaxel through HDAC1/4-mediated silencing of microRNA-200b (23). MARCKSL1 has also been found to promote the proliferation, migration, and invasion of lung adenocarcinoma cells by regulating epithelial-mesenchymal transition, suggesting that MARCKSL1 may hold prognostic value and serve as a new therapeutic target for lung adenocarcinoma (24). RBM8A, an RNA-binding protein, influences the survival and mitosis of A549 cells. Research has indicated that RBM8A-deficient cells cannot progress normally into the next cell cycle after the G1/S blockade, leading to increased apoptosis and cell proliferation defects. Additionally, the depletion of RBM8A and Magoh proteins affects each other, suggesting that they could be potential targets for lung cancer treatment (25). LIPH, a secreted protein selectively upregulated in various cancers, is elevated in the serum of both early- and late-stage lung cancer patients (26). Interestingly, high serum levels of LIPH are associated with better postoperative survival rates in patients with early-stage lung cancer, making it a potential molecular biomarker for lung cancer, particularly for adenocarcinoma and bronchioloalveolar carcinoma (27). The three genetic biomarkers identified in this study are significant predictors of KRAS mutations. However, the relationship between these genetic markers and KRAS mutations in NSCLC has not been explored in current research.

Previous studies have focused on predicting KRAS mutation status in NSCLC. Caicedo et al. (28) evaluated the correlation between [¹⁸F]fluorodeoxyglucose (FDG) uptake in PET imaging and KRAS mutation status by calculating the standardized uptake values (SUVs) of the hottest tumor lesions. They found an AUC of 0.740 for the SUV-mean in predicting KRAS mutations, providing an initial reference for radiogenomics studies in predicting the mutation status. Le et al. (29) constructed a radiogenomics model for KRAS mutation patterns using CT radiomics combined with a genetic algorithm-improved machine learning method and found that the genetically improved XGBoost predicted KRAS mutation patterns with an AUC of 0.812. Shiri et al. (10) built a KRAS mutation prediction model based on PET/CT radiomics using feature harmonization and achieved an AUC of 0.91–0.94, demonstrating effective prediction capabilities. Additionally, two studies using convolutional neural network algorithms predicted the KRAS mutation status directly from CT images and genetic data, obtaining AUC values of 83.27% and 72.25%, providing a new direction for research in this field (30,31). However, there have been no studies on the construction of a KRAS mutation prediction model that combines PET/CT radiomics with genomic data. Our study developed the first such predictive model and achieved favorable results. The model proposed in this study outperformed the PET-imaging prediction model proposed by Caicedo et al. (28) and the CT-radiomics prediction model proposed by Le et al. (29), demonstrating the effectiveness of a multi-omics approach. Although the model proposed by Shiri et al. (10) had a slightly higher predictive performance than ours, their study only used a simple 7:3 split validation method on a single-center dataset of 136 samples, which could result in overly optimistic outcomes when validating small samples.
An advantage of this study is the use of bootstrap resampling for internal validation of the model, which is statistically more robust and particularly suitable for dealing with small-sample data. Bootstrap resampling creates multiple virtual sample sets by repeatedly sampling (with replacement) the original dataset, which allows the model to be trained and tested on different data subsets (32). This method can comprehensively assess the generalizability of the model and reduce bias from improper data segmentation, particularly in cases with limited data (33).

In recent years, besides the conventionally used invasive biopsy, liquid biopsy technology has gradually emerged. Liquid biopsy offers advantages such as a short turnaround time, minimal invasiveness, and ease of repeated testing (34). The International Association for the Study of Lung Cancer (IASLC) consensus on liquid biopsy for advanced NSCLC states that plasma circulating tumor DNA (ctDNA) liquid biopsy can now be regarded as an effective tool for genotyping and mutation detection in newly diagnosed advanced NSCLC patients, serving as a valuable complement to other detection methods (35). However, liquid biopsy also has limitations. In particular, false negatives, in which mutations present in the tumor are not detected, remain a significant issue in ctDNA detection. This may be due to low levels of plasma ctDNA, insufficient detection sensitivity, or “non-shedding” tumors (i.e., tumors that do not release ctDNA) (36). Radiogenomics technology, by integrating PET/CT imaging with gene expression data, not only enables the detection of KRAS mutations but also reflects the tumor’s morphology, metabolic activity, and internal structural heterogeneity from an imaging perspective (37). These differences are evident not only at the microscopic level but also at the genetic and transcriptional levels. This inherent heterogeneity may manifest as varied morphological features on diagnostic imaging modalities such as CT and PET/CT (38). Radiomics quantifies the heterogeneity within lesions by analyzing imaging phenotypes using higher-order statistical metrics, and this heterogeneity, in turn, can be harnessed to extract genomic and proteomic data from within the lesion (39). The integration of such multimodal data provides a novel perspective for deepening our understanding of the biological behavior of KRAS-mutant NSCLC.
A previous study has shown that radiogenomics can help characterize intratumoral heterogeneity, with radiomics features of tumor heterogeneity capable of predicting RNA polymerase transcription activity, and intensity dispersion features predictive of the autodegradation pathways of ubiquitin ligases (40). Additionally, some research has found that radiogenomics biomarkers can differentiate between ALK-positive and ALK-negative NSCLC patients, while also identifying patients with shorter progression-free survival (41). However, to date, no study has explored the prediction of KRAS mutation status based on radiogenomics while simultaneously assessing long-term survival outcomes. Moreover, the noninvasive nature of radiogenomics allows for dynamic tumor monitoring, which is beneficial for promptly capturing treatment responses and disease progression. In summary, liquid biopsy and radiogenomics techniques are highly complementary in detecting KRAS mutations and evaluating tumor behavior. The future integration of these two approaches is expected to provide more comprehensive and precise diagnostic and therapeutic decision support for NSCLC patients.

The DE-ANN model proposed in this study demonstrated high predictive accuracy in internal validation, with an AUC significantly higher than that of the standard ANN model, emphasizing the effectiveness of DE algorithms in optimizing complex predictive models. Furthermore, calibration and decision curve analyses of the model confirmed its potential for clinical application, particularly in developing personalized treatment strategies and improving patient outcomes. However, this study has several limitations. First, although the use of data from public databases enhances the generalizability of the study, it may also introduce heterogeneity caused by different data collection protocols. Second, the study relied on integrating radiomics and gene expression data from public databases without considering other potential biomarkers such as metabolic features and microbiomics. Third, owing to the limitations of the dataset, this study did not further explore the role of radiogenomics in predicting KRAS mutation-related behaviors such as drug resistance, radiotherapy sensitivity, and long-term prognosis in NSCLC. Finally, although the model performed well in internal validation, it still requires external validation in a broader patient population to ensure generalizability. In future studies, we hope to overcome these limitations by expanding the scope and depth of the analysis. To address the data heterogeneity inherent in public datasets, we plan to conduct prospective multicenter studies to provide more reliable, externally validated cohorts. Furthermore, in addition to the radiomics and gene expression profiling used in this study, we intend to incorporate a broader range of multi-omics data, including metabolic and microbiome biomarkers. This expansion may not only improve the accuracy of KRAS mutation status prediction but also allow exploration of the radiogenomics correlates of KRAS mutation-related behaviors, such as drug resistance, radiotherapy sensitivity, and long-term prognosis.

Conclusions

The DE-ANN model developed in this study provides a novel methodological framework for predicting KRAS mutation status in patients with NSCLC, demonstrating the potential application of integrating radiomics and genomics data in precision medicine. Future research should focus on further validating the clinical application value of the model, exploring its application to other tumor types, and continuing to optimize algorithms to enhance predictive performance.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-2003/rc

Peer Review File: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-2003/prf

Funding: This study was supported by the Sichuan Science and Technology Program (No. 2022YFS0616); the Gulin County People’s Hospital-The Affiliated Hospital of Southwest Medical University Science and Technology Strategic Cooperation Project (No. 2022GLXNYDFY05); the Xuyong County People’s Hospital-Southwest Medical University Science and Technology Strategic Cooperation Project (No. 2024XYXNYD05); the Sichuan Medical and Health Care Promotion Institute Project (No. KY2022SJ0377); the Radiation Oncology Key Laboratory of Sichuan Province Project (No. 2024ROKF01); the Key-funded Project of the National College Student Innovation and Entrepreneurship Training Program (No. 202310632001); the National College Student Innovation and Entrepreneurship Training Program (No. 2024464); and the Luzhou-Southwest Medical University Transformation and Landing Support Project (No. 2024LZXNYDZ002).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-2003/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Ren J, Zhao S, Lai J. Role and mechanism of COL3A1 in regulating the growth, metastasis, and drug sensitivity in cisplatin-resistant non-small cell lung cancer cells. Cancer Biol Ther 2024;25:2328382.
2. Akers KG, Oskar S, Zhao B, et al. Clinical Outcomes of PD-1/PD-L1 Inhibitors Among Patients With Advanced or Metastatic Non-Small Cell Lung Cancer With BRAF, ERBB2/HER2, MET, or RET Alterations: A Systematic Literature Review. J Immunother 2024;47:128-38.
3. Zhang L, Chen W, Wei H, et al. Efficacy of immune checkpoint inhibitors in advanced non-small cell lung cancer patients with KRAS mutations: A network meta-analysis. Clin Respir J 2024;18:e13745.
4. Landre T, Justeau G, Assié JB, et al. Anti-PD-(L)1 for KRAS-mutant advanced non-small-cell lung cancers: a meta-analysis of randomized-controlled trials. Cancer Immunol Immunother 2022;71:719-26.
5. Mansour Y, Boubaddi M, Odion T, et al. Droplet digital polymerase chain reaction detection of KRAS mutations in pancreatic FNA samples: Technical and practical aspects for routine clinical implementation. Cancer Cytopathol 2024;132:274-84.
6. Lundy J, Gao H, Berry W, et al. Targeted Transcriptome and KRAS Mutation Analysis Improve the Diagnostic Performance of EUS-FNA Biopsies in Pancreatic Cancer. Clin Cancer Res 2021;27:5900-11.
7. Liu S, Liu F, Hou X, et al. KRAS Mutation Detection with (2S,4R)-4-[18F]FGln for Noninvasive PDAC Diagnosis. Mol Pharm 2024;21:2034-42.
8. Vitale I, Shema E, Loi S, et al. Intratumoral heterogeneity in cancer progression and response to immunotherapy. Nat Med 2021;27:212-24.
9. Prencipe B, Delprete C, Garolla E, et al. An Explainable Radiogenomic Framework to Predict Mutational Status of KRAS and EGFR in Lung Adenocarcinoma Patients. Bioengineering (Basel) 2023;10:747.
10. Shiri I, Amini M, Nazari M, et al. Impact of feature harmonization on radiogenomics analysis: Prediction of EGFR and KRAS mutations from non-small cell lung cancer PET/CT images. Comput Biol Med 2022;142:105230.
11. Bakr S, Gevaert O, Echegaray S, et al. Data for NSCLC Radiogenomics (Version 4). The Cancer Imaging Archive. 2017. Available online: https://doi.org/10.7937/K9/TCIA.2017.7hs46erv
12. Bakr S, Gevaert O, Echegaray S, et al. A radiogenomic dataset of non-small cell lung cancer. Sci Data 2018;5:180202.
13. Wang Y, He Y, Duan X, et al. Construction of diagnostic and prognostic models based on gene signatures of nasopharyngeal carcinoma by machine learning methods. Transl Cancer Res 2023;12:1254-69.
14. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30.
15. Peng Y, Wang Y, Wen Z, et al. Deep learning and machine learning predictive models for neurological function after interventional embolization of intracranial aneurysms. Front Neurol 2024;15:1321923.
16. Dai H, Lu M, Huang B, et al. Considerable effects of imaging sequences, feature extraction, feature selection, and classifiers on radiomics-based prediction of microvascular invasion in hepatocellular carcinoma using magnetic resonance imaging. Quant Imaging Med Surg 2021;11:1836-53.
17. Ji X, Zhang J, Shi W, et al. Bi-parametric magnetic resonance imaging based radiomics for the identification of benign and malignant prostate lesions: cross-vendor validation. Phys Eng Sci Med 2021;44:745-54.
18. Deng W, Shang S, Cai X, et al. An improved differential evolution algorithm and its application in optimization problem. Soft Computing 2021;25:5277-98.
19. Hassan S, Hemeida AM, Alkhalaf S, et al. Multi-variant differential evolution algorithm for feature selection. Sci Rep 2020;10:17261.
20. Brindha S, Miruna Joe Amali S. A robust and adaptive fuzzy logic based differential evolution algorithm using population diversity tuning for multi-objective optimization. Engineering Applications of Artificial Intelligence 2021;102:104240.
21. Bilal Pant M. Differential Evolution: A review of more than two decades of research. Engineering Applications of Artificial Intelligence 2020;90:103479.
22. Zelikman E, Wu Y, Mu J, et al. STaR: self-taught reasoner bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems 2022;35:15476-88.
23. Jiang M, Qi F, Zhang K, et al. MARCKSL1-2 reverses docetaxel-resistance of lung adenocarcinoma cells by recruiting SUZ12 to suppress HDAC1 and elevate miR-200b. Mol Cancer 2022;21:150.
24. Liang W, Gao R, Yang M, et al. MARCKSL1 promotes the proliferation, migration and invasion of lung adenocarcinoma cells. Oncol Lett 2020;19:2272-80.
25. Ishigaki Y, Nakamura Y, Tatsuno T, et al. Depletion of RNA-binding protein RBM8A (Y14) causes cell cycle deficiency and apoptosis in human cells. Exp Biol Med (Maywood) 2013;238:889-97.
26. Ishimine H, Zhou R, Sumitomo K, et al. Lipase member H frequently overexpressed in human esophageal adenocarcinomas. Tumour Biol 2016;37:2075-81.
27. Seki Y, Yoshida Y, Ishimine H, et al. Lipase member H is a novel secreted protein selectively upregulated in human lung adenocarcinomas and bronchioloalveolar carcinomas. Biochem Biophys Res Commun 2014;443:1141-7.
28. Caicedo C, Garcia-Velloso MJ, Lozano MD, et al. Role of [¹⁸F]FDG PET in prediction of KRAS and EGFR mutation status in patients with advanced non-small-cell lung cancer. Eur J Nucl Med Mol Imaging 2014;41:2058-65. Erratum in: Eur J Nucl Med Mol Imaging 2014;41:2164.
29. Le NQK, Kha QH, Nguyen VH, et al. Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer. Int J Mol Sci 2021;22:9254.
30. Xue Y, Zhang D, Jia L, et al. Integrating image and gene-data with a semi-supervised attention model for prediction of KRAS gene mutation status in non-small cell lung cancer. PLoS One 2024;19:e0297331.
31. Dong Y, Hou L, Yang W, et al. Multi-channel multi-task deep learning for predicting EGFR and KRAS mutations of non-small cell lung cancer on CT images. Quant Imaging Med Surg 2021;11:2354-75.
32. Song L, Minku LL, Yao X. Software Effort Interval Prediction via Bayesian Inference and Synthetic Bootstrap Resampling. ACM Transactions on Software Engineering and Methodology 2019;28:1-46.
33. Sahiner B, Chan HP, Hadjiiski L. Classifier performance prediction for computer-aided diagnosis using a limited dataset. Med Phys 2008;35:1559-70.
34. Lone SN, Nisar S, Masoodi T, et al. Liquid biopsy: a step closer to transform diagnosis, prognosis and future of cancer treatments. Mol Cancer 2022;21:79.
35. Rolfo C, Mack P, Scagliotti GV, et al. Liquid Biopsy for Advanced NSCLC: A Consensus Statement From the International Association for the Study of Lung Cancer. J Thorac Oncol 2021;16:1647-62.
36. Pascual J, Attard G, Bidard FC, et al. ESMO recommendations on the use of circulating tumour DNA assays for patients with cancer: a report from the ESMO Precision Medicine Working Group. Ann Oncol 2022;33:750-68.
37. Guo Y, Li T, Gong B, et al. From Images to Genes: Radiogenomics Based on Artificial Intelligence to Achieve Non-Invasive Precision Medicine in Cancer Patients. Adv Sci (Weinh) 2025;12:e2408069.
38. Wong CW, Chaudhry A. Radiogenomics of lung cancer. J Thorac Dis 2020;12:5104-9.
39. Anagnostopoulos AK, Gaitanis A, Gkiozos I, et al. Radiomics/Radiogenomics in Lung Cancer: Basic Principles and Initial Clinical Results. Cancers (Basel) 2022;14:1657.
40. Grossmann P, Stringfield O, El-Hachem N, et al. Defining the biological basis of radiomic phenotypes in lung cancer. Elife 2017;6:e23421.
41. Yamamoto S, Korn RL, Oklu R, et al. ALK molecular phenotype in non-small cell lung cancer: CT radiogenomic characterization. Radiology 2014;272:568-76.

Cite this article as: Li Y, Li J, Wang Y, Wang Y, Huang D, Wen Z, Hu Y, Lin S, Zhou P, Pang H. Construction of a radiogenomics predictive model for KRAS mutation status in patients with non-small cell lung cancer. J Thorac Dis 2025;17(6):3749-3761. doi: 10.21037/jtd-2024-2003
