Predicting targeted therapy resistance in non-small cell lung cancer using multimodal machine learning
Original Article

Peiying Hua1, Andrea Olofson2, Faraz Farhadi3, Liesbeth Hondelink4, Gregory Tsongalis5, Konstantin Dragnev6, Dagmar Hoegemann Savellano3, Arief Suriawinata5, Laura Tafe5, Saeed Hassanpour1,7,8

1Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; 2Department of Pathology, Ochsner Health System, New Orleans, LA, USA; 3Department of Radiology, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA; 4Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands; 5Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA; 6Department of Medical Oncology, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA; 7Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; 8Department of Computer Science, Dartmouth College, Hanover, NH, USA

Contributions: (I) Conception and design: S Hassanpour; (II) Administrative support: S Hassanpour; (III) Provision of study materials or patients: A Olofson, F Farhadi, L Hondelink, A Suriawinata; (IV) Collection and assembly of data: A Olofson, F Farhadi, L Hondelink, A Suriawinata; (V) Data analysis and interpretation: P Hua; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Saeed Hassanpour, PhD. Departments of Biomedical Data Science, Computer Science, and Epidemiology, Geisel School of Medicine, 1 Medical Center Drive, HB 7261, Lebanon, NH 03756, USA. Email: Saeed.Hassanpour@Dartmouth.edu.

Background: Resistance to tyrosine kinase inhibitors remains a major clinical challenge in the treatment of non-small cell lung cancer (NSCLC) with activating epidermal growth factor receptor (EGFR) mutations. Despite the efficacy of third-generation EGFR inhibitors, no standard tool currently exists to predict resistance using routinely available clinical data. In this study, we aim to develop a multimodal machine learning model to predict resistance in NSCLC patients using readily accessible clinical information.

Methods: We conducted a multi-institutional retrospective study to develop and evaluate a multimodal machine learning model for predicting therapy resistance in late-stage NSCLC patients with EGFR mutations. The study included 42 patients treated with EGFR-targeted therapy from Dartmouth-Hitchcock Medical Center and Ochsner Health System, using data including histology whole-slide images, next-generation sequencing results, and demographic and clinical variables. The modeling framework fused image and non-image data through a three-stage training process and was evaluated using 5-fold nested cross-validation. Model performance was assessed using the concordance index (C-index), Kaplan-Meier survival curves, and log-rank tests. Interpretability analyses were conducted using attention maps, feature importance coefficients, and cellular composition comparisons.

Results: The multimodal model achieved a mean C-index of 0.82 across cross-validation folds, outperforming image-only and non-image models (C-index 0.75 and 0.77, respectively). Stratified analyses across institutions confirmed consistent performance gains with the multimodal approach. Kaplan-Meier analysis revealed that the multimodal model significantly stratified patients into distinct hazard groups (log-rank P=0.04), which unimodal models failed to achieve. Key predictors included RB1 mutation and Hispanic ethnicity. Attention maps highlighted histologic regions with deformed nuclei, and cellular analysis revealed reduced inflammatory cell presence in high-risk patients.

Conclusions: This study presents a robust multimodal machine learning model for predicting therapy resistance in EGFR-mutant NSCLC, leveraging routinely collected clinical data without manual feature engineering. The model demonstrated superior performance over unimodal models and effective hazard stratification, suggesting utility for personalized treatment decisions. These findings underscore the potential of multimodal artificial intelligence (AI) tools to advance precision oncology, particularly in resource-limited settings. Further validation in larger, diverse cohorts is warranted.

Keywords: Non-small cell lung cancer (NSCLC); therapy resistance prediction; multimodal machine learning; whole slide imaging; precision oncology


Submitted May 20, 2025. Accepted for publication Aug 08, 2025. Published online Oct 29, 2025.

doi: 10.21037/jtd-2025-1012


Highlight box

Key findings

• A multimodal machine learning model was developed to predict resistance in late-stage non-small cell lung cancer (NSCLC) patients with EGFR mutations.

• By integrating histology images, genomic alterations, and clinical data, the model achieved a mean concordance index (C-index) of 0.82 and outperformed unimodal models in predictive accuracy and hazard stratification.

• The model used routinely collected clinical data and required no manual feature engineering.

What is known and what is new?

• Resistance to EGFR-targeted therapies remains a major challenge in NSCLC, and no validated tools exist to predict resistance using multimodal clinical data.

• This study introduces a multimodal machine learning framework that combines histopathology, genomics, and clinical data to predict therapeutic resistance and stratify patient risk, showing consistent improvements over unimodal approaches.

What is the implication, and what should change now?

• This model can support personalized treatment planning and improve prognostic accuracy using existing patient data.

• Future studies should validate this approach in larger, diverse cohorts to enable clinical adoption.


Introduction

Lung cancer remains the leading cause of cancer-related deaths worldwide, accounting for approximately 1.6 million deaths annually, or nearly one-quarter of all cancer mortalities (1). Non-small cell lung cancer (NSCLC) constitutes approximately 85% of all lung cancer cases (2), and among these, a significant proportion (approximately 32%) harbor activating mutations in the epidermal growth factor receptor (EGFR) gene (3). Targeted therapies such as osimertinib, a third-generation EGFR tyrosine kinase inhibitor (TKI), have demonstrated improved progression-free and overall survival in patients with EGFR-mutated NSCLC (4,5). Despite its efficacy, many patients inevitably develop resistance to osimertinib, often within 10–19 months of treatment initiation (6,7). For patients whose disease progresses despite treatment with osimertinib, the National Comprehensive Cancer Network (NCCN) guidelines recommend subsequent targeted therapies, such as amivantamab or lazertinib, as potential next-line treatments (8).

Currently, there are no standardized clinical tools available to predict treatment resistance in EGFR-mutant NSCLC using a comprehensive, multimodal approach. While several studies have identified individual prognostic factors—such as patient age, sex, tumor, node, metastasis (TNM) staging, RB1 or TP53 mutations, brain metastases, and circulating tumor DNA (ctDNA) positivity—these markers each capture only a narrow dimension of a patient’s overall health profile (9-15). As a result, their utility in guiding personalized treatment strategies or proactively adjusting care plans remains limited. The lack of predictive tools also contributes to uncertainty in patient counseling and adds financial and emotional burdens, particularly for patients facing the high cost and potential toxicities of targeted therapies like osimertinib (16,17). Meanwhile, multimodal data—including histopathology slides, next-generation sequencing (NGS), and clinical variables—are routinely collected in oncology practice but remain underutilized in predicting therapeutic response. Recent advances in machine learning and artificial intelligence offer new opportunities to harness such multimodal data for predictive modeling in oncology. Integrating disparate data sources through machine learning models has shown promise in improving prognostic accuracy and treatment stratification (18-27). However, few studies have evaluated multimodal models for resistance prediction specifically in the context of EGFR-targeted therapy.

To address this gap, we conducted a multi-institutional retrospective study to develop and evaluate a multimodal machine learning model that integrates histology, genomics, and clinical data to predict resistance in late-stage EGFR-mutant NSCLC patients. Our objective was to create a clinically interpretable, non-invasive tool using routinely collected patient data to support personalized treatment decisions and enhance precision oncology efforts. We present this article in accordance with the TRIPOD reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1012/rc).


Methods

Datasets

This retrospective study analyzed data from 42 patients with stage IIIb or IV NSCLC harboring EGFR-activating mutations, treated with osimertinib at Dartmouth-Hitchcock Medical Center (DHMC, n=23) and Ochsner Health System (n=19). Inclusion criteria required patients to have received osimertinib as first- or later-line therapy and to have undergone NGS confirming EGFR mutations. Patients with carcinoma of unknown primary, non-NSCLC histology, use of osimertinib in adjuvant or neoadjuvant settings, or presence of de novo resistance mutations were excluded. Collected data included demographic variables, prior treatments, pathology reports, baseline histological whole-slide images (WSI) (at or near the time of initial diagnosis), NGS profiles, radiology reports, and clinical outcome of progression-free survival, which was defined as time from start of osimertinib treatment to progression. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Institutional review board (IRB) approval was obtained from both participating institutions: DHMC (STUDY00030450) and Ochsner Health System (2023.263). Informed consent was waived due to the retrospective nature of the study and the use of de-identified clinical data. Patient confidentiality was maintained throughout, and all data were handled in accordance with applicable privacy regulations.

Data pre-processing

NGS data were processed into binary representations of mutation status. Furthermore, certain mutations of potential clinical importance according to the literature were identified. These include the L858R and T790M mutations in the EGFR gene, EGFR exon 19 deletion, EGFR amplification, and disruptive TP53 mutation (12,28). Missing values for smoking status and quantity were imputed using group-wise averages stratified by smoking status. Histopathology slides were scanned at 20× (0.5 µm/pixel) or 40× (0.25 µm/pixel) magnification using Leica AT2 and Philips UltraFast scanners.
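The two pre-processing steps described above (binary mutation encoding and group-wise imputation of smoking quantity) can be illustrated with a minimal sketch. The panel entries, record layout, and function names below are hypothetical and chosen only for illustration; the fallback to zero for a group with no observed values is likewise an assumption, not a detail reported in the study.

```python
from statistics import mean

# Hypothetical alteration panel, for illustration only.
PANEL = ["EGFR_L858R", "EGFR_T790M", "EGFR_exon19del", "TP53", "RB1"]

def encode_mutations(detected, panel=PANEL):
    """Return a 0/1 vector marking which panel alterations were detected."""
    found = set(detected)
    return [1 if alteration in found else 0 for alteration in panel]

def impute_pack_years(records):
    """Fill missing pack-years with the mean of patients sharing the same smoking status."""
    observed = {}
    for r in records:
        if r["pack_years"] is not None:
            observed.setdefault(r["status"], []).append(r["pack_years"])
    group_mean = {status: mean(values) for status, values in observed.items()}
    imputed = []
    for r in records:
        value = r["pack_years"]
        if value is None:
            # Assumption: fall back to 0.0 when the whole group is unobserved.
            value = group_mean.get(r["status"], 0.0)
        imputed.append({**r, "pack_years": value})
    return imputed

# Hypothetical records, not study data.
records = [
    {"status": "prior", "pack_years": 20.0},
    {"status": "prior", "pack_years": 40.0},
    {"status": "prior", "pack_years": None},
    {"status": "never", "pack_years": 0.0},
]
print(encode_mutations(["EGFR_L858R", "TP53"]))     # [1, 0, 0, 1, 0]
print(impute_pack_years(records)[2]["pack_years"])  # 30.0
```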

Single modality models

Non-image modality

For the non-image modality, we modeled osimertinib resistance using NGS data, demographic information, and other clinical variables. Given the limited number of eligible NSCLC patients treated with osimertinib, the available sample size was small. We selected Cox proportional hazards regression as the primary model due to its interpretability, established utility in survival analysis, and reliability in low-sample contexts (29,30).

To evaluate the effect of model architecture on overall multimodal performance, we conducted an experiment comparing Cox regression with two alternative approaches: self-normalizing neural networks (SNNs) and feed-forward neural networks (FFNNs) (31). SNNs, introduced by Klambauer et al. (32), maintain stable activations through scaled exponential linear units (SELUs) and are well suited for high-dimensional, small-sample scenarios. Both SNN and FFNN models were implemented using compact two-layer architectures with eight neurons per layer and trained with L1/L2 regularization and dropout to reduce overfitting. This comparative analysis allowed us to assess the trade-offs between model simplicity, interpretability, and flexibility in representing non-linear relationships within the non-image modality.
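For readers unfamiliar with SNNs, the following sketch shows the SELU activation and a forward pass through a two-layer network with eight neurons per layer. It is a plain-Python illustration under stated assumptions: the input dimension, the random weights, and the helper names are hypothetical, and regularization, dropout, and training are omitted. The SELU constants and the LeCun-normal initialization follow the original SNN formulation.

```python
import math
import random

# SELU constants from the self-normalizing network formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def dense(inputs, weights, biases):
    """Fully connected layer followed by SELU."""
    return [selu(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """LeCun-normal initialization (std = 1/sqrt(n_in)), the usual pairing with SELU."""
    std = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
n_features = 10                     # hypothetical number of non-image inputs
w1, b1 = init_layer(n_features, 8, rng)
w2, b2 = init_layer(8, 8, rng)
x = [1.0, 0.0] * 5                  # hypothetical binary feature vector
hidden = dense(dense(x, w1, b1), w2, b2)
print(len(hidden))                  # 8
```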

Image modality

For the image modality, we implemented a two-stage deep learning pipeline comprising a convolutional feature extractor followed by a transformer-based aggregator to model spatial patterns predictive of therapy resistance. Whole slide histology images were first preprocessed and tiled into non-overlapping patches. Patch-level features were extracted using a ResNet-18 model pretrained on lung adenocarcinoma histopathology images (23). These features served as input to a vision transformer (ViT) architecture consisting of 12 transformer encoder layers with 8-head self-attention modules per layer (33).
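The tiling step can be sketched as a simple enumeration of patch coordinates. The tile size of 224 pixels is an assumption made for illustration (the patch size is not stated here), and the function name is hypothetical; tissue detection and background filtering are omitted.

```python
def tile_coordinates(width, height, tile=224):
    """Top-left (x, y) corners of non-overlapping tiles that fit fully inside the slide."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

# A hypothetical 500 x 300 pixel region yields two full tiles in one row.
print(tile_coordinates(500, 300))  # [(0, 0), (224, 0)]
```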

The ViT was initialized with weights pretrained on The Cancer Genome Atlas (TCGA) (34) histology slides from five cancer types to provide contextual understanding across varied tissue morphologies (35). Fine-tuning was then performed using the study-specific histology data to adapt the model to the resistance prediction task. The self-attention mechanism in the transformer allowed the model to capture both local and global tissue-level dependencies, enhancing its ability to identify subtle but predictive histopathological cues. The final output from the transformer was a 32-dimensional feature embedding for each patient, which was passed to the fusion layer in the multimodal model. This image-based feature extraction strategy enabled high-capacity representation of histologic complexity while maintaining modularity for integration with other clinical and genomic data streams.

Fusion layer

In our multimodal framework, each unimodal model—the image and non-image branches—was first trained independently to serve as a dedicated feature extractor. The resulting feature vectors were then combined in a fusion layer for outcome prediction using a Cox proportional hazards model (Figure 1). We systematically evaluated multiple fusion strategies to determine the most effective method for integrating information across modalities. In the early fusion strategy, intermediate latent features extracted prior to the final prediction layer in each unimodal model were concatenated and passed to the fusion model. In the late fusion strategy, the predicted risk scores from each unimodal output layer were combined as input to the final Cox regression model. Both approaches were evaluated under various configurations, yielding four distinct fusion scenarios.

Figure 1 Overview of the multimodal model architecture. Histology image features and non-image clinical data, including demographics, genomic alterations, radiology, and treatment history, are processed through separate neural networks. The extracted feature representations are integrated via a fusion layer to generate resistance risk predictions.

Model performance under each fusion strategy was assessed using the concordance index (C-index) across 5-fold nested cross-validation. This design ensured unbiased estimation and helped identify the optimal integration approach for the multimodal pipeline. The best-performing configuration was retained for final evaluation. By treating each modality as a complementary information source and explicitly testing multiple fusion mechanisms, this strategy enabled us to identify a robust and interpretable method for combining heterogeneous patient data streams.
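The four fusion scenarios can be made concrete with a short sketch. It assumes each trained branch exposes an intermediate latent vector and a scalar risk score; the dictionary layout, the function names, and the numeric values are hypothetical. The fused vector would then serve as the covariate input to the final Cox model.

```python
from itertools import product

def branch_features(branch, mode):
    """Early fusion uses the latent vector; late fusion uses the predicted risk score."""
    if mode == "early":
        return list(branch["latent"])
    if mode == "late":
        return [branch["risk"]]
    raise ValueError(f"unknown fusion mode: {mode}")

def fusion_input(nonimage, image, nonimage_mode, image_mode):
    """Concatenate the selected features from both branches for the fusion model."""
    return branch_features(nonimage, nonimage_mode) + branch_features(image, image_mode)

# Hypothetical branch outputs for one patient.
nonimage = {"latent": [1.0, 0.0], "risk": -0.3}
image = {"latent": [0.2, -0.1, 0.4], "risk": 0.7}

for n_mode, i_mode in product(["late", "early"], repeat=2):
    fused = fusion_input(nonimage, image, n_mode, i_mode)
    print(f"{n_mode} non-image + {i_mode} image -> {len(fused)} fusion inputs")
```

In the late + late scenario the fusion model sees only two covariates, which may be one reason such a configuration remains stable in a small cohort.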

Loss function

To accommodate the time-to-event nature of our primary outcome—progression-free survival—we employed a negative log partial likelihood function derived from the Cox proportional hazards model as the loss function for training. This formulation enables the model to learn risk scores associated with covariate patterns while appropriately handling censoring in survival data. The loss function is defined as follows:

$$\mathrm{Loss}(\theta) = -\frac{1}{N_{\mathrm{event}}} \sum_{i:\,\mathrm{event}} \left( \hat{h}_\theta(X_i) - \log \sum_{j \in R(T_i)} e^{\hat{h}_\theta(X_j)} \right)$$

where $N_{\mathrm{event}}$ is the total number of patients who developed resistance during the observation period, $\hat{h}_\theta(X_i)$ denotes the predicted risk score for patient $i$ given model parameters $\theta$, and $R(T_i)$ represents the risk set, i.e., the group of patients still at risk at time $T_i$, the event time for patient $i$. This approach allows for effective optimization of survival risk rankings without requiring specification of the baseline hazard function. It also facilitates direct comparison of multimodal model performance across validation folds via concordance-based metrics.
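A minimal plain-Python sketch of this loss is given below. It is not the training code used in the study: the function name is hypothetical, tied event times are handled by simply sharing the same risk set (a Breslow-style simplification), and a deep learning framework would normally compute the same quantity on tensors.

```python
import math

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood, averaged over patients with an observed event.

    risk  -- predicted log-risk scores, one per patient
    time  -- follow-up times
    event -- 1 if progression was observed, 0 if censored
    """
    n_events = sum(event)
    if n_events == 0:
        return 0.0  # no events in this batch, nothing to rank
    total = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue
        # Risk set R(T_i): everyone still under observation at time T_i.
        denominator = sum(math.exp(risk[j]) for j in range(len(risk)) if time[j] >= time[i])
        total += risk[i] - math.log(denominator)
    return -total / n_events

# Hypothetical toy data: ranking early progressors as high risk lowers the loss.
time = [1.0, 2.0, 3.0]
event = [1, 1, 1]
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], time, event)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], time, event)
print(good < bad)  # True
```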

Multimodal training and evaluation

The multimodal model was trained using a three-stage process to optimize feature extraction and mitigate overfitting, given the limited sample size. First, the non-image modality was trained using demographic, NGS, and clinical data. Next, the image modality was trained using histology slides. Finally, the fusion layer was trained using extracted features from both modalities. This modular training strategy decomposed the multimodal pipeline into simpler components, enabling more stable and effective training with limited data.

To enhance generalization and reduce overfitting, we applied regularization and dropout techniques. A 5-fold nested cross-validation was used to prevent information leakage, ensuring consistent training, validation, and testing splits. Model performance was evaluated using the C-index, with 95% confidence intervals (CIs) to quantify uncertainty.
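The splitting logic of nested cross-validation can be sketched as follows. This is an illustration under assumptions: the number of inner folds, the seeds, and the function names are hypothetical, and no stratification by event status or site is applied. The key property is that each outer test fold is never seen during inner-loop training or model selection.

```python
import random

def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Return [(test, [(train, validation), ...]), ...] for nested cross-validation."""
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    splits = []
    for o, test in enumerate(outer_folds):
        # Development set: every patient outside the current outer test fold.
        development = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(development, inner_k, seed + 1 + o)
        inner = []
        for v, validation in enumerate(inner_folds):
            train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            inner.append((train, validation))
        splits.append((test, inner))
    return splits

splits = nested_cv_splits(42)  # 42 patients, as in this cohort
for test, inner in splits:
    train, validation = inner[0]
    print(len(test), len(train), len(validation))
```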

Model interpretation and visualization

To enhance the interpretability of the multimodal model, we analyzed the contributions of both non-image features and spatial patterns in histology slides. For the non-image branch, we used the coefficients from the Cox regression model to quantify feature importance, interpreting the magnitude and direction of each variable’s association with predicted resistance risk. For the image modality, we generated attention heatmaps by averaging self-attention weights across heads and applying recursive multiplication, enabling spatial localization of histologic regions contributing most strongly to model predictions (33). We compared attention distributions before and after fine-tuning to identify shifts in model focus during domain adaptation. To further characterize histologic correlates of risk, we applied a pretrained Hover-Net model (36) to segment and classify nuclei into tumor, inflammatory, stromal, and other cell types. This allowed us to compare cellular composition between predicted high- and low-risk groups, offering biological insight into differential model predictions.
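The attention-map computation (head averaging followed by recursive multiplication across layers) can be sketched in plain Python. Mixing each layer's averaged attention with the identity matrix to account for residual connections follows the commonly used attention rollout formulation and is an assumption here; the function names and toy matrices are hypothetical.

```python
def mean_over_heads(layer):
    """Element-wise average of a list of head matrices (each tokens x tokens)."""
    n_heads, size = len(layer), len(layer[0])
    return [[sum(head[r][c] for head in layer) / n_heads for c in range(size)]
            for r in range(size)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def attention_rollout(layers, add_residual=True):
    """Multiply head-averaged attention matrices from the first layer to the last."""
    size = len(layers[0][0])
    rollout = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for layer in layers:
        attn = mean_over_heads(layer)
        if add_residual:
            # Assumption: equal mix of attention and identity for the skip connection.
            attn = [[0.5 * attn[r][c] + (0.5 if r == c else 0.0) for c in range(size)]
                    for r in range(size)]
        rollout = matmul(attn, rollout)
    return rollout

# Toy example: two layers, two heads, two tokens (hypothetical weights).
layers = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]],
    [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]],
]
for row in attention_rollout(layers):
    print([round(v, 3) for v in row])
```

Each row of the result still sums to one, so the values can be read as the share of a token's final representation attributed to each input patch.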

Statistical analysis

Data analysis was conducted in Python (version 3.10) using NumPy. Categorical variables were summarized by category counts and frequencies, and continuous variables were reported with means and standard deviations. Missing values in smoking-related variables were imputed using group-level averages stratified by smoking history. Model performance was evaluated using Harrell’s C-index with 95% CIs, estimated through 5-fold nested cross-validation. Pairwise t-tests across cross-validation folds were applied to compare model C-index, and log-rank tests were used to evaluate survival separation across predicted risk groups. Cellular composition differences between high- and low-risk groups were assessed with Mann-Whitney U tests. Statistical significance was determined at a threshold of P<0.05.
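For reference, Harrell’s C-index can be computed directly from its definition, as in the sketch below. The function name and the toy data are hypothetical; a pair is counted as comparable when the patient with the shorter follow-up had an observed event, and tied risk scores receive half credit.

```python
def harrell_c_index(risk, time, event):
    """Fraction of comparable patient pairs in which the higher risk score
    belongs to the patient who progressed earlier."""
    concordant = 0.0
    comparable = 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: 5 comparable pairs, 4 of them concordant.
time = [5, 10, 15, 20]
event = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(risk, time, event))  # 0.8
```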


Results

Study samples

The final cohort included 42 patients with stage IIIb or IV NSCLC harboring activating EGFR mutations. At the time of last follow-up, 14 patients (33.3%) had experienced disease recurrence. The mean age was 68.8 years with a standard deviation (SD) of 11.9 years, and the majority were female (71.4%). Of the cohort, 54.8% were never-smokers and 7.1% were current smokers. Each patient had an average of 2.0 (SD: 1.2) mutations among the selected hotspot genes, including the EGFR mutation required for inclusion. A detailed summary of patient characteristics stratified by clinical site is provided in Table 1.

Table 1

Baseline demographic, clinical, and treatment characteristics of the cohort, summarized overall and by site

Variables Entire cohort (n=42) DHMC (n=23) Ochsner Health System (n=19)
Age (years) 69 [12] 72 [12] 65 [11]
Sex
   Male 12 [29] 6 [26] 6 [32]
   Female 30 [71] 17 [74] 13 [68]
Race
   White 34 [81] 22 [96] 12 [63]
   Black 4 [10] 0 [0] 4 [21]
   Asian 4 [10] 1 [4] 3 [16]
Ethnicity
   Hispanic 1 [2] 0 [0] 1 [5]
   Non-Hispanic 41 [98] 23 [100] 18 [95]
Prior immunotherapy
   No 39 [93] 20 [87] 19 [100]
   Yes 3 [7] 3 [13] 0 [0]
Prior TKI therapy
   No 36 [86] 17 [74] 19 [100]
   Yes 6 [14] 6 [26] 0 [0]
Prior chemotherapy
   No 30 [71] 14 [61] 16 [84]
   Yes 12 [29] 9 [39] 3 [16]
Prior surgery
   No 34 [81] 16 [70] 18 [95]
   Yes 8 [19] 7 [30] 1 [5]
Brain metastasis
   No 24 [57] 13 [57] 11 [58]
   Yes 18 [43] 10 [43] 8 [42]
Recurrence
   No 28 [67] 13 [57] 15 [79]
   Yes 14 [33] 10 [43] 4 [21]
Smoking status
   Never smoker 23 [55] 10 [43] 13 [68]
   Prior smoker 16 [38] 10 [43] 6 [32]
   Current smoker 3 [7] 3 [13] 0 [0]
Smoking quantity (pack year) 13 [22] 15 [25] 11 [18]
Follow-up time (months) 20 [20] 27 [24] 12 [8]
Number of histology slides 44 25 19
Number of pathology reports 42 23 19
Number of mutations per patient 2 [1] 2 [1] 2 [1]

Values are mean [SD] for continuous variables and n [%] for categorical variables. DHMC, Dartmouth-Hitchcock Medical Center; SD, standard deviation; TKI, tyrosine kinase inhibitor.

Of note, 3 patients in our cohort (7.1%) carried the EGFR T790M mutation. Six patients (14.3%) had received prior TKI treatment with erlotinib, afatinib, or gefitinib before osimertinib, including two who carried the T790M mutation. In addition to TKI therapy, 3 patients (7.1%) had received prior immunotherapy, and 12 (28.6%) had received prior chemotherapy. A detailed breakdown of mutations in our cohort is provided in Table 2.

Table 2

Distribution of somatic mutations across study cohorts

Mutations Entire cohort (n=42) DHMC (n=23) Ochsner Health System (n=19)
EGFR (exon 19 deletion) 21 (50.0) 12 (52.2) 9 (47.4)
EGFR (amplification) 1 (2.4) 1 (4.3) 0 (0.0)
EGFR (L858R) 17 (40.5) 9 (39.1) 8 (42.1)
EGFR (T790M) 3 (7.1) 3 (13.0) 0 (0.0)
TP53 17 (40.5) 10 (43.5) 7 (36.8)
TP53 disruptive mutation 5 (11.9) 3 (13.0) 2 (10.5)
CDKN2A 4 (9.5) 1 (4.3) 3 (15.8)
CSF1R 1 (2.4) 1 (4.3) 0 (0.0)
CTNNB1 4 (9.5) 3 (13.0) 1 (5.3)
FBXW7 1 (2.4) 1 (4.3) 0 (0.0)
FGFR1 1 (2.4) 1 (4.3) 0 (0.0)
FLT3 1 (2.4) 1 (4.3) 0 (0.0)
GNAS 1 (2.4) 1 (4.3) 0 (0.0)
JAK3 2 (4.8) 2 (8.7) 0 (0.0)
KDR 1 (2.4) 1 (4.3) 0 (0.0)
KIT 1 (2.4) 1 (4.3) 0 (0.0)
MET 1 (2.4) 1 (4.3) 0 (0.0)
NOTCH1 1 (2.4) 1 (4.3) 0 (0.0)
PTEN 1 (2.4) 0 (0.0) 1 (5.3)
RB1 3 (7.1) 1 (4.3) 2 (10.5)
SMAD4 1 (2.4) 1 (4.3) 0 (0.0)
SMARCB1 1 (2.4) 1 (4.3) 0 (0.0)
STK11 1 (2.4) 1 (4.3) 0 (0.0)

The table shows the number and percentage (in parentheses) of patients with each somatic mutation in the overall cohort and in each site-specific subgroup. DHMC, Dartmouth-Hitchcock Medical Center.

Model performance and comparison between single and multiple modality models

Across 5-fold nested cross-validation, the multimodal model integrating image and non-image features achieved a mean C-index of 0.82 (SD: 0.17; 95% CI: 0.62–1.00), outperforming unimodal models based on non-image (mean C-index: 0.77) and image-only (mean C-index: 0.75) data (Table 3). Pairwise t-tests across folds indicated that the multimodal model improved C-index over non-image and image models by 0.06 (P=0.25) and 0.07 (P=0.17), respectively. Compared with random prediction (C-index =0.5), only the multimodal model showed a statistically significant improvement (P=0.01), whereas unimodal models did not reach significance (P=0.06 for non-image; P=0.08 for image).

Table 3

Comparison of average C-index performance between multimodal and unimodal models across the full cohort and by site

Modalities Full cohort DHMC Ochsner Health System
Non-image modality: Cox regression 0.77 (0.22) 0.73 (0.31) 0.72 (0.30)
Image modality: vision transformer 0.75 (0.24) 0.75 (0.35) 0.75 (0.43)
Multimodal (late fusion) 0.82 (0.17) 0.78 (0.30) 0.78 (0.20)

Values represent mean (SD) from five-fold nested cross-validation. C-index, concordance index; DHMC, Dartmouth-Hitchcock Medical Center; SD, standard deviation.

Stratified site-level analysis showed consistent results. Among DHMC patients, the multimodal model achieved a C-index of 0.78, compared with 0.73 and 0.75 for the non-image and image modalities, respectively. At Ochsner Health System, the multimodal model again performed best (C-index: 0.78), surpassing the non-image (0.72) and image-only (0.75) models. These findings highlight the benefit of multimodal integration across heterogeneous clinical populations.

Model capability for hazard stratification

Patients were grouped into four risk strata of approximately equal size based on quartiles of the predicted hazard scores. Kaplan-Meier survival curves (Figure 2) showed that the multimodal model reliably distinguished patients with favorable progression-free survival in the lowest-risk group (1st quartile), who had a mean follow-up time of 34.6 months (SD: 26.1), compared to an average of 14.4 months (SD: 13.8) among patients in the higher-risk groups (2nd, 3rd, and 4th quartiles). In contrast, unimodal models displayed curve crossovers and poor separation. Log-rank tests confirmed significant survival differences for the multimodal model (P=0.04), but not for non-image (P=0.40) or image-only models (P=0.21), supporting the superior hazard stratification of the multimodal approach.

Figure 2 Kaplan-Meier curves showing hazard stratification for multimodal, image-only, and non-image-only models. The multimodal model achieved significant separation across risk groups (P=0.04), outperforming unimodal models.
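The stratification procedure can be sketched in two steps: rank-based assignment of patients to risk quartiles, followed by a Kaplan-Meier estimate within each group. The function names and toy inputs are hypothetical, and the log-rank test is omitted for brevity.

```python
def quartile_groups(scores):
    """Assign each patient to risk quartile 1 (lowest) through 4 (highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // len(scores) + 1
    return groups

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        at_risk = sum(1 for ti in time if ti >= t)
        events = sum(1 for ti, e in zip(time, event) if e and ti == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical predicted hazard scores for eight patients.
print(quartile_groups([0.1, 0.9, 0.4, 0.7, 0.2, 0.6, 0.3, 0.8]))  # [1, 4, 2, 3, 1, 3, 2, 4]

# Hypothetical follow-up times (months); the third patient is censored.
for t, s in kaplan_meier([2, 4, 4, 6], [1, 1, 0, 1]):
    print(t, round(s, 2))
```

With 42 patients this rank-based rule yields groups of 11, 10, 11, and 10.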

Feature importance analysis

To gain insight into how the model generated its predictions and which features were most influential, we performed a feature importance analysis. In the multimodal model, several non-image variables contributed strongly to resistance prediction. Based on the magnitude of model coefficients, the most predictive variables included Hispanic ethnicity, Asian race, and mutations in KIT, KDR, and RB1. To assess statistical significance, we conducted t-tests comparing coefficient values across cross-validation folds against zero. Hispanic ethnicity and RB1 mutation showed significant associations with resistance prediction (P<0.05), suggesting their robust contribution to model outputs. Full feature statistics are presented in Table 4.

Table 4

Feature importance based on average Cox regression coefficients from the non-image modality

Features Average coefficient 95% CI P
Hispanic 2.77 0.84 to 4.70 0.02*
Asian 2.49 −3.38 to 8.36 0.30
KIT −1.38 −4.38 to 1.60 0.27
KDR 1.34 −0.38 to 3.06 0.10
RB1 1.14 0.03 to 2.25 0.047*

Values reflect mean coefficients, 95% CIs, and P values across five cross-validation folds. *, P<0.05. CI, confidence interval.
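The fold-level test behind Table 4 can be illustrated with a one-sample t-test against zero. The sketch assumes one coefficient per outer fold (five values, hence 4 degrees of freedom) and uses the corresponding two-sided 95% critical value of 2.776; the coefficient values below are hypothetical and are not the study's fold-level estimates. An exact P value would require the t-distribution CDF (available, for example, in SciPy) and is not computed here.

```python
import math
from statistics import mean, stdev

T_CRIT_DF4 = 2.776  # two-sided 95% critical value, Student's t, 4 degrees of freedom

def one_sample_t(values, t_crit=T_CRIT_DF4):
    """Test whether the mean of fold-level coefficients differs from zero."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    t_stat = m / se
    return {
        "mean": m,
        "t": t_stat,
        "ci": (m - t_crit * se, m + t_crit * se),
        "significant": abs(t_stat) > t_crit,
    }

# Hypothetical coefficients from five folds.
stable = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
unstable = one_sample_t([-2.0, 1.0, 3.0, 0.0, 5.0])
print(round(stable["t"], 2), stable["significant"])      # 4.24 True
print(round(unstable["t"], 2), unstable["significant"])  # 1.16 False
```

Under the same five-fold assumption, the half-width of the reported interval for Hispanic ethnicity (about 1.93) would correspond to a standard error of roughly 0.70.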

To interpret image-based predictions, we generated attention heatmaps from the ViT model for two representative patients—one who developed resistance at 7.7 months, and another at 11.2 months. In both cases, high-attention regions (shown in red) corresponded to areas with abnormal nuclear morphology, such as enlarged or irregular nuclei. The model assigned minimal attention to benign or histologically normal regions, indicating its focus on histopathologic features relevant to tumor aggressiveness (Figure 3).

Figure 3 Attention map visualization for two representative patients. The model highlights high- and low-risk regions within histology slides [HE-stained slides, 20× (0.5 µm/pixel) magnification], focusing on tumor areas with nuclear atypia. (A) A patient who developed resistance at 7.7 months; (B) A patient who developed resistance at 11.2 months. H, high-risk; HE, hematoxylin and eosin; L, low-risk; WSI, whole-slide image.

Further spatial analysis was conducted using Hover-Net to quantify the cellular composition of histology slides in high- and low-risk groups. The model classified six cell types: tumor, normal, inflammatory, connective tissue, dead, and unclassifiable. As shown in Figure 4, high-risk patients exhibited significantly fewer inflammatory cells (P=0.005), suggesting reduced immune infiltration and possibly a more immunosuppressive microenvironment. Conversely, low-risk patients had a lower proportion of normal cells (P=0.02), potentially reflecting greater representation of stromal or immune elements. These findings provide biologically meaningful insight into how histologic and cellular features contribute to resistance prediction in the multimodal model.

Figure 4 Comparison of cell type distributions between predicted high- and low-hazard groups. Patients in the high-hazard group had significantly fewer inflammatory cells (P=0.005) and more normal cells (P=0.02) compared to the low-hazard group, based on Mann-Whitney U tests. *, P<0.05.
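The Mann-Whitney U statistic used for these comparisons can be computed by direct pair counting, as sketched below. The function name and the inflammatory-cell fractions are hypothetical; the P value, which depends on the exact or approximated null distribution of U, is left to a statistics library.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (x_i, y_j) with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical inflammatory-cell fractions per slide.
high_risk = [0.10, 0.12, 0.08]
low_risk = [0.20, 0.25, 0.11, 0.30]
u_high = mann_whitney_u(high_risk, low_risk)
u_low = mann_whitney_u(low_risk, high_risk)
print(u_high, u_low)  # 1.0 11.0
```

The two statistics always sum to the number of pairs (here 3 x 4 = 12); a value of U near zero for the high-risk group indicates that its slides almost always show the smaller fraction.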

Performance comparison between model configurations

We conducted an experiment to evaluate how fusion timing affected model performance. All multimodal configurations outperformed unimodal baselines, confirming the benefit of combining data sources. Among the four tested strategies, late fusion of both modalities, which combines the output-layer risk scores from the image and non-image branches, achieved the highest mean C-index of 0.82 (Table 5).

Table 5

Comparing model performance using different combinations of early and late fusion across image and non-image modalities

Model configuration C-index
Unimodal
   Non-image 0.77 (0.22)
   Image 0.75 (0.24)
Multimodal
   Late non-image + late image 0.82 (0.17)
   Late non-image + early image 0.81 (0.19)
   Early non-image + late image 0.79 (0.18)
   Early non-image + early image 0.81 (0.19)

Values represent average C-index (SD) from five-fold nested cross-validation. C-index, concordance index; SD, standard deviation.

We also compared different non-image model architectures, including Cox regression, SNN, and FFNN (Table 6). The Cox-based model showed the strongest performance (mean C-index: 0.79), while FFNN and SNN performed slightly lower at 0.76 and 0.77, respectively. Given the small sample size, the regularized Cox model provided a favorable balance between performance and model stability.

Table 6

Performance comparison of multimodal models using different non-image architectures (Cox regression, FFNN, and SNN) and fusion strategies

Columns give the experiment settings as non-image modality/image modality/fusion layer.
Model configuration Cox regression/transformer/Cox FFNN/transformer/Cox SNN/transformer/Cox
Unimodal
   Non-image 0.77 (0.22) 0.70 (0.26) 0.76 (0.28)
   Image 0.75 (0.24) 0.75 (0.24) 0.75 (0.24)
Multimodal
   Late non-image + late image 0.82 (0.17) 0.65 (0.23) 0.76 (0.23)
   Late non-image + early image 0.81 (0.19) 0.79 (0.21) 0.79 (0.20)
   Early non-image + late image 0.79 (0.18) 0.78 (0.21) 0.78 (0.21)
   Early non-image + early image 0.81 (0.19) 0.80 (0.19) 0.76 (0.22)

Values represent average C-index (SD) across five-fold nested cross-validation. C-index, concordance index; FFNN, feed-forward neural networks; SD, standard deviation; SNN, self-normalizing neural network.


Discussion

This study presents a multimodal machine learning model for predicting resistance to EGFR-targeted therapy in NSCLC patients, integrating routinely available histology, genomic, and clinical data. The model achieved strong predictive performance and effective hazard stratification, outperforming unimodal approaches and requiring no manual feature engineering. These findings suggest that multimodal AI may provide a clinically valuable tool for supporting personalized treatment planning and prognosis estimation in patients undergoing EGFR-TKI therapy.

Recent advancements in machine learning, WSI digitization, DNA sequencing, and the adoption of electronic health records have enabled the integration of multimodal AI into clinical workflows for lung cancer management. Currently, treatment decisions rely heavily on expert clinical judgment, which may not be consistently available in low-resource settings. Even in high-resource environments, the interpretation of complex health data often involves considerable uncertainty. Our study addresses this gap by introducing a machine learning tool to predict resistance in EGFR-mutant NSCLC patients receiving osimertinib—an area currently lacking standardized clinical tools. By leveraging data already collected in routine care, our model provides reliable, interpretable risk predictions that could assist clinicians in tailoring treatment strategies and optimizing the sequencing of therapies. Moreover, by offering personalized prognosis estimates, the tool may reduce uncertainty for patients, improve quality of life, and support shared decision-making. Given osimertinib’s high cost and toxicity profile, this model may help patients weigh expected benefit against financial and health risks more effectively (37).

A key strength of our model is its ability to fuse disparate data types—histopathology, NGS, and structured clinical variables—within a single framework. The model achieved a mean C-index of 0.82, consistently outperforming image-only and non-image models across cross-validation folds and across two clinical institutions. Kaplan-Meier curves and log-rank tests confirmed that the multimodal model significantly stratified patients by progression-free survival (P=0.04), while unimodal models failed to do so. These improvements are likely due to the complementary nature of morphological and molecular features, each capturing different aspects of tumor biology. Furthermore, the model generalized well across both institutions, reinforcing its potential utility in varied clinical settings.
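The stratification result above rests on a two-group log-rank comparison. The block below is a minimal sketch of that test, not the study's implementation. Splitting patients at the median predicted risk is an assumption made for illustration, and the function names and toy data are hypothetical.

```python
import math
import statistics


def split_by_median(risks):
    """Label patients 1 (high risk) or 0 (low risk) at the median score.

    Assumption: a median split is used here only for illustration.
    """
    cutoff = statistics.median(risks)
    return [1 if r > cutoff else 0 for r in risks]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times  -- observed follow-up times
    events -- 1 if progression was observed, 0 if censored
    groups -- 0 or 1 for each patient
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)
        # Events at time t, overall and in group 1.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): higher predicted risk, earlier progression.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
toy_risks = [0.9, 0.8, 0.2, 0.1]
toy_groups = split_by_median(toy_risks)
print(logrank_test(toy_times, toy_events, toy_groups))
```

With very small groups, as in this toy example, the chi-square approximation is rough, so the p-value should be read only as an illustration of the mechanics.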

Compared to prior work using radiomics or histology alone for resistance prediction (6,7), our approach demonstrates improved performance and interpretability through multimodal integration. For example, recent studies have applied deep learning to CT imaging or histology to infer mutation status or response but have not combined modalities or evaluated resistance in EGFR-mutant populations (14,22,38-41). Our use of late-stage NSCLC patients and our inclusion of histology, genomics, and clinical variables represent a novel and practical extension of previous research. Importantly, the model relies only on routinely collected data, offering a pathway for real-world deployment without additional testing or patient burden.

Interpretability analysis identified key predictors that were essential to the model’s decision-making process, including RB1 mutation and Hispanic ethnicity. RB1 is a tumor suppressor gene implicated in lineage plasticity and histologic transformation, both of which have been associated with resistance to EGFR-TKIs (5,42). Our finding that RB1 mutation significantly contributes to predicted resistance risk aligns with previous work and supports its utility as a molecular risk factor. The association with Hispanic ethnicity, while statistically significant, should be interpreted cautiously due to the small sample size and demographic imbalances. Prior studies have shown mixed findings regarding racial and ethnic disparities in NSCLC outcomes, with some reporting poorer survival among Hispanic and Black patients (43,44), and others reporting improved outcomes in Hispanic subgroups (45-47). These inconsistencies emphasize the need for larger, diverse datasets to clarify sociodemographic effects.

Attention-based visualization confirmed that the image model focused on histologic regions rich in deformed nuclei and tumor density—areas typically associated with aggressive disease. Hover-Net-based analysis of nuclear composition further revealed that high-risk patients had significantly fewer inflammatory cells (P=0.005), while low-risk patients had a higher proportion of stromal and immune elements. These patterns may reflect differences in tumor immune microenvironments, with reduced inflammatory infiltrate potentially indicative of immune evasion or suppressed host response (48).

The primary limitation of this study is the modest cohort size, constrained by stringent inclusion criteria and the availability of multimodal data. This limited sample size informed the use of regularized Cox regression for the non-image branch and motivated a modular, three-stage training pipeline to prevent overfitting. Despite these constraints, several encouraging insights emerged: the multimodal model achieved a statistically significant improvement over random chance (P=0.01), whereas single-modality models did not; log-rank analysis showed that only the multimodal model significantly stratified patient risk groups (P=0.04); and site-stratified evaluations revealed consistent performance gains from modality integration. Together, these observations suggest that multimodal learning can enhance predictive accuracy and risk stratification, providing a promising foundation for larger, multi-institutional studies.

Future work should aim to expand cohort size through multi-site collaborations, enabling validation across more diverse populations and exploration of more complex model architectures, including end-to-end training. In addition, incorporating interpretable modeling approaches could offer deeper insights into a broad set of potentially clinically relevant determinants, such as the presence of pre-existing squamous-like tumor features and line of therapy, ultimately supporting broader clinical adoption of this pipeline.

Overall, this study demonstrates the feasibility and value of a multimodal machine learning approach for predicting EGFR-TKI resistance in NSCLC using routinely available clinical data. With further validation, this framework could serve as a clinical decision support tool, informing treatment selection and improving patient counseling, particularly in settings where access to specialized oncology expertise is limited. Broader application of multimodal AI could accelerate precision oncology, enabling more accurate and individualized care across diverse health systems.


Conclusions

This study presents a multimodal machine learning framework for predicting resistance to EGFR-targeted therapy in NSCLC patients using routinely collected clinical data. By integrating histology images, genomic alterations, and clinical variables, the model achieved strong predictive performance (mean C-index: 0.82) and superior hazard stratification compared to unimodal approaches. It required no manual feature engineering and generalized well across two institutions. The model offers a practical, interpretable tool to support personalized treatment planning, particularly in settings with limited oncologic resources. Key predictors, including RB1 mutation and reduced inflammatory cell presence, were identified as indicators of resistance. These findings underscore the value of combining complementary data modalities to improve prognostic accuracy. Despite the modest sample size, the model remained robust through modular training, regularization, and cross-validation. With further validation in larger, more diverse cohorts, this approach could support real-world clinical decision-making and enhance precision oncology.


Acknowledgments

We thank John Higgins, MS, for his assistance with data collection for this study and Danielle Cohen, MD, PhD, for her valuable feedback on the manuscript.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1012/rc

Data Sharing Statement: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1012/dss

Peer Review File: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1012/prf

Funding: This research was supported in part by grants from the US National Library of Medicine (Nos. R01LM012837 and R01LM013833) and the US National Cancer Institute (No. R01CA249758).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-1012/coif). K.D. received payments from Merck, Lilly, Roche, Amgen, Molecular Templates, Daiichi-Sankyo, Conjupro to the institution for clinical trials, and from AstraZeneca for consulting. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Institutional review board (IRB) approval was obtained from both participating institutions: Dartmouth-Hitchcock Medical Center (STUDY00030450) and Ochsner Health System (2023.263). Informed consent was waived due to the retrospective nature of the study and the use of de-identified clinical data.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Thandra KC, Barsouk A, Saginala K, et al. Epidemiology of lung cancer. Contemp Oncol (Pozn) 2021;25:45-52.
  2. Herbst RS, Morgensztern D, Boshoff C. The biology and management of non-small cell lung cancer. Nature 2018;553:446-54.
  3. Soria JC, Ohe Y, Vansteenkiste J, et al. Osimertinib in Untreated EGFR-Mutated Advanced Non-Small-Cell Lung Cancer. N Engl J Med 2018;378:113-25.
  4. Wu YL, Tsuboi M, He J, et al. Osimertinib in Resected EGFR-Mutated Non-Small-Cell Lung Cancer. N Engl J Med 2020;383:1711-23.
  5. Leonetti A, Sharma S, Minari R, et al. Resistance mechanisms to osimertinib in EGFR-mutated non-small cell lung cancer. Br J Cancer 2019;121:725-37.
  6. Tang X, Li Y, Yan WF, et al. Machine Learning-Based CT Radiomics Analysis for Prognostic Prediction in Metastatic Non-Small Cell Lung Cancer Patients With EGFR-T790M Mutation Receiving Third-Generation EGFR-TKI Osimertinib Treatment. Front Oncol 2021;11:719919.
  7. Song Z, Liu T, Shi L, et al. The deep learning model combining CT image and clinicopathological information for predicting ALK fusion status and response to ALK-TKI therapy in non-small cell lung cancer patients. Eur J Nucl Med Mol Imaging 2021;48:361-71.
  8. Kristina Gregory N. NCCN Guidelines Version 3. 2025 Non-Small Cell Lung Cancer Continue NCCN Guidelines Panel Disclosures. Available online: https://www.nccn.org/guidelines/guidelines-detail?category=1&id=1450
  9. Tas F, Ciftci R, Kilic L, et al. Age is a prognostic factor affecting survival in lung cancer patients. Oncol Lett 2013;6:1507-13.
  10. Duffy MJ. Circulating tumor DNA (ctDNA) as a biomarker for lung cancer: Early detection, monitoring and therapy prediction. Tumour Biol 2024;46:S283-95.
  11. Kahraman S, Karakaya S, Kaplan MA, et al. Treatment outcomes and prognostic factors in patients with driver mutant non-small cell lung cancer and de novo brain metastases. Sci Rep 2024;14:5820.
  12. Mathiot L, Nigen B, Goronflot T, et al. Prognostic Impact of TP53 Mutations in Metastatic Nonsquamous Non-small-cell Lung Cancer. Clin Lung Cancer 2024;25:244-253.e2.
  13. Bhateja P, Chiu M, Wildey G, et al. Retinoblastoma mutation predicts poor outcomes in advanced non small cell lung cancer. Cancer Med 2019;8:1459-66.
  14. Laury AR, Blom S, Ropponen T, et al. Artificial intelligence-based image analysis can predict outcome in high-grade serous carcinoma via histology alone. Sci Rep 2021;11:19165.
  15. Liu Y, Chen Q, Zhang Z, et al. Prognostic indicators and nomograms for postoperative survival among younger patients with non-small cell lung cancer. J Thorac Dis 2025;17:2365-76.
  16. Huijgens FL, Hillen MA, Huisinga MJ, et al. Cancer Patients’ Experiences of Burden when Involved in Treatment Decision Making. Med Decis Making 2025;45:533-44.
  17. Ramathuba DU, Ramutumbu NJ. Psychosocial Distress and the Quality of Life of Cancer Patients in Rural Hospitals in Limpopo Province: A Qualitative Study. Curr Oncol 2025;32:43.
  18. Tomita N, Tafe LJ, Suriawinata AA, et al. Predicting oncogene mutations of lung cancer using deep learning and histopathologic features on whole-slide images. Transl Oncol 2022;24:101494.
  19. Nasir-Moin M, Suriawinata AA, Ren B, et al. Evaluation of an Artificial Intelligence-Augmented Digital System for Histologic Classification of Colorectal Polyps. JAMA Netw Open 2021;4:e2135271.
  20. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 2019;25:1301-9.
  21. Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 2018;24:1559-67.
  22. Tomita N, Abdollahi B, Wei J, et al. Attention-Based Deep Neural Networks for Detection of Cancerous and Precancerous Esophagus Tissue on Histopathological Slides. JAMA Netw Open 2019;2:e1914645.
  23. Wei JW, Tafe LJ, Linnik YA, et al. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep 2019;9:3358.
  24. Li Y, Chai X, Yang M, et al. Accurate prediction of disease-free and overall survival in non-small cell lung cancer using patient-level multimodal weakly supervised learning. NPJ Precis Oncol 2025;9:197.
  25. Farooq A, Mishra D, Chaudhury S. Survival Prediction in Lung Cancer through Multi-Modal Representation Learning. 2025 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025:3907-15.
  26. Lu Y, Liu F, Yu Y, et al. AI-enabled molecular phenotyping and prognostic predictions in lung cancer through multimodal clinical information integration. Cell Rep Med 2025;6:102216.
  27. Wu Y, Wang Y, Huang X, et al. Multimodal learning for non-small cell lung cancer prognosis. Biomed Signal Process Control 2025;106:107663.
  28. Yu HA, Arcila ME, Rekhtman N, et al. Analysis of tumor specimens at the time of acquired resistance to EGFR-TKI therapy in 155 patients with EGFR-mutant lung cancers. Clin Cancer Res 2013;19:2240-7.
  29. González-Domínguez J, Sánchez-Barroso G, García-Sanz-Calcedo J, et al. Cox proportional hazards model used for predictive analysis of the energy consumption of healthcare buildings. Energy Build 2022;257:111784.
  30. Tran TT, Lee J, Gunathilake M, et al. A comparison of machine learning models and Cox proportional hazards models regarding their ability to predict the risk of gastrointestinal cancer based on metabolic syndrome and its components. Front Oncol 2023;13:1049787.
  31. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533-6.
  32. Klambauer G, Unterthiner T, Mayr A, et al. Self-Normalizing Neural Networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA; 2017:972-81.
  33. Jiang S, Hondelink L, Suriawinata AA, et al. Masked pre-training of transformers for histology image analysis. J Pathol Inform 2024;15:100386.
  34. Cancer Genome Atlas Research Network. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113-20.
  35. Jiang S, Suriawinata AA, Hassanpour S. MHAttnSurv: Multi-head attention for survival prediction using whole-slide pathology images. Comput Biol Med 2023;158:106883.
  36. Graham S, Vu QD, Raza SEA, et al. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med Image Anal 2019;58:101563.
  37. Rodriguez-Gonzalez A, Velasco-Durantez V, Martin-Abreu C, et al. Fatigue, Emotional Distress, and Illness Uncertainty in Patients with Metastatic Cancer: Results from the Prospective NEOETIC_SEOM Study. Curr Oncol 2022;29:9722-32.
  38. Chen RJ, Lu MY, Williamson DFK, et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 2022;40:865-878.e6.
  39. Campanella G, Kumar N, Nanda S, et al. Real-world deployment of a fine-tuned pathology foundation model for lung cancer biomarker detection. Nat Med 2025;31:3002-10.
  40. Zhang Y, Wang S, Liu X, et al. Biopsy image-based deep learning for predicting pathologic response to neoadjuvant chemotherapy in patients with NSCLC. NPJ Precis Oncol 2025;9:132.
  41. Rakaee M, Tafavvoghi M, Ricciuti B, et al. Deep Learning Model for Predicting Immunotherapy Response in Advanced Non-Small Cell Lung Cancer. JAMA Oncol 2025;11:109-18.
  42. Gomatou G, Syrigos N, Kotteas E. Osimertinib Resistance: Molecular Mechanisms and Emerging Treatment Options. Cancers (Basel) 2023;15:841.
  43. Uprety D, Seaton R, Hadid T, et al. Racial and socioeconomic disparities in survival among patients with metastatic non-small cell lung cancer. J Natl Cancer Inst 2024;116:1697-704.
  44. Wisnivesky JP, McGinn T, Henschke C, et al. Ethnic disparities in the treatment of stage I non-small cell lung cancer. Am J Respir Crit Care Med 2005;171:1158-63.
  45. Zhou K, Shi H, Chen R, et al. Association of Race, Socioeconomic Factors, and Treatment Characteristics With Overall Survival in Patients With Limited-Stage Small Cell Lung Cancer. JAMA Netw Open 2021;4:e2032276.
  46. Velotta JB, Von Behren J, Allison RA, et al. Non-small cell lung cancer (NSCLC) survival by race and ethnicity in California, 2014-2019: Differences by sex and smoking history. CHEST Pulmonary 2025.
  47. Kumar R, Castillero F, Bhandari S, et al. The Hispanic Paradox in Non-Small Cell Lung Cancer. Hematol Oncol Stem Cell Ther 2022;15:21-9.
  48. Bremnes RM, Al-Shibli K, Donnem T, et al. The role of tumor-infiltrating immune cells and chronic inflammation at the tumor site on cancer development, progression, and prognosis: emphasis on non-small cell lung cancer. J Thorac Oncol 2011;6:824-33.
Cite this article as: Hua P, Olofson A, Farhadi F, Hondelink L, Tsongalis G, Dragnev K, Hoegemann Savellano D, Suriawinata A, Tafe L, Hassanpour S. Predicting targeted therapy resistance in non-small cell lung cancer using multimodal machine learning. J Thorac Dis 2025;17(10):8700-8714. doi: 10.21037/jtd-2025-1012
