Factors linked to lung cancer in MIMIC-IV database
Original Article

Chengyuan Fang#, Yuwen Bai#, Yanzhong Xin, Luquan Zhang, Jianqun Ma

Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, Harbin, China

Contributions: (I) Conception and design: C Fang, Y Bai; (II) Administrative support: L Zhang; (III) Provision of study materials or patients: Y Xin; (IV) Collection and assembly of data: Y Bai; (V) Data analysis and interpretation: C Fang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Jianqun Ma, MD. Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin 150040, China. Email: jianqunma@hrbmu.edu.cn.

Background: Lung cancer (LC) is the most prevalent type of cancer, yet early prediction of hospital mortality and risk stratification remains inadequate. This study aims to develop a predictive model for in-hospital mortality among patients with LC.

Methods: This study used data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database [2008–2019]. Patients were randomly assigned to training (70%) and validation (30%) cohorts. Baseline characteristics and their associations with LC outcomes were analyzed. Feature selection was performed using Cox regression in the training set. The selected variables were incorporated into a nomogram model, whose predictive performance and clinical utility were evaluated using calibration plots, receiver operating characteristic (ROC) curves, and decision curve analysis (DCA) in both cohorts.

Results: A total of 645 LC patients were included, with 451 in the training set and 194 in the validation set. Twenty baseline characteristics were significantly associated with LC mortality. Six key variables were identified for nomogram development: alkaline phosphatase (ALP) minimum (min), bilirubin total min, length of stay (LOS), race, respiratory rate min, and Simplified Acute Physiology Score II (SAPS II). The final model demonstrated strong predictive accuracy and clinical utility, achieving an area under the curve (AUC) exceeding 0.7 in both cohorts.

Conclusions: This study identified key risk factors for in-hospital mortality among critically ill patients with LC and developed a robust predictive model using MIMIC-IV data. The findings provide valuable insights into LC prognosis and may aid in clinical decision-making.

Keywords: Lung cancer (LC); intensive care unit (ICU); Medical Information Mart for Intensive Care-IV (MIMIC-IV); feature variables


Submitted Nov 18, 2024. Accepted for publication Mar 25, 2025. Published online May 26, 2025.

doi: 10.21037/jtd-2024-1998


Highlight box

Key findings

• This study identified the key risk factors for in-hospital mortality among critically ill patients with lung cancer (LC) and developed a robust predictive model by using the data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. These findings provided valuable insights into LC prognosis and might aid in clinical decision-making.

What is known and what is new?

• Because of the high mortality rate of LC and its low 5-year survival rate, intensive care unit doctors have difficulty assessing the in-hospital mortality risk of LC patients in a timely and accurate manner. Moreover, the MIMIC database has rarely been used to develop relevant prediction models.

• This study used the MIMIC-IV database to identify six variables associated with the in-hospital mortality of LC patients. Length of stay was a protective factor, while the other five were risk factors. The study provides new ideas and strategies for the early detection of LC, prognosis assessment, and clinical management.

What is the implication, and what should change now?

• This study identified the key variables related to in-hospital mortality in LC patients and constructed a nomogram model. The model may assist clinical decision-making by helping doctors assess risk more accurately, develop targeted treatment plans, allocate resources rationally, and strengthen the monitoring of high-risk patients. It also provides guidance for subsequent research.


Introduction

Background

Lung cancer (LC) is a malignant tumor originating in the lungs, primarily classified into non-small cell LC (NSCLC) and small cell LC (SCLC) (1). It remains one of the leading causes of cancer-related mortality worldwide, largely due to late-stage diagnosis resulting from nonspecific early symptoms (2). The 5-year survival rate for LC is only 18%, and even after treatment—such as surgery, chemotherapy, or radiation—recurrence remains a significant concern (3). Critically ill patients with LC frequently require intensive care unit (ICU) admission, often for mechanical ventilation due to respiratory failure. For those with stage IV disease, the mortality rate increases to 68%. While ICU outcomes for patients with LC have improved over time, mortality rates remain high. A major challenge for ICU clinicians is the lack of effective early prediction and risk stratification models for in-hospital mortality (4). Therefore, identifying high-risk individuals could enhance clinical decision-making and optimize patient management.

Rationale and knowledge gap

The Medical Information Mart for Intensive Care (MIMIC) database, developed through a collaboration between the MIT Laboratory for Computational Physiology, Harvard Medical School’s Beth Israel Deaconess Medical Center, and Philips Healthcare, provides extensive clinical data (5). This database contains detailed patient information, including demographics, laboratory measurements, vital signs, surgical procedures, disease diagnoses, and medication usage (6). This publicly available resource has been widely used in disease research, facilitating the exploration of disease mechanisms and outcomes (7). A prior study leveraged MIMIC data to examine the association between the advanced lung cancer inflammation index (ALI) and in-hospital mortality in patients with community-acquired pneumonia (CAP) (8). However, few studies have used this database to develop predictive models for LC-related in-hospital mortality.

Objective

This study aims to address this gap by analyzing mortality trends among hospitalized patients with LC using MIMIC data. Key predictive variables were identified through univariate and multivariate regression analyses, and a nomogram model was constructed to estimate in-hospital mortality risk. The model’s performance was validated to provide a comprehensive prognostic tool for critically ill patients with LC, contributing to improved risk assessment and clinical decision-making. We present this article in accordance with the TRIPOD reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-1998/rc).


Methods

Data collection and participant details

This study used data from the MIMIC-IV (v1.0) database (https://mimic-iv.mit.edu/), which contains extensive clinical records, including physiological parameters, laboratory results, and imaging data from 2008 to 2019. Patients diagnosed with LC (n=4,155) were identified using the International Classification of Diseases, 9th revision (ICD-9: 1623, 1624, 1625, 1628, 1629, 2123, 2312, 9348, 20921, 20961, and V1011) and 10th revision (ICD-10: C3411, C3412, C342, C3430, C3431, C3432, C3480, C3481, C3482, C3490, C3491, C3492, D020, D0221, D0222, D1432, D381, D3A090, Z801, Z85110, and Z85118) codes. Exclusion criteria were as follows: age <18 years (n=0) and an ICU stay of less than 48 h or more than 180 days (n=521). For patients readmitted to the hospital between 2008 and 2019, only the records of the first ICU admission were analyzed. After applying these criteria, 645 patients were included and randomly assigned to a training set (n=451) and a validation set (n=194) (Figure 1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
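The selection and splitting steps described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' code (the analyses were run in R): the `IcuStay` record, its field names, the helper names, and the random seed are all hypothetical, and the split uses integer arithmetic simply because it reproduces the reported set sizes.

```python
import random
from dataclasses import dataclass


@dataclass
class IcuStay:
    """One ICU stay; every field name here is a hypothetical stand-in."""
    subject_id: int
    age: int
    icu_los_hours: float
    admit_order: int  # 1 = first ICU admission for this patient


def eligible(stay: IcuStay) -> bool:
    """Apply the stated exclusion criteria to a single stay."""
    if stay.age < 18:
        return False
    # Exclude ICU stays shorter than 48 h or longer than 180 days.
    if stay.icu_los_hours < 48 or stay.icu_los_hours > 180 * 24:
        return False
    # Keep only the first ICU admission of each patient.
    return stay.admit_order == 1


def split_70_30(items, seed=0):
    """Shuffle a copy of the items and cut it 70/30 with integer arithmetic."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    return shuffled[:n_train], shuffled[n_train:]


# Usage: 645 placeholder IDs give the reported set sizes (451 and 194).
train, valid = split_70_30(range(645))
print(len(train), len(valid))
```

Fixing the seed makes the assignment reproducible; the seed actually used in the study is not reported.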

Figure 1 The flow chart of the study. ALI, advanced lung cancer inflammation index; ICU, intensive care unit; MIMIC-IV, the Medical Information Mart for Intensive Care-IV.

Data retrieval

Data collected from the database included the following. Categorical variables: gender, race, antibiotic use, heart failure (HF), diabetes mellitus (DM), atrial fibrillation, valvular heart disease (VHD), chronic obstructive pulmonary disease (COPD), liver disease (LD), and nephropathy. Continuous variables: length of stay (LOS), age, white blood cell count (WBC) minimum (min), height, Glasgow Coma Scale (GCS) min, temperature min, heart rate min, respiratory rate min, percutaneous arterial oxygen saturation (SpO2) min, mean blood pressure (MBP) min, systolic blood pressure (SBP) min, glucose level min, LOS hospital, diastolic blood pressure (DBP) min, Simplified Acute Physiology Score II (SAPS II), chloride min, hematocrit min, hemoglobin min, platelets min, anion gap min, bicarbonate min, calcium min, creatinine min, alanine aminotransferase (ALT) min, sodium min, potassium min, absolute (ABS) basophils min, partial thromboplastin time (PTT) min, ABS eosinophils min, ABS lymphocytes min, blood urea nitrogen (BUN) min, ABS monocytes min, aspartate aminotransferase (AST) min, ABS neutrophils min, international normalized ratio (INR) min, prothrombin time (PT) min, alkaline phosphatase (ALP) min, and total bilirubin min. In-hospital mortality in patients with LC was the primary outcome.

Statistical analysis

All statistical analyses were performed using R (v.4.2.2) in RStudio. Initially, t-tests (for continuous variables) and Chi-squared tests (for categorical variables) were used to assess the balance between the training and validation sets, with P>0.05 indicating balance. A baseline characteristics table was generated using the tableone package (v.0.13.2) (9) to compare variables between the survival and death groups. Significant differences in baseline characteristics were identified using t-tests and Chi-squared tests (P<0.05). Continuous variables were reported as mean ± standard deviation (SD), while categorical variables were presented as numbers and percentages. Feature selection was conducted using the survival package (v.3.3-1) (10) via univariate and multivariate Cox regression analyses (P<0.05) and the proportional hazards (PH) assumption test {hazard ratio (HR) [95% confidence interval (CI)] not equal to 1, P>0.05}. Results were visualized using forest plots generated with the forestplot package (v.3.1.1) (11). The rms package (v.6.5-1) (12) was used to construct a nomogram model from the feature variables in the training set. Calibration and receiver operating characteristic (ROC) curves were plotted to assess its predictive performance, and decision curve analysis (DCA) was used to evaluate its clinical utility. Specifically, the rms, pROC (v.1.18.0) (13), and ggDCA (v.1.2) (14) packages were used to plot the calibration, ROC, and DCA curves, respectively. The Wilcoxon rank-sum test was used to compare differences between two groups, and Fisher’s exact test was used to calculate P values. P<0.05 was considered statistically significant.
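Of the evaluation tools listed above, DCA is the least familiar to many readers, so a short sketch of the quantity it plots may help. The Python below is an illustration only and does not reproduce the ggDCA package: it uses the standard net benefit definition (true positives minus false positives weighted by the odds of the threshold probability, both divided by the sample size), and the function names and toy data are invented for this example.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (invented for illustration): 1 = died in hospital, 0 = survived.
outcomes = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.4, 0.2, 0.1, 0.6, 0.3, 0.05, 0.15, 0.25]

for pt in (0.2, 0.5):
    model_nb = net_benefit(outcomes, risks, pt)
    all_nb = net_benefit_treat_all(outcomes, pt)
    print(f"threshold={pt}: model={model_nb:.3f}, treat-all={all_nb:.3f}, treat-none=0.000")
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its net benefit lies above both the treat-all curve and the treat-none line at zero.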


Results

Balance assessment of training and validation sets

T-tests and Chi-squared tests showed no significant differences between the training and validation sets (P>0.05) for any variable except ABS monocytes min (P=0.03), indicating that the grouping was appropriate (Tables 1,2).

Table 1

Continuous variables

Variables Training set (n=451) Validation set (n=194) P
LOS (days) 6.404 5.974 0.52
Age (years) 69.867 69.639 0.50
Height (cm) 168.093 167.402 0.43
GCS min 13.752 13.974 0.53
Temperature min (°C) 36.360 36.363 0.78
Heart rate min (bpm) 72.678 74.366 0.29
Respiratory rate min (bpm) 13.185 12.951 0.57
SpO2 min (%) 90.388 91.129 0.06
MBP min (mmHg) 58.424 58.021 0.25
DBP min (mmHg) 46.584 45.428 0.11
SBP min (mmHg) 89.822 88.039 0.19
Glucose min (mg/dL) 113.612 114.660 0.15
LOS hospital (days) 12.808 12.645 0.92
SAPS II 40.317 39.758 0.53
Hematocrit min (%) 31.919 31.327 0.19
Hemoglobin min (g/dL) 10.421 10.194 0.13
Platelets min (×10⁹/L) 234.228 244.863 0.13
WBC min (×10⁹/L) 11.391 12.030 0.74
Anion gap min (mmol/L) 13.102 12.825 0.25
Bicarbonate min (mmol/L) 23.124 22.892 0.95
BUN min (mg/dL) 21.299 20.026 0.27
Calcium min (mg/dL) 8.250 8.275 0.42
Chloride min (mmol/L) 100.523 100.773 0.89
Creatinine min (mg/dL) 0.972 1.010 0.53
Sodium min (mmol/L) 136.652 136.454 0.62
Potassium min (mmol/L) 4.027 3.985 0.44
ABS basophils min (×10⁹/L) 0.025 0.023 0.09
ABS eosinophils min (×10⁹/L) 0.050 0.063 0.79
ABS lymphocytes min (×10⁹/L) 0.894 0.896 0.41
ABS monocytes min (×10⁹/L) 0.619 0.519 0.03
ABS neutrophils min (×10⁹/L) 10.568 11.045 0.75
INR min 1.281 1.264 0.86
PT min (s) 14.059 13.814 0.63
PTT min (s) 30.239 30.489 0.41
ALT min (U/L) 38.452 35.933 0.79
ALP min (U/L) 92.133 99.881 0.86
AST min (U/L) 61.033 48.714 0.58
Bilirubin total min (mg/dL) 0.698 0.596 0.33

Data are presented as mean. ABS, absolute; ALP, alkaline phosphatase; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BUN, blood urea nitrogen; DBP, diastolic blood pressure; GCS, Glasgow Coma Scale; INR, international normalized ratio; LOS, length of stay; MBP, mean blood pressure; min, minimum; PT, prothrombin time; PTT, partial thromboplastin time; SAPS II, Simplified Acute Physiology Score II; SBP, systolic blood pressure; SpO2, percutaneous arterial oxygen saturation; WBC, white blood cell count.

Table 2

Categorical variables

Variables Training set (n=451) Validation set (n=194) P
Gender >0.99
   Female 216 98
   Male 235 96
Race 0.19
   White 300 141
   Black 29 14
   Other 122 39
Antibiotic >0.99
   Yes 346 152
   No 105 42
HF >0.99
   Yes 93 46
   No 358 148
DM >0.99
   Yes 103 37
   No 348 157
Atrial fibrillation >0.99
   Yes 155 60
   No 296 134
VHD >0.99
   Yes 1 1
   No 450 193
COPD >0.99
   Yes 78 32
   No 373 162
LD >0.99
   Yes 8 3
   No 443 191
Nephropathy >0.99
   Yes 69 33
   No 382 161

COPD, chronic obstructive pulmonary disease; DM, diabetes mellitus; HF, heart failure; LD, liver disease; VHD, valvular heart disease.
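As a worked illustration of the balance check for a categorical variable, the Pearson Chi-squared statistic can be computed directly from the race counts in Table 2. This is a Python sketch written for this example rather than the authors' R code; the function names are hypothetical, and the closed-form P value applies only to 2 degrees of freedom, which is what a 3×2 table has.

```python
import math


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-squared variable with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Rows: White, Black, Other; columns: training set, validation set (Table 2).
race_counts = [[300, 141], [29, 14], [122, 39]]
stat = chi_squared_statistic(race_counts)
p = p_value_df2(stat)
print(f"chi-squared = {stat:.2f}, P = {p:.3f}")
```

On these counts the P value is above 0.05, in line with the conclusion that race was balanced between the two sets. It need not match the published value exactly, because the test variant applied to this row is not stated.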

Identification of 20 variables associated with in-hospital mortality in patients with LC

Baseline characteristic analysis is essential for identifying potential confounders and ensuring research accuracy. Comparisons between the survival group (n=353) and death group (n=98) in the training set identified 20 variables significantly associated with in-hospital mortality among LC patients. These included age, race, heart rate min, respiratory rate min, SpO2 min, SBP min, antibiotic use, SAPS II, LD, nephropathy, hematocrit min, hemoglobin min, WBC min, anion gap min, BUN min, chloride min, sodium min, ABS neutrophils min, ALP min, and total bilirubin min (Table 3).

Table 3

Comparison of baseline characteristics between the death and survival groups of LC patients

Variables Death group (n=98) Survival group (n=353) P
LOS (days) 7.12±5.31 6.21±6.60 0.21
Age (years) 72.64±11.51 69.10±13.30 0.02
Gender 0.92
   Female 46 (46.9) 170 (48.2)
   Male 52 (53.1) 183 (51.8)
Height (cm) 167.04±7.98 168.39±6.98 0.10
Race 0.01
   Black 7 (7.1) 22 (6.2)
   Other 39 (39.8) 83 (23.5)
   White 52 (53.1) 248 (70.3)
GCS min 13.45±3.03 13.84±2.33 0.17
Temperature min (°C) 36.27±0.57 36.39±0.58 0.07
Heart rate min (bpm) 75.74±16.32 71.83±15.54 0.03
Respiratory rate min (bpm) 13.86±3.98 13.00±3.52 0.04
SpO2 min (%) 89.31±5.52 90.69±4.68 0.01
MBP min (mmHg) 56.90±14.26 58.85±13.40 0.21
DBP min (mmHg) 46.23±10.76 46.68±10.11 0.70
SBP min (mmHg) 86.03±18.44 90.88±15.79 0.01
Glucose min (mg/dL) 111.77±32.21 114.12±33.00 0.52
Antibiotic <0.001
   No 10 (10.2) 95 (26.9)
   Yes 88 (89.8) 258 (73.1)
LOS hospital (days) 12.87±9.90 12.79±9.24 0.93
SAPS II 47.82±14.5 38.24±11.63 <0.001
HF 0.23
   No 73 (74.5) 285 (80.7)
   Yes 25 (25.5) 68 (19.3)
DM 0.39
   No 72 (73.5) 276 (78.2)
   Yes 26 (26.5) 77 (21.8)
Atrial fibrillation 0.35
   No 60 (61.2) 236 (66.9)
   Yes 38 (38.8) 117 (33.1)
VHD >0.99
   No 98 (100.0) 352 (99.7)
   Yes 0 (0.0) 1 (0.3)
COPD 0.28
   No 77 (78.6) 296 (83.9)
   Yes 21 (21.4) 57 (16.1)
LD 0.02
   No 93 (94.9) 350 (99.2)
   Yes 5 (5.1) 3 (0.8)
Nephropathy 0.04
   No 76 (77.6) 306 (86.7)
   Yes 22 (22.4) 47 (13.3)
Hematocrit min (%) 30.70±6.34 32.26±6.38 0.03
Hemoglobin min (g/dL) 9.90±2.09 10.57±2.13 0.01
Platelets min (×10⁹/L) 231.08±136.64 235.10±110.38 0.76
WBC min (×10⁹/L) 12.60±5.86 11.05±5.33 0.01
Anion gap min (mmol/L) 13.87±3.48 12.89±3.03 0.01
Bicarbonate min (mmol/L) 22.65±5.44 23.25±4.93 0.29
BUN min (mg/dL) 27.26±21.21 19.65±12.79 <0.001
Calcium min (mg/dL) 8.23±0.95 8.26±0.90 0.78
Chloride min (mmol/L) 98.74±7.68 101.02±5.70 <0.001
Creatinine min (mg/dL) 1.11±0.88 0.93±0.73 0.051
Sodium min (mmol/L) 135.49±6.38 136.97±4.72 0.01
Potassium min (mmol/L) 4.07±0.60 4.01±0.52 0.32
ABS basophils min (×10⁹/L) 0.02±0.02 0.03±0.03 0.17
ABS eosinophils min (×10⁹/L) 0.04±0.08 0.05±0.14 0.29
ABS lymphocytes min (×10⁹/L) 0.84±0.63 0.91±0.53 0.26
ABS monocytes min (×10⁹/L) 0.72±1.65 0.59±0.37 0.16
ABS neutrophils min (×10⁹/L) 11.65±5.41 10.27±4.52 0.01
INR min 1.36±0.55 1.26±0.46 0.07
PT min (s) 14.92±5.44 13.82±4.91 0.056
PTT min (s) 30.90±13.55 30.06±10.09 0.50
ALT min (U/L) 41.02±65.20 37.74±138.80 0.82
ALP min (U/L) 112.32±95.47 86.53±39.56 <0.001
AST min (U/L) 71.67±148.47 58.08±342.30 0.70
Bilirubin total min (mg/dL) 1.07±3.16 0.59±0.57 0.01

Data are presented as mean ± SD or number (%). ABS, absolute; ALP, alkaline phosphatase; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BUN, blood urea nitrogen; COPD, chronic obstructive pulmonary disease; DBP, diastolic blood pressure; DM, diabetes mellitus; GCS, Glasgow Coma Scale; HF, heart failure; INR, international normalized ratio; LC, lung cancer; LD, liver disease; LOS, length of stay; MBP, mean blood pressure; min, minimum; PT, prothrombin time; PTT, partial thromboplastin time; SAPS II, Simplified Acute Physiology Score II; SBP, systolic blood pressure; SD, standard deviation; SpO2, percutaneous arterial oxygen saturation; VHD, valvular heart disease; WBC, white blood cell count.

Selection of six final feature variables associated with in-hospital mortality in patients with LC

Univariate Cox regression analysis identified 17 candidate variables associated with in-hospital mortality, including ABS neutrophil min, age, ALP min, anion gap min, bilirubin total min, BUN min, chloride min, creatinine min, height, LOS, race, respiratory rate min, SAPS II, SBP min, sodium min, SpO2 min, and WBC min (P<0.05) (Figure 2A). After further screening using multivariate Cox regression and the PH assumption test (P>0.05), six key predictive variables were identified: ALP min, bilirubin total min, LOS, race, respiratory rate min, and SAPS II (P<0.05) (Figure 2B). Among these, LOS was identified as a protective factor (HR <1), suggesting an association with a lower hazard of in-hospital death. In contrast, the other five variables (HR >1) were risk factors, indicating their possible contribution to in-hospital mortality.
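The protective/risk labels above follow from how a Cox coefficient maps to a hazard ratio: HR = exp(β), with a 95% Wald interval of exp(β ± 1.96·SE). The Python sketch below illustrates that mapping only. The coefficients are placeholders and are not values from this study, and the function and predictor names are hypothetical.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper): the hazard ratio and its 95% Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def direction(hr):
    """Label a hazard ratio as protective (<1) or risk (>1)."""
    if hr < 1.0:
        return "protective"
    if hr > 1.0:
        return "risk"
    return "no association"


# Placeholder coefficients and standard errors, NOT values from the study.
examples = {"predictor_a": (-0.05, 0.02), "predictor_b": (0.03, 0.01)}
for name, (beta, se) in examples.items():
    hr, lower, upper = hazard_ratio(beta, se)
    print(f"{name}: HR={hr:.3f} (95% CI {lower:.3f} to {upper:.3f}) -> {direction(hr)}")
```

A variable is usually reported as significant when its interval excludes 1, which is the sense of "not equal to 1" in the feature selection criteria.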

Figure 2 Univariate and multivariate Cox regression models. (A) Univariate Cox regression analysis. (B) Multivariate Cox regression analysis. ABS, absolute; ALP, alkaline phosphatase; BUN, blood urea nitrogen; CI, confidence interval; HR, hazard ratio; LOS, length of stay; min, minimum; SAPS II, Simplified Acute Physiology Score II; SBP, systolic blood pressure; SpO2, percutaneous arterial oxygen saturation; WBC, white blood cell count.

Construction of a nomogram with good predictive ability and clinical utility

Using the six identified predictive variables, a nomogram model was constructed to estimate in-hospital mortality among patients with LC (Figure 3A). Higher total scores on the nomogram corresponded to an increased risk of in-hospital mortality. The calibration curve of the nomogram had a slope approaching 1, indicating good agreement between predicted and observed risk (Figure 3B,3C). Moreover, the ROC curve confirmed the model’s predictive capability, with an area under the curve (AUC) of 0.7569 in the training set and 0.7603 in the validation set, suggesting good discrimination ability (Figure 3D,3E). Finally, DCA showed that the nomogram outperformed both default strategies (treat all and treat none), as well as individual feature variable models, highlighting its clinical utility (Figure 3F,3G). These findings underscore the potential of the developed nomogram as a reliable tool for predicting in-hospital mortality in critically ill patients with LC.
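For readers unfamiliar with the AUC values quoted above, the statistic can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The Python sketch below computes it by direct pairwise comparison. It is an illustration on invented toy data and does not reproduce the pROC package or the study's results.

```python
def auc(y_true, y_score):
    """AUC as the share of (death, survivor) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (invented): 1 = died in hospital, 0 = survived.
toy_outcomes = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.3, 0.5, 0.8, 0.2]
print(round(auc(toy_outcomes, toy_scores), 4))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values near 0.76 indicate that roughly three of every four death-survivor pairs are ordered correctly.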

Figure 3 Construction and evaluation of the nomogram for predicting in-hospital mortality in LC patients. (A) Construction of the nomogram. (B,C) Calibration curves of the nomogram in the (B) training set and (C) validation set. (D,E) ROC curves in the (D) training set and (E) validation set. (F,G) DCA in the (F) training set and (G) validation set. ALP, alkaline phosphatase; DCA, decision curve analysis; LC, lung cancer; LOS, length of stay; min, minimum; ROC, receiver operating characteristic; SAPS II, Simplified Acute Physiology Score II.

Discussion

Key findings

This study leveraged the MIMIC database to investigate the association between LC and multiple clinical variables. Univariate and multivariate regression analyses identified ALP min, total bilirubin min, race, respiratory rate min, and SAPS II as independent risk factors for in-hospital mortality, while LOS emerged as an independent protective factor. A nomogram model was subsequently developed and validated using calibration curves, ROC curves, and DCA, demonstrating strong predictive performance. These findings provide valuable data-driven insights for the clinical management of critically ill patients with LC, supporting more accurate prognostic assessments and personalized treatment planning through predictive modelling.

Strengths and limitations

This study has several strengths. First, it used the MIMIC-IV database from 2008 to 2019, which contains a large number of clinical records. These rich data enabled a comprehensive analysis of multiple variables related to LC patients, improving the reliability of the results. Second, the study employed a series of rigorous statistical methods. The data were divided into a training set and a validation set, and Cox regression was used for feature selection. Calibration plots, ROC curves, and DCA were then used to evaluate the model. Finally, six key variables associated with in-hospital mortality in LC patients were identified, and a nomogram model was constructed. The model achieved an AUC exceeding 0.7 in both the training set and the validation set, providing a practical tool for clinical decision-making.

Nevertheless, this study has some limitations. The findings are based on a publicly available database (MIMIC) from a single hospital in the United States. Given the potential differences in healthcare systems, treatment regimens, and patient characteristics across regions, these data may not be fully representative of the global patient population. Further multicenter studies are needed to validate and generalize these findings. In addition, the MIMIC database lacks some critical information. First, tumor-node-metastasis (TNM) staging data are not available for cancer patients. Second, data on the number of patients undergoing surgery and tumor treatment, and on mortality rates among different treatment groups, were not available. Third, the database does not provide information on the specific cause of death (e.g., infection, respiratory failure, multiple organ failure) or the place of death. Fourth, the lack of information on whether patients received surgical treatment hindered the comparative analysis of mortality.

Comparison with similar research

LC remains the leading cause of cancer-related mortality worldwide, with fewer than 7% of patients surviving beyond 10 years post-diagnosis across all stages (15). Current clinical research primarily focuses on early diagnosis, biomarker discovery, and personalized treatment strategies to improve survival and quality of life (16). Additionally, immunotherapy and targeted therapy have emerged as key areas of research, offering new therapeutic options and potentially improving prognoses (15). However, accurate risk prediction models are essential for enhancing clinical decision-making and optimizing patient outcomes.

Explanations of findings

Analysis of baseline characteristics between survival and non-survival groups identified 20 variables significantly associated with in-hospital mortality. Among these, age is a well-documented high-risk factor in critically ill patients with LC, with older individuals facing an elevated risk of mortality during hospitalization (17). Additionally, a nationwide cohort study demonstrated a significant association between prolonged antibiotic use and an increased LC risk in a duration-dependent manner (17). Another study reported that antibiotics negatively impact immune checkpoint inhibitor (ICI) therapy and overall survival in patients with NSCLC, potentially due to disruptions in gut microbiota that influence immune responses (17).

The six characteristic variables identified through univariate and multivariate Cox regression analyses were ALP min, bilirubin total min, LOS, race, respiratory rate min, and SAPS II. Bilirubin exhibits strong antioxidant properties and is hypothesized to play a protective role against cancer development. Systematic reviews and meta-analyses on various cancer types, including colorectal, breast, and lung cancers, have demonstrated a negative association between increased bilirubin levels and the incidence of lung and gynecological cancers (18,19). Specifically, moderately elevated serum bilirubin levels are linked to a reduced risk of LC, particularly among smokers. Genetically elevated bilirubin levels, which are relatively common in the general population, may offer protection against LC in individuals exposed to high levels of smoke oxidants (20). SAPS II is widely recognized for its strong predictive value in assessing in-hospital mortality in coronary care unit (CCU) patients (21). Similarly, pretreatment ALP levels have been shown to correlate with tumor stage and prognosis in LC, with higher ALP levels being associated with poorer survival and more advanced disease stages (21). A shorter ICU LOS is often associated with lower hospital costs, optimized resource allocation, and improved patient outcomes (21). Additionally, disparities in LC screening eligibility across racial groups remain a significant concern, as many individuals diagnosed with LC do not meet current screening criteria (21). These findings can aid clinicians in personalizing treatment strategies and optimizing resource allocation, ultimately enhancing patient management and clinical decision-making.

Implications and actions needed

The results of this study are relevant to daily ICU clinical practice. The constructed nomogram model could be integrated into the ICU electronic medical record system: medical staff would enter the patient’s ALP min, total bilirubin min, LOS, race, respiratory rate min, and SAPS II, and the system would automatically generate an in-hospital mortality risk score to support more rational treatment decisions. LC patients could then be managed in strata based on the nomogram score. Support and intervention should be strengthened for patients with high-risk scores. On the one hand, treatment strategies can be optimized: patients with abnormal liver function can be monitored and given albumin supplementation and liver-protective drugs, and mechanical ventilation parameters can be precisely adjusted for patients with impaired respiratory function. On the other hand, multidisciplinary team (MDT) diagnosis and treatment can be strengthened, with experts from respiratory medicine and oncology drawing on precise imaging and advanced pathological techniques to develop personalized plans. New treatment methods can also be explored by studying the relevant risk factors, promoting patient participation in clinical trials, and following cutting-edge results. For patients with medium- and low-risk scores, medical resources should be planned reasonably while ensuring basic treatment, for example by appropriately reducing the frequency of examinations to improve the patient experience.
In the future, the model should be further optimized by incorporating more clinical data and variables and by combining new research results with clinical experience, so that risk assessment indicators and treatment strategies continue to improve and the findings better serve the treatment and management of LC patients in the ICU. Future studies should draw on other databases or datasets that contain TNM staging information and combine them with the existing data to evaluate the role of TNM staging in predictive models of in-hospital mortality in LC patients. Detailed treatment-related data, including surgical and oncological information, should also be collected from a larger and more diverse sample of patients, which will help to explore more fully the relationship between treatment and mortality. In addition, future studies should collect data on cause and place of death to further validate and optimize predictive models.


Conclusions

In conclusion, this study identified key risk factors associated with mortality in critically ill patients with LC and developed a predictive model using the MIMIC database. The findings offer valuable insights into LC prognosis, supporting improved clinical decision-making and patient management.


Acknowledgments

We thank the Jianqun Ma team for their support and assistance with this study.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-1998/rc

Peer Review File: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-1998/prf

Funding: This work was supported by the Haiyan Foundation of Harbin Medical University Cancer Hospital (No. JJQN2022-13).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2024-1998/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Zhu H, Chen C, Guo H, et al. The causal role of immune cells on lung cancer: a bi-directional Mendelian randomization (MR) study. Aging (Albany NY) 2024;16:10063-73.
  2. Ma Y, Chen H, Li H, et al. Intratumor microbiome-derived butyrate promotes lung cancer metastasis. Cell Rep Med 2024;5:101488.
  3. Peng W, Li B, Li J, et al. Clinical and genomic features of Chinese lung cancer patients with germline mutations. Nat Commun 2022;13:1268.
  4. Li MY, Liu LZ, Dong M. Progress on pivotal role and application of exosome in lung cancer carcinogenesis, diagnosis, therapy and prognosis. Mol Cancer 2021;20:22.
  5. Zheng R, Qian S, Shi Y, et al. Association between triglyceride-glucose index and in-hospital mortality in critically ill patients with sepsis: analysis of the MIMIC-IV database. Cardiovasc Diabetol 2023;22:307.
  6. Takkavatakarn K, Oh W, Chan L, et al. Machine learning derived serum creatinine trajectories in acute kidney injury in critically ill patients with sepsis. Crit Care 2024;28:156.
  7. Zhang Y, Hu J, Hua T, et al. Development of a machine learning-based prediction model for sepsis-associated delirium in the intensive care unit. Sci Rep 2023;13:12697.
  8. Huang Y, Wang X, Li Z, et al. A novel nutritional inflammation index for predicting mortality in acute ischemic stroke patients: insights into advanced lung cancer inflammation index from the Medical Information Mart for Intensive Care-IV database. Front Nutr 2024;11:1408372.
  9. Panos A, Mavridis D. TableOne: an online web application and R package for summarising and visualising data. Evid Based Ment Health 2020;23:127-30.
  10. Lei J, Qu T, Cha L, et al. Clinicopathological characteristics of pheochromocytoma/paraganglioma and screening of prognostic markers. J Surg Oncol 2023;128:510-8.
  11. Li Y, Lu F, Yin Y. Applying logistic LASSO regression for the diagnosis of atypical Crohn's disease. Sci Rep 2022;12:11340.
  12. Sachs MC. plotROC: A Tool for Plotting ROC Curves. J Stat Softw 2017;79:2.
  13. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.
  14. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-74.
  15. Ren Y, Zhang L, Xu F, et al. Risk factor analysis and nomogram for predicting in-hospital mortality in ICU patients with sepsis and lung infection. BMC Pulm Med 2022;22:17.
  16. Lahiri A, Maji A, Potdar PD, et al. Lung cancer immunotherapy: progress, pitfalls, and promises. Mol Cancer 2023;22:40.
  17. Liu Y, Tang T, Wang C, et al. Analysis of the incidence and influencing factors of abdominal distension in postoperative lung cancer patients in ICU based on real-world data: a retrospective cohort study. BMC Surg 2024;24:26.
  18. Freisling H, Seyed Khoei N, Viallon V, et al. Gilbert's syndrome, circulating bilirubin and lung cancer: a genetic advantage? Thorax 2020;75:916-7.
  19. Monroy-Iglesias MJ, Moss C, Beckmann K, et al. Serum Total Bilirubin and Risk of Cancer: A Swedish Cohort Study and Meta-Analysis. Cancers (Basel) 2021;13:5540.
  20. Hou N, Li M, He L, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med 2020;18:462.
  21. Kahraman F, Yılmaz AS, Ersoy İ, et al. Predictive outcomes of APACHE II and expanded SAPS II mortality scoring systems in coronary care unit. Int J Cardiol 2023;371:427-31.
Cite this article as: Fang C, Bai Y, Xin Y, Zhang L, Ma J. Factors linked to lung cancer in MIMIC-IV database. J Thorac Dis 2025;17(5):2765-2777. doi: 10.21037/jtd-2024-1998
