Bioinformatics-based screening of genes associated with dilated cardiomyopathy

Yanghua Liu; Qingshan Wu; Fangni Mo; Ye Mai; Shengyuan Zhang; Xiang Wu

doi:10.21037/jtd-2025-132

Original Article

Yanghua Liu¹, Qingshan Wu¹, Fangni Mo¹, Ye Mai², Shengyuan Zhang³*, Xiang Wu¹*

¹Department of Laboratory Medicine, Hainan Hospital, Guangdong Provincial Hospital of Traditional Chinese Medicine, Haikou, China; ²Critical Care Medicine, Hainan Hospital, Guangdong Provincial Hospital of Traditional Chinese Medicine, Haikou, China; ³Department of Cardiovascular Medicine, Hainan Hospital, Guangdong Provincial Hospital of Traditional Chinese Medicine, Haikou, China

Contributions: (I) Conception and design: Y Liu, S Zhang, X Wu; (II) Administrative support: X Wu; (III) Provision of study materials or patients: X Wu; (IV) Collection and assembly of data: Y Liu, S Zhang, Q Wu; (V) Data analysis and interpretation: Y Liu, S Zhang, X Wu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

*These authors contributed equally to this work.

Correspondence to: Shengyuan Zhang, MS. Department of Cardiovascular Medicine, Hainan Hospital, Guangdong Provincial Hospital of Traditional Chinese Medicine, No. 47 Heping North Road, Meilan District, Haikou 570203, China. Email: 377267459@qq.com; Xiang Wu. Department of Laboratory Medicine, Hainan Hospital, Guangdong Provincial Hospital of Traditional Chinese Medicine, No. 47 Heping North Road, Meilan District, Haikou 570203, China. Email: 1873088611@qq.com.

Background: Due to the lack of appropriate diagnostic biomarkers and intervention targets, the diagnosis and treatment of dilated cardiomyopathy (DCM) in clinical practice are considerably challenging. Therefore, this study aimed to identify reliable biomarker genes using bioinformatics methods to improve the clinical management of DCM.

Methods: Three DCM gene datasets, GSE120895, GSE42955, and GSE3586, were downloaded from the Gene Expression Omnibus (GEO). Differential expression analysis was used to screen for differentially expressed genes in these datasets, and weighted gene coexpression network analysis (WGCNA) was used to screen for the gene coexpression modules most relevant to DCM. Machine learning algorithms and a protein-protein interaction (PPI) network were used to screen for the core DCM genes in the gene coexpression module.

Results: WGCNA identified the turquoise module as the gene module most relevant to DCM. Subsequently, machine learning algorithms identified 8 core genes, while PPI screening identified 10 core genes. HSPA8 was identified by both the machine learning algorithms and the PPI screening.

Conclusions: In this study, the HSPA8 gene was found to be a core gene in DCM, demonstrating the closest association with this disease. Further research on HSPA8 is expected to provide a target for the diagnosis and treatment of DCM.

Keywords: Dilated cardiomyopathy (DCM); HSPA8; bioinformatics

Submitted Jan 19, 2025. Accepted for publication May 21, 2025. Published online May 28, 2025.

Highlight box

Key findings

• This study aimed to identify reliable biomarker genes for dilated cardiomyopathy (DCM) through bioinformatics methods in order to overcome the challenges related to the clinical management of DCM.

What is known and what is new?

• There is a lack of suitable diagnostic biomarkers and intervention targets for DCM in clinical practice.

• A reliable biomarker gene for DCM, HSPA8, was identified in this study.

What is the implication, and what should change now?

• The results of our study may offer new perspectives on the diagnosis and management of DCM.

• The specific mechanism of action of HSPA8 in relation to DCM still requires further laboratory investigation in future research.

Introduction

Dilated cardiomyopathy (DCM) is a primary myocardial disease characterized by cardiac dilation, arrhythmia, and impaired ventricular contractile function (1,2). The prognosis for patients with DCM is generally poor. The 5-year survival rate after the initial diagnosis is lower than 50%, and many patients eventually die from congestive heart failure in the advanced stages (3).

Early detection and early treatment form a reliable strategy for improving the prognosis of patients with DCM; however, diagnosing DCM remains challenging (4). The gold standard for diagnosis is cardiac biopsy, which is technically difficult to perform and cannot be widely implemented in many non-cardiac specialty hospitals. Therefore, developing diagnostic biomarkers for DCM is critical. As DCM is closely linked to genetic factors, abnormalities in gene expression play a central role in its progression (5,6). Screening for genes specifically related to DCM may therefore yield strategies to address these diagnostic challenges.

Weighted gene coexpression network analysis (WGCNA) is a systems biology method used to describe the gene association patterns among different samples (7). It can identify highly coregulated gene sets and determine candidate biomarker genes or therapeutic targets based on the interconnectedness of gene sets and their associations with phenotypes. Machine learning, a subset of artificial intelligence, primarily guides computers to learn from data and improve their performance based on experience without explicit programming. In machine learning, algorithms continuously train to discover patterns and correlations from large datasets and then make optimal decisions and predictions based on data analysis results (8). Given the unique characteristics of both WGCNA and machine learning, we combined these two methods to screen for the genes expressed during the progression of DCM in order to provide reference data in overcoming the diagnostic obstacles associated with DCM. We present this article in accordance with the STREGA reporting checklist (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-132/rc).

Methods

Gene Expression Omnibus (GEO) datasets

The GEO is a database that provides gene expression data and related analysis tools and can be used to retrieve gene expression data from any species or cultured tissues and samples (9). Three datasets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), along with their expression profiles and sample information (GSE120895, GSE42955, and GSE3586). These three datasets were annotated using the following platforms: GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array, GPL6244 (HuGene-1_0-st) Affymetrix Human Gene 1.0 ST Array (transcript [gene] version), and GPL3050 Human Unigene3.1 cDNA Array 37.5K v. 1.0. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Screening of differentially expressed genes (DEGs)

The “limma” package in R software (The R Foundation for Statistical Computing) (10) was used for quality control and normalization of the data, followed by differential analysis of gene expression levels. DEGs were obtained based on the selection criteria of an adjusted P<0.05 and |log₂ fold change (FC)| ≥ 1.5. The dataset of DEGs related to DCM was defined as DCM_DEG, and the data were visualized through volcano plots and heatmaps.
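The selection criteria above can be sketched in a few lines of Python. This is only an illustration of the thresholding step, not the limma workflow itself: the `GeneStat` record, the `select_degs` helper, and the toy values are hypothetical, and the adjusted P values are assumed to have been computed upstream.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary from an upstream differential expression analysis."""
    gene: str
    log2_fc: float
    adj_p: float


def select_degs(stats, adj_p_cutoff=0.05, log2_fc_cutoff=1.5):
    """Keep genes with adjusted P < cutoff and |log2 FC| >= cutoff."""
    return [
        s for s in stats
        if s.adj_p < adj_p_cutoff and abs(s.log2_fc) >= log2_fc_cutoff
    ]


def direction(stat):
    """Label a DEG as up- or downregulated in DCM relative to controls."""
    return "up" if stat.log2_fc > 0 else "down"


# Hypothetical values for illustration only (not taken from the datasets).
toy_stats = [
    GeneStat("GENE_A", 2.1, 0.001),
    GeneStat("GENE_B", -1.7, 0.020),
    GeneStat("GENE_C", 1.2, 0.001),   # fails the fold-change criterion
    GeneStat("GENE_D", 3.0, 0.200),   # fails the adjusted P criterion
]
degs = select_degs(toy_stats)
```

Both criteria must hold at once, so a gene with a large fold change but a nonsignificant adjusted P value is excluded.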

Weighted gene coexpression analysis

The “WGCNA” package in R was used to conduct WGCNA (11), with further filtering performed on the DCM_DEG gene dataset. Genes in the top 75% by median absolute deviation were selected for analysis. Outliers were removed, the soft threshold was determined based on scale independence and mean connectivity, and a scale-free topological network was then constructed, with a minimum of 30 genes per module. An association analysis was then conducted between the modules and the occurrence of DCM: Pearson correlation coefficients were calculated to quantify the relationship of each module with DCM, and modules with the largest correlation coefficient and with P<0.05 were selected as the key modules.
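The median absolute deviation (MAD) filter described above can be illustrated with a minimal Python sketch. It assumes expression values are held as a mapping from gene name to a list of per-sample values; the helper names and toy data are hypothetical, and rounding the number of retained genes down is an assumption.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of one gene's expression values."""
    center = median(values)
    return median(abs(v - center) for v in values)


def top_mad_genes(expr, keep_fraction=0.75):
    """Return the most variable genes, ranked by descending MAD.

    expr maps gene name -> list of expression values across samples.
    """
    ranked = sorted(expr, key=lambda gene: mad(expr[gene]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Hypothetical expression values for illustration only.
toy_expr = {
    "GENE_A": [1, 5, 9, 13],
    "GENE_B": [2, 2, 2, 2],      # no variation, so it is filtered out
    "GENE_C": [1, 2, 3, 4],
    "GENE_D": [0, 10, 20, 30],
}
kept = top_mad_genes(toy_expr)
```

Filtering on MAD rather than variance makes the ranking less sensitive to single outlying samples.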

Core gene selection based on machine learning algorithms

Least absolute shrinkage and selection operator (LASSO) logistic regression (12) and support vector machine (SVM) (13) were employed to further screen genes within the weighted coexpression gene modules. LASSO analysis was performed using the “glmnet” package in R, with the response type set to binary and the alpha to 1. A recursive feature elimination (RFE) method was applied to screen for the most suitable genes from the metadata cohort to avoid overfitting. Subsequently, SVM-RFE was used to identify the gene set with the highest recognition capability.
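For reference, the penalized objective that “glmnet” documents for a binary response is reproduced below as a sketch. The symbols are not defined in this article and are introduced here for illustration: $y_i \in \{0,1\}$ is the DCM status of sample $i$, $x_i$ its vector of gene expression values, $\beta_0$ and $\beta$ the intercept and coefficients, and $\lambda$ the penalty strength.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}
\left[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
      + \alpha\,\lVert\beta\rVert_1\right]
```

With the alpha set to 1, as in this study, the ridge term vanishes and only the $\ell_1$ penalty $\lambda\lVert\beta\rVert_1$ remains, which shrinks some coefficients exactly to zero and thereby performs gene selection.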

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis

GO (http://www.geneontology.org/) and KEGG (https://www.kegg.jp/) enrichment analyses are powerful tools used to interpret the biological significance of large gene sets. GO (14) and KEGG enrichment analyses (15) were conducted using the “clusterProfiler” package and the “enrichplot” package in R.

Protein-protein interaction (PPI) network construction

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) online database (https://cn.string-db.org/) was used for PPI network analysis (16), during which isolated points without interactions were removed to obtain the PPI network. This network was imported into Cytoscape software, where the MCODE plugin was used to identify the most strongly correlated subnetworks within the overall network, and the cytoHubba plugin was used to select the top 10 genes with the strongest correlation. Core genes were interactively selected based on the results from MCODE and cytoHubba.
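The hub-selection step can be sketched as follows. The ranking method used in cytoHubba is not stated here, so this example assumes simple node degree as a stand-in; the function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def top_hubs_by_degree(edges, k=10):
    """Rank proteins by number of distinct interaction partners.

    edges is an iterable of (protein_a, protein_b) pairs. Proteins with no
    interactions never appear in the edge list, so isolated nodes are
    excluded by construction.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name so that ties are reproducible.
    ranked = sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))
    return [(node, len(neighbors[node])) for node in ranked[:k]]


# Hypothetical interaction pairs for illustration only.
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]
hubs = top_hubs_by_degree(toy_edges, k=2)
```

Other centrality scores could be substituted for degree by changing the sort key.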

Construction of a diagnostic model

The Cox proportional hazards regression model can estimate the coefficients of multiple independent variables with respect to the dependent variable (17). The penalty parameter λ was set according to the minimum criterion, and variables with nonzero coefficients were retained. The risk score was calculated using the following formula: risk score = sum (expression level × corresponding coefficient). We used this model, with DCM status as the dependent variable and the core genes as independent variables, to construct a diagnostic model for DCM.

Statistical analysis

Statistical significance was set at P<0.05. Where multiple comparisons were made, adjusted P values of less than 0.05 were considered statistically significant.

Results

DEGs associated with DCM

The GSE42955 and GSE3586 datasets were merged, and 2,057 DEGs associated with DCM were identified, as shown in the heatmap (Figure 1A) and volcano plot (Figure 1B). A total of 3,763 DCM-related DEGs were screened from the GSE120895 dataset, as shown in the corresponding heatmap (Figure 1C) and volcano plot (Figure 1D). Subsequently, the two sets of DCM-related DEGs were intersected to identify 1,979 common DEGs, which were defined as the DCM_DEG gene set (Figure 1E).

Figure 1 DEG screening. (A) Heatmap of DEGs from the GSE42955 and GSE3586 datasets. (B) Volcano plot of DEGs from the GSE42955 and GSE3586 datasets. (C) Heatmap of DEGs from the GSE120895 dataset. (D) Volcano plot of DEGs from the GSE120895 dataset. (E) Venn diagram of intersecting DEGs from the GSE42955, GSE3586, and GSE120895 datasets. CO, control; DCM, dilated cardiomyopathy; DEG, differentially expressed gene; FC, fold change; NS, not significant.

Selection of genes related to DCM modules

This study employed WGCNA to further filter the DCM_DEG gene set, testing candidate soft-threshold powers from 1 to 20 in the network topology analysis. The results indicated that the optimal soft threshold was 12. Using dynamic tree cutting, we ultimately obtained five gene modules containing different numbers of genes (Figure 2A). A correlation analysis was conducted between the five modules and the normal and DCM groups, yielding a heatmap of the correlations between gene modules and clinical classification (Figure 2B).
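To illustrate what a soft threshold of 12 does, the sketch below raises absolute pairwise correlations to that power. It assumes an unsigned network (the network type is not stated here), and the function names and toy correlation matrix are hypothetical.

```python
def soft_threshold_adjacency(cor, beta=12):
    """Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def connectivity(adj):
    """Per-gene connectivity: the sum of each gene's adjacencies."""
    return [sum(row) for row in adj]


# Hypothetical gene-gene correlations for illustration only.
toy_cor = [
    [1.0, 0.9, -0.5],
    [0.9, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
]
adj = soft_threshold_adjacency(toy_cor)
k = connectivity(adj)
```

At a power of 12, a correlation of 0.9 keeps an adjacency of about 0.28, whereas a correlation of 0.5 drops to about 0.00024, so weak correlations are suppressed without imposing a hard cutoff.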

Figure 2 Weighted coexpression analysis. (A) Sample distribution. (B) Correlation between modules and phenotypes. (C) Correlation of DCM_DEG with the turquoise module. (D) Functional enrichment of DCM_DEG in the turquoise module. Darker colors indicate higher values of −log₁₀(P), representing a greater degree of enrichment. (E) DisGeNET analysis of DCM_DEG in the turquoise module. Darker colors indicate higher values of −log₁₀(P), representing a greater degree of enrichment. (F) PaGenBase enrichment analysis of DCM_DEG in the turquoise module. Darker colors indicate higher values of −log₁₀(P), representing a greater degree of enrichment. CO, control; DCM, dilated cardiomyopathy; DEG, differentially expressed genes.

Among the five modules, the turquoise module displayed the highest correlation with DCM (r=0.38; P<0.004). Consequently, subsequent analyses focused exclusively on the turquoise module (Figure 2C). GO functional enrichment analysis revealed that the genes in the turquoise module were significantly enriched in pathways related to the regulation of cellular catabolic processes, cellular responses to stimuli, neurodegenerative pathways associated with multiple diseases, and negative regulation of cellular component organization (Figure 2D). Analysis via the Database of Gene-Disease Associations (DisGeNET; http://www.disgenet.org/) indicated that the genes in the turquoise module were significantly enriched in heart-related diseases, including DCM, left ventricular hypertrophy, and hypertrophic cardiomyopathy (Figure 2E). Analysis via the Pattern Gene Database (PaGenBase; http://bioinf.xmu.edu.cn/PaGenBase/index.jsp) demonstrated that the genes in the turquoise module were significantly enriched in cardiac and adipose tissues (Figure 2F). These findings confirm that the genes in the turquoise module were significantly correlated with DCM.

Machine learning algorithm screening for the core genes in DCM

To further screen for the core genes from the turquoise module, we employed machine learning algorithms to construct two feature selection models. The LASSO regression model identified 52 DCM-related genes (Figure 3A), while the SVM-RFE algorithm identified 8 DCM-related genes (Figure 3B). The intersection of both results yielded 8 common genes associated with DCM, which can be considered to be core genes within the turquoise module (Figure 3C).

Figure 3 Results of the machine learning algorithm screening for core genes in DCM. (A) LASSO regression screening. (B) SVM-RFE screening. (C) Venn diagram of the intersecting genes. (D) Expression of signatures in the samples from DCM patients. (E) Functional enrichment analysis of the core genes. (F,G) Correlation among the core genes. CO, control; CV, coefficient of variation; DCM, dilated cardiomyopathy; LASSO, least absolute shrinkage and selection operator; SVM-RFE, support vector machine-recursive feature elimination.

Subsequently, we analyzed the expression of these core genes in the samples and their functions. The results indicated that the selected core genes exhibited differential expression between healthy individuals and patients with DCM (Figure 3D), and functional enrichment analysis via the GO database revealed that they were predominantly enriched in presynaptic regions and clathrin-coated vesicle membranes (Figure 3E). To validate the correlation among the core genes, we conducted Pearson and Spearman correlation analyses, ultimately observing significant correlations among the core genes across all samples (Figure 3F,3G).
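The two correlation measures named above differ in what they capture, which a short Python sketch can make concrete. The functions and toy values are illustrative assumptions and do not reproduce the article's analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for two genes across five samples.
gene_x = [1, 2, 3, 4, 5]
gene_y = [1, 4, 9, 16, 25]   # monotonic but not linear in gene_x
r_pearson = pearson(gene_x, gene_y)
r_spearman = spearman(gene_x, gene_y)
```

For a monotonic but nonlinear relationship such as this one, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1, which is one reason to report both.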

DCM gene diagnosis model

In order to reduce the diagnostic difficulty of DCM, we first performed feature selection on the selected core genes (Figure 4A) and established a molecular diagnosis model for DCM using the Cox multiple linear regression model. The diagnostic model derived from the regression for the core genes is as follows: risk = (13.478459 × HSPA8) + (9.596549 × SGCG) + (–35.593685 × MYH2) + (–21.325514 × STK32B) + (–34.762538 × SLC29A1) + (39.615650 × SMARCA2) + (–37.673073 × SLC2A8) + (19.871895 × ARMCX2).
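The risk formula above can be evaluated with a short Python sketch. The coefficients are copied from the formula; the function name and the input values used in any call are hypothetical, and the expression scale and any decision threshold are not specified here, so none is assumed.

```python
# Coefficients as reported in the diagnostic model above.
COEFFICIENTS = {
    "HSPA8": 13.478459,
    "SGCG": 9.596549,
    "MYH2": -35.593685,
    "STK32B": -21.325514,
    "SLC29A1": -34.762538,
    "SMARCA2": 39.615650,
    "SLC2A8": -37.673073,
    "ARMCX2": 19.871895,
}


def risk_score(expression):
    """Sum of (expression level x coefficient) over the eight core genes.

    expression maps gene symbol -> expression level for one sample and must
    contain all eight genes; extra genes are ignored.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())
```

Because the score is a weighted sum, genes with positive coefficients (HSPA8, SGCG, SMARCA2, ARMCX2) raise it as their expression increases, while those with negative coefficients lower it.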

Figure 4 DCM gene diagnosis model. (A) Cox risk regression feature selection. (B) Dataset fitting curve in the model. (C) Fitting curves of each gene feature in the model. (D) Model accuracy curve in the validation set. AUC, area under curve; DCM, dilated cardiomyopathy.

Subsequently, we validated the established model. The model was successfully validated in the validation cohort (Figure 4B). We also verified the fit of each feature across the validation tests (Figure 4C), and the model achieved a diagnostic accuracy of 89.3% for DCM (Figure 4D).

Core genes most closely associated with DCM

To identify the core genes most closely associated with DCM, we constructed a PPI network for the turquoise module genes (Figure 5A), from which we selected 10 core genes (Figure 5B). Subsequently, a Venn analysis was performed between these 10 core genes and the 8 core genes identified by the machine learning algorithms, which yielded the HSPA8 gene. This gene may be the core gene with the closest association with DCM (Figure 5C).

Figure 5 Intersection of genes selected by the different methods. (A,B) PPI network of the turquoise module. (C) Intersection of hub genes selected from the PPI network and genes selected by machine learning algorithms. PPI, protein-protein interaction.

The diagnostic efficacy of HSPA8 for DCM

To verify the expression of HSPA8 in the samples, this study analyzed the expression levels of HSPA8 in the GSE42955, GSE3586, and GSE120895 datasets, finding that the expression level of HSPA8 in patients with DCM was significantly higher than that in the normal population (Figure 6A). Subsequently, based on five external datasets (GSE57338, GSE5406, GSE116250, GSE42955, and GSE19303), we employed receiver operating characteristic (ROC) curve analysis to evaluate the diagnostic efficacy of HSPA8 for DCM. This yielded an area under the ROC curve of 0.954, indicating that HSPA8 has extremely high diagnostic value for DCM (Figure 6B).
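The area under the ROC curve has a direct interpretation that can be sketched in Python: it is the probability that a randomly chosen DCM sample has a higher marker value than a randomly chosen control. The function and the toy expression values below are hypothetical and unrelated to the reported result.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly.

    scores are marker values (for example, expression levels) and labels
    are 1 for cases and 0 for controls. Tied pairs count as half.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for illustration only.
toy_scores = [8.1, 7.9, 6.9, 6.8, 7.0, 6.5]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
```

In this toy example eight of the nine case-control pairs are ordered correctly, giving an AUC of 8/9; a value of 0.5 would indicate no discrimination.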

Figure 6 Evaluation of the diagnostic efficacy of HSPA8 for DCM. (A) Box plot of HSPA8 expression levels in the healthy control group and the DCM group. (B) ROC curve illustrating the diagnostic value of HSPA8 for DCM. AUC, area under curve; CO, control; DCM, dilated cardiomyopathy; ROC, receiver operating characteristic.

Discussion

In this study, differential gene analysis, WGCNA, and machine learning algorithms were used to screen for genes closely related to DCM. These genes showed significant changes in expression levels in patients with DCM and may potentially be the core factors driving disease progression. Thus, they can be considered as specific diagnostic biomarkers or therapeutic targets in DCM.

Many biomarkers are important in improving the diagnosis, prognosis, and treatment of DCM (18). Mutations in genes such as TTN, LMNA, and MYH7 have been identified as genetic biomarkers for familial DCM. These mutations are associated with disease progression and adverse outcomes, making them valuable for early diagnosis and risk stratification (18). For example, truncating variants in the TTN gene (TTNtv) are found in approximately 25% of familial DCM cases and are associated with worse clinical outcomes. Circulating biomarkers, such as natriuretic peptides, troponins, and inflammatory markers, have been widely used to assess disease severity and prognosis in DCM (18).

First, through WGCNA, we identified five modules, with the turquoise coexpression module showing the most significant correlation with the occurrence of DCM. Therefore, this module was selected as the critical module for DCM, and its genes were subjected to GO functional enrichment, DisGeNET, and PaGenBase enrichment analyses. The GO functional enrichment analysis revealed that the genes in the turquoise module were significantly enriched in the regulation of cellular metabolic processes and in stress responses. The DisGeNET analysis indicated that these genes were significantly enriched in heart diseases such as cardiomyopathy and left ventricular hypertrophy. In the PaGenBase enrichment analysis, these genes were significantly enriched in heart and adipose tissues. These results provide strong evidence that the genes in the turquoise module are associated with DCM.

Subsequently, we applied machine learning algorithms to construct LASSO regression and SVM-RFE models for further screening of the core genes in the turquoise module, ultimately identifying eight core genes: ARMCX2, HSPA8, SGCG, SMARCA2, MYH2, SLC2A8, STK32B, and SLC29A1. Most of these eight genes have been reported to be closely related to the occurrence of DCM.

For instance, ARMCX2 and MYH2 play significant regulatory roles in the formation of cytoskeletal proteins, and abnormal expression of cytoskeletal proteins is regarded as an important factor in the development of DCM (19,20). Clemen et al. reported that altered desmin structure affects the integrity of the cytoskeleton, subsequently damaging cardiomyocytes and eventually inducing DCM (21). A recent study identified a pathogenic MYL2 variant (D94A), characterized by an aspartic acid-to-alanine substitution at residue 94 of the regulatory light chain, which induces conformational alterations in myosin structure and drives DCM pathogenesis (22).

SGCG is one of the components of the sarcoglycan complex, which may strengthen the connection between the cytoskeleton and cell membrane via the dystrophin axis; a lack of sarcoglycan can lead to membrane defects and cell death. Gene therapy utilizing recombinant adeno-associated virus (rAAV) vectors to deliver δ-sarcoglycan (δ-SG) restored cardiac diastolic function in a δ-SG-deficient DCM rat model, as evidenced by hemodynamic analysis (23,24).

SMARCA2, an ATP-dependent chromatin remodeler, functions as a core component of the switch defective/sucrose non-fermentable (SWI/SNF) complex, which orchestrates nucleosome repositioning and transcriptional regulation (25). SMARCA2 utilizes ATP hydrolysis to drive structural rearrangements within the SWI/SNF complex, modulating transcription factor accessibility to DNA binding sites (26,27). The SWI/SNF complex has been confirmed to play a significant role in cardiac development, and its loss of function can cause various types of cardiomyopathy, including DCM (28).

SLC2A8 and SLC29A1 are members of the solute carrier (SLC) superfamily, which comprises more than 400 transmembrane transporters (29). These transporters localize to both plasma membranes and intracellular organelles, regulating the flux of bioactive molecules across cellular compartments (30,31). Although research on the correlation of SLC2A8 and SLC29A1 with DCM is currently limited, available reports indicate that members of this family generally regulate cellular energy metabolism; thus, it cannot be ruled out that SLC2A8 and SLC29A1 may influence DCM development by affecting myocardial cell metabolism.

Furthermore, this study identified 10 hub genes from the turquoise module through PPI network screening. Subsequently, an intersection analysis was conducted between these 10 hub genes selected from the PPI network and the 8 core genes selected through machine learning, yielding HSPA8 as the only gene identified by both screening methods. This suggests that HSPA8 may be more reliable as a core gene for DCM than the other genes, and this was corroborated by ROC curve analysis, which showed that the area under the curve (AUC) of HSPA8 was 0.954.

HSPA8 belongs to the HSP70 family of constitutively expressed homologous proteins and serves as a major chaperone protein in this family, comprising 1% of the total cellular protein content (32). Numerous studies have indicated a connection between HSPA8 and various cellular functions (32,33). In mammalian cytoplasm and nuclei, cochaperones regulate the binding and hydrolysis of adenosine triphosphate (ATP) by modulating HSPA8 activity, facilitating the cycling between ATP and adenosine diphosphate (34). Another primary cellular function of HSPA8 is its involvement in protein folding: it not only binds to newly synthesized unfolded proteins but also interacts with various protein aggregates to reduce their formation (35). Additionally, HSPA8 plays a crucial role in cellular protein degradation, being required for the ubiquitination of actin and crystallin (36,37). Moreover, abnormal HSPA8 expression has been associated with the age of onset of DCM (36-38).

Overall, in this study, we identified ARMCX2, HSPA8, SGCG, SMARCA2, MYH2, SLC2A8, STK32B, and SLC29A1 as the potential core genes of DCM. Among these eight genes, HSPA8 demonstrated the greatest significance, and its abnormal expression may have a critical impact on the development of DCM. However, the specific mechanism of action of HSPA8 in relation to DCM still requires further laboratory investigation, which will be addressed in our future research.

Conclusions

The HSPA8 gene was the core gene found to be most closely associated with DCM in this study. Further research on HSPA8 is expected to provide a target for the diagnosis and treatment of DCM.

Acknowledgments

We appreciate the unrestricted use of the TCGA database.

Footnote

Reporting Checklist: The authors have completed the STREGA reporting checklist. Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-132/rc

Peer Review File: Available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-132/prf

Funding: This research was supported by Joint Program on Health Science & Technology Innovation of Hainan Province (No. WSJK2024MS181), TCM Geriatrics Construction Project, Key Discipline of National Administration of TCM (No. 2023No3), The Second National Famous Traditional Chinese Medicine Inheritance Studio (Yang Hua Studio) ([2022] No. 245), Natural Science Foundation of Hainan Province (No. 822QN476), and Natural Science Foundation of Hainan Province (No. 824MS150).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-2025-132/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Marrow BA, Cook SA, Prasad SK, et al. Emerging Techniques for Risk Stratification in Nonischemic Dilated Cardiomyopathy: JACC Review Topic of the Week. J Am Coll Cardiol 2020;75:1196-207.
2. Merlo M, Cannatà A, Gobbo M, et al. Evolving concepts in dilated cardiomyopathy. Eur J Heart Fail 2018;20:228-39.
3. Pinto YM, Elliott PM, Arbustini E, et al. Proposal for a revised definition of dilated cardiomyopathy, hypokinetic non-dilated cardiomyopathy, and its implications for clinical practice: a position statement of the ESC working group on myocardial and pericardial diseases. Eur Heart J 2016;37:1850-8.
4. Jefferies JL, Towbin JA. Dilated cardiomyopathy. Lancet 2010;375:752-62.
5. Zheng Y, Liu Z, Yang X, et al. Exploring Key Genes to Construct a Diagnosis Model of Dilated Cardiomyopathy. Front Cardiovasc Med 2022;9:865096.
6. Guo Q, Qu Q, Wang L, et al. Identification of Potential Biomarkers Associated with Dilated Cardiomyopathy by Weighted Gene Coexpression Network Analysis. Front Biosci (Landmark Ed) 2022;27:246.
7. Chen D, Chen X, Zheng X, et al. Combined metabolomic and transcriptomic analysis reveals the key genes for triterpenoid biosynthesis in Cyclocarya paliurus. BMC Genomics 2024;25:1197.
8. Haug CJ, Drazen JM. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. N Engl J Med 2023;388:1201-8.
9. Chicco D. geneExpressionFromGEO: An R Package to Facilitate Data Reading from Gene Expression Omnibus (GEO). Methods Mol Biol 2022;2401:187-94.
10. Liu S, Wang Z, Zhu R, et al. Three Differential Expression Analysis Methods for RNA Sequencing: limma, EdgeR, DESeq2. J Vis Exp 2021;
11. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559.
12. Fernández-Delgado M, Sirsat MS, Cernadas E, et al. An extensive experimental survey of regression methods. Neural Netw 2019;111:11-34.
13. Chen Q, Cao F. Distributed support vector machine in master-slave mode. Neural Netw 2018;101:94-100.
14. Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2021;2:100141.
15. Deng Y, Xie Q, Zhang G, et al. Slow skeletal muscle troponin T, titin and myosin light chain 3 are candidate prognostic biomarkers for Ewing's sarcoma. Oncol Lett 2019;18:6431-42.
16. Athanasios A, Charalampos V, Vasileios T, et al. Protein-Protein Interaction (PPI) Network: Recent Advances in Drug Discovery. Curr Drug Metab 2017;18:5-10.
17. Ni A, Cai J. Tuning Parameter Selection in Cox Proportional Hazards Model with a Diverging Number of Parameters. Scand Stat Theory Appl 2018;45:557-70.
18. Parikh VN, Day SM, Lakdawala NK, et al. Advances in the study and treatment of genetic cardiomyopathies. Cell 2025;188:901-18.
19. Klaas M, Kangur T, Viil J, et al. The alterations in the extracellular matrix composition guide the repair of damaged liver tissue. Sci Rep 2016;6:27398.
20. Taylor MR, Slavov D, Ku L, et al. Prevalence of desmin mutations in dilated cardiomyopathy. Circulation 2007;115:1244-51.
21. Clemen CS, Stöckigt F, Strucksberg KH, et al. The toxic effect of R350P mutant desmin in striated muscle of man and mouse. Acta Neuropathol 2015;129:297-315.
22. Huang W, Liang J, Yuan CC, et al. Novel familial dilated cardiomyopathy mutation in MYL2 affects the structure and function of myosin regulatory light chain. FEBS J 2015;282:2379-93.
23. Kawada T, Nakazawa M, Toyo-Oka T. Somatic gene therapy of dilated cardiomyopathy. Nihon Yakurigaku Zasshi 2002;119:37-44.
24. Kawada T, Masui F, Tezuka A, et al. A novel scheme of dystrophin disruption for the progression of advanced heart failure. Biochim Biophys Acta 2005;1751:73-81.
25. Hargreaves DC, Crabtree GR. ATP-dependent chromatin remodeling: genetics, genomics and mechanisms. Cell Res 2011;21:396-420.
26. Mashtalir N, Dao HT, Sankar A, et al. Chromatin landscape signals differentially dictate the activities of mSWI/SNF family complexes. Science 2021;373:306-15.
27. Hargreaves DC. Chromatin openness requires continuous SWI/SNF activity. Nat Genet 2021;53:263-4.
28. Bevilacqua A, Willis MS, Bultman SJ. SWI/SNF chromatin-remodeling complexes in cardiovascular development and disease. Cardiovasc Pathol 2014;23:85-91.
29. Hediger MA, Clémençon B, Burrier RE, et al. The ABCs of membrane transporters in health and disease (SLC series): introduction. Mol Aspects Med 2013;34:95-107.
30. Perland E, Fredriksson R. Classification Systems of Secondary Active Transporters. Trends Pharmacol Sci 2017;38:305-15.
31. Meixner E, Goldmann U, Sedlyarov V, et al. A substrate-based ontology for human solute carriers. Mol Syst Biol 2020;16:e9652.
32. Kampinga HH, Hageman J, Vos MJ, et al. Guidelines for the nomenclature of the human heat shock proteins. Cell Stress Chaperones 2009;14:105-11.
33. Stricher F, Macri C, Ruff M, et al. HSPA8/HSC70 chaperone protein: structure, function, and chemical targeting. Autophagy 2013;9:1937-54.
34. Lüders J, Demand J, Papp O, et al. Distinct isoforms of the cofactor BAG-1 differentially affect Hsc70 chaperone function. J Biol Chem 2000;275:14817-23.
35. Bonam SR, Ruff M, Muller S. HSPA8/HSC70 in Immune Disorders: A Molecular Rheostat that Adjusts Chaperone-Mediated Autophagy Substrates. Cells 2019;8:849.
36. Chen BH, Chang YJ, Lin S, et al. Hsc70/Stub1 promotes the removal of individual oxidatively stressed peroxisomes. Nat Commun 2020;11:5267.
37. Zhang H, Amick J, Chakravarti R, et al. A bipartite interaction between Hsp70 and CHIP regulates ubiquitination of chaperoned client proteins. Structure 2015;23:472-82.
38. Portig I, Pankuweit S, Maisch B. Antibodies against stress proteins in sera of patients with dilated cardiomyopathy. J Mol Cell Cardiol 1997;29:2245-51.

Cite this article as: Liu Y, Wu Q, Mo F, Mai Y, Zhang S, Wu X. Bioinformatics-based screening of genes associated with dilated cardiomyopathy. J Thorac Dis 2025;17(5):3357-3369. doi: 10.21037/jtd-2025-132
