Background: With the widespread adoption of high-throughput technologies and the arrival of the omics era, new bioinformatics tools have become available for integrating different data types. Epidemiological and translational integrative research needs to combine newly generated omics data with classical risk and prognostic/predictive factors to increase the accuracy of its models and algorithms. This need adds a further layer of complexity to the field of integromics, one that has barely been addressed. Here, I assessed the performance of several available integrative methods, namely LASSO, Elastic Net (ENET), integrative clustering with iClusterPlus, and neural networks, for integrating omics and non-omics variables in the same model.
Methods: An in-depth literature search was conducted to identify, study, characterize, and select data integration methods. Then, using data generated by the PanGenEU Study on pancreatic cancer, I created two datasets, each composed of both omics (genomic and epigenomic) variables and epidemiological variables. Finally, I applied the integrative methods to both datasets.
Results: ENET and iClusterPlus both selected omics and non-omics features; diabetes was the only epidemiological variable selected by both methods. LASSO and ENET selected a common set of omics variables that overlaps with those reported in previous pancreatic cancer studies. Neural networks could not be applied to the full datasets, but when applied to the features selected by ENET they achieved a high level of accuracy.
Conclusions: Each integrative method showed both advantages and difficulties for integrating omics and non-omics data. I propose Elastic Net as the best of the methods applied, owing to the features it selected and its low run time and computational requirements.
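To make the Elastic Net feature-selection step more concrete, here is a minimal pure-Python sketch of how ENET can fit omics and non-omics variables in the same model and keep only a subset of them. The abstract does not state the software, outcome model, or tuning used in the study, so this is an illustration under assumptions rather than the study's pipeline: it assumes a continuous (Gaussian) outcome, whereas a case-control outcome such as pancreatic cancer status would normally use the penalized logistic version. The data are synthetic, and the variable names (`snp_*`, `cpg_*`, `diabetes`, `age`), the helper functions, and the values `lam=0.1` and `alpha=0.5` are all hypothetical.

```python
import math
import random


def standardize(columns):
    """Center each column and scale it to unit variance (1/n convention)."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return out


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma); this is what yields exact zeros."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(columns, y, lam, alpha, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for a Gaussian-outcome elastic net, minimizing
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Assumes standardized columns and a centered y (so no intercept)."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j, xj in enumerate(columns):
            # Correlation of x_j with the partial residual; for a standardized
            # column (1/n)*sum(x_j^2) = 1, so adding beta[j] restores its contribution.
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical synthetic data: omics (SNP dosages, CpG methylation) plus
# epidemiological variables; the outcome is driven only by diabetes and snp_0.
rng = random.Random(0)
n = 200
names = [f"snp_{k}" for k in range(5)] + [f"cpg_{k}" for k in range(3)] + ["diabetes", "age"]
raw = {name: [] for name in names}
y = []
for _ in range(n):
    row = {f"snp_{k}": rng.choice([0, 1, 2]) for k in range(5)}
    row.update({f"cpg_{k}": rng.random() for k in range(3)})
    row["diabetes"] = 1 if rng.random() < 0.2 else 0
    row["age"] = rng.uniform(40, 80)
    for name in names:
        raw[name].append(row[name])
    y.append(1.5 * row["diabetes"] + 0.8 * row["snp_0"] + rng.gauss(0, 0.5))

# Put omics and non-omics variables on one scale so the penalty treats them alike.
X = standardize([raw[name] for name in names])
y_mean = sum(y) / n
y_centered = [v - y_mean for v in y]

beta = elastic_net(X, y_centered, lam=0.1, alpha=0.5)
coef = dict(zip(names, beta))
selected = [name for name, b in coef.items() if b != 0.0]
print("selected features:", selected)
```

Standardizing every column before fitting is the key design choice in this sketch: without it, the same penalty would act very differently on a 0/1 diabetes indicator, a 0 to 2 genotype dosage, and age in years. Variables whose coefficient is exactly zero are the ones ENET leaves out; in practice `lam` and `alpha` would be chosen by cross-validation rather than fixed by hand.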