A novel feature selection approach for data integration analysis: applications to transcriptomics study

Puthiyedth, Nisha

Title: A novel feature selection approach for data integration analysis: applications to transcriptomics study
Creator: Puthiyedth, Nisha
Relation: University of Newcastle Research Higher Degree Thesis
Resource Type: thesis
Date: 2016
Description: Research Doctorate - Doctor of Philosophy (PhD)
Description: Meta-analysis has become a popular method for identifying novel biomarkers in medical research. It has been widely applied to genome-wide association and transcriptomic studies because many datasets are available in the public domain. Joint analysis of multiple datasets has become a common technique for increasing the statistical power to detect biomarkers reported in smaller studies. The approach rests on the expectation that the power to detect associations of interest grows as the total number of samples increases. Integrating available information from different datasets to generate a combined result seems reasonable and promising. Consequently, there is a need for computational integration methods that evaluate multiple independent datasets investigating a common theme or disorder. Such integration raises a variety of issues in the analysis of the data and leads to more complications than are seen with standard meta-analysis, including diverse experimental platforms and complex data structures. I illustrate these ideas using microarray datasets from multiple studies and propose an integrative methodology to combine datasets generated using different platforms. Having combined the data, the main challenge is to choose a subset of features that represent the combined dataset in a particular aspect. While feature selection is well established in biostatistics, the introduction of new combinatorial optimisation models to address this issue has not been explored in depth. In 2004, a new feature selection approach based on a combinatorial optimisation method was proposed, entitled the (α,β)-k Feature Set problem approach. The main advantage of this approach over ranking methods that select individual features is that the features are evaluated as groups instead of on the basis of their individual performance.
The (α,β)-k Feature Set problem approach was originally defined with a single uniform dataset in mind and, conceived in this way, it is not readily applicable to integrated datasets. An extended version of this approach handles integrated datasets in a consistent manner and selects features that differentiate sample pairs across datasets. Applying an (α,β)-k Feature Set problem-based approach to meta-analysis thus helps to identify the best set of features from a combined dataset, allowing researchers to reveal the genetic pathways that contribute to the development of a disease. I propose an extended version of the (α,β)-k Feature Set problem approach that aims to find a set of genes whose expression levels may be used to identify a joint core subset of genes that putatively play an important role in two conditions: prostate cancer and Alzheimer's disease. The results of the current study suggest that the proposed method is an efficient meta-analysis method that is capable of identifying biologically relevant genes that other methods fail to identify. As the amount of data increases, this novel method can be applied to find additional genes and pathways that are significant in these diseases, which may provide new insights into the disease mechanisms and contribute towards understanding, prevention and cures.
Subject: meta-analysis; biomarkers; combinatorial optimisation; prostate cancer; Alzheimer's disease
Identifier: http://hdl.handle.net/1959.13/1322449
Identifier: uon:24585
Language: eng
Full Text

File	Description	Size	Format
ATTACHMENT01	Thesis	14 MB	Adobe Acrobat PDF
ATTACHMENT02	Abstract	256 KB	Adobe Acrobat PDF