In this chapter we present a method based on the (α,β)-k-feature set problem for identifying relevant attributes in high-dimensional datasets for classification purposes. We present a case study of biomedical interest: using the expression levels of thousands of genes, we show that the method yields a reduced gene set that can identify samples as belonging to prostate cancer tumors or not. We thus address the need for novel methods that can handle classification problems requiring feature selection from several thousand features when only on the order of one hundred samples are available. The methodology appears to be very robust in this prostate cancer case study. It has led to the identification of a set of differentially expressed genes that are highly predictive of the cells' transition to a more malignant type, thus departing from the profile that is characteristic of their originating tissue. Although the method is presented with a particular bioinformatics application in mind, it can be used in other domains. A biological analysis illustrates the relevance of the genes found and links them to the most current developments in prostate cancer biomarker studies.
Foundations of Computational Intelligence, Volume 5: Functional Approximation and Classification, pp. 149-175