- Title
- Mining disjunctive patterns in biomedical data sets
- Creator
- Vimieiro, Renato
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2012
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- Frequent itemset mining is one of the most studied problems in data mining. Since Agrawal et al. (1993) introduced the problem, several advances, both theoretical and practical, have been achieved. In spite of that, there are still many unresolved issues to be tackled before frequent pattern mining can be claimed a cornerstone approach in data mining (Han et al., 2007). Here, we investigate issues related to: (1) the (un)suitability of frequent itemset mining algorithms to identify patterns in biomedical data sets; and (2) the limited expressiveness of such patterns, since the vast majority of frequent itemsets are exclusively conjunctions. Our ultimate goal in this thesis is to improve methods for frequent pattern mining in such a way that they provide alternative insightful solutions for mining biomedical data sets. Specifically, we provide efficient tools for mining disjunctive patterns in biomedical data sets. We tackle the problem of mining disjunctive patterns on three different fronts: (1) disjunctive minimal generators; (2) disjunctive closed patterns; and (3) quasi-CNF emerging patterns. We then propose three different algorithms, one for each task above: TitanicOR, Disclosed, and QCEP. While the first two aim for more descriptive patterns, the third is more predictive. These algorithms are proposed as an attempt to cover different sources of data sets coming from biomedical research. TitanicOR is more suitable for identifying patterns in data sets containing physiological, biochemical, or medical record information. Disclosed was designed to exploit the characteristics of microarray gene expression data sets, which usually contain many features but only a few samples. Finally, QCEP is the only algorithm to consider data sets with class label information. We conducted experiments with both synthetic and real-world data sets to assess the performance of our algorithms. Our experiments show that our algorithms outperformed the state-of-the-art algorithms in each of those categories of patterns.
- Subject
- associative classification; frequent pattern mining; data mining; disjunction; disjunctive patterns; closed patterns
- Identifier
- http://hdl.handle.net/1959.13/936341
- Identifier
- uon:12280
- Rights
- Copyright 2012 Renato Vimieiro
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Abstract | 172 KB | Adobe Acrobat PDF
ATTACHMENT02 | Thesis | 1 MB | Adobe Acrobat PDF