48 Biomolecular Feature Selection of Colorectal Cancer Microarray Data Using GA-SVM Hybrid
Published: 2009
In 2008, more than 100,000 new cases of colon cancer and 40,000 new cases of rectal cancer were reported in the United States. To reduce the number of deaths from these diseases, researchers have been striving to find a set of genes that can accurately predict the prognosis of colorectal cancer. Working with a gene expression microarray dataset of about 55,000 genes collected from 122 colorectal cancer patients, this research developed a method for identifying an optimal set of features through several stages of feature selection: coarse feature reduction, fine feature selection, and classification using a Genetic Algorithm / Support Vector Machine (GA/SVM) hybrid. Microarray data of these dimensions are feature-rich and case-poor, however, which creates a risk of overfitting. The research succeeded in developing a feature reduction method that suggests a set of genes with potential ties to colorectal cancer, and these genes warrant further investigation.
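To make the pipeline described above more concrete, the following is a minimal sketch of a coarse filter followed by a GA/SVM wrapper. It is not the authors' implementation. Everything specific in it is an assumption chosen for illustration: the small synthetic dataset (40 cases, 200 features, three informative ones) that stands in for the microarray data, the t-like ranking score used for coarse reduction, the hand-written linear SVM trained by subgradient descent on the hinge loss, the GA settings (population size, truncation selection, one-point crossover, 5% bit-flip mutation, per-feature penalty), and all function names.

```python
import random
from statistics import mean, pstdev

def make_data(rng, n=40, p=200, informative=(0, 1, 2)):
    """Synthetic stand-in for microarray data: few cases, many features."""
    X, y = [], []
    for i in range(n):
        label = 1 if i < n // 2 else -1
        row = [rng.gauss(0, 1) for _ in range(p)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative features by class
        X.append(row)
        y.append(label)
    return X, y

def coarse_filter(X, y, keep):
    """Coarse reduction: rank features by a t-like class-separation score."""
    scores = []
    for j in range(len(X[0])):
        pos = [r[j] for r, t in zip(X, y) if t == 1]
        neg = [r[j] for r, t in zip(X, y) if t == -1]
        spread = pstdev(pos) + pstdev(neg) + 1e-9
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]

def train_svm(X, y, lam=0.01, eta=0.1, epochs=10, seed=0):
    """Linear soft-margin SVM fitted by stochastic subgradient descent."""
    rng = random.Random(seed)  # fixed seed keeps the fitness deterministic
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def cv_accuracy(X, y, cols, folds=4):
    """k-fold cross-validated accuracy of the SVM on the chosen columns."""
    if not cols:
        return 0.0
    sub = [[row[j] for j in cols] for row in X]
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(sub)) if i % folds != f]
        test = [i for i in range(len(sub)) if i % folds == f]
        w, b = train_svm([sub[i] for i in train], [y[i] for i in train])
        for i in test:
            score = sum(wj * xj for wj, xj in zip(w, sub[i])) + b
            correct += (1 if score >= 0 else -1) == y[i]
    return correct / len(sub)

def ga_select(X, y, candidates, rng, pop_size=16, generations=10, penalty=0.01):
    """GA wrapper: each chromosome is a bitmask over the candidate features."""
    def fitness(mask):
        cols = [c for c, bit in zip(candidates, mask) if bit]
        return cv_accuracy(X, y, cols) - penalty * len(cols)

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0][:]]          # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < 0.05) for bit in child]  # mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]

rng = random.Random(0)
X, y = make_data(rng)
candidates = coarse_filter(X, y, keep=15)    # coarse feature reduction
selected = ga_select(X, y, candidates, rng)  # fine selection with GA/SVM
print("candidates:", candidates)
print("selected:", sorted(selected))
```

Note that this sketch ranks features on the full dataset before cross-validating, so the reported accuracy is optimistically biased. With feature-rich, case-poor data, an honest estimate would likely require repeating both the filter and the GA inside an outer cross-validation loop, or evaluating on a held-out set of patients.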