ASME Press Select Proceedings

Intelligent Engineering Systems through Artificial Neural Networks, Volume 20

Cihan H. Dagli
ASME Press

This paper describes an ongoing research effort to identify gene sets that predict the survival of colorectal cancer patients from gene expression data. Because the dataset includes 395 genes (after initial feature reduction) but only 122 patients, overfitting must be addressed. A genetic algorithm (GA) designed specifically for feature set selection is used in combination with a support vector machine (SVM). Evaluating groups of genes rather than individual genes yields complementary sets. To combat overfitting, the original measurements are perturbed by noise, using variances appropriate to each measurement and an overall gain that is adjusted until only a “modest” number of gene sets is repeatedly discovered. Through these adjustments we seek the strongest signal in the dataset. The goal is either to discover clinically useful diagnostic patterns or to reject a dataset whose strongest signal is not biologically relevant. Initial simulations have shown signs of reproducibility, consistency, and relevance of the identified (individual) genes.
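The noise-perturbation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the noise model, so zero-mean Gaussian noise with the gain applied to the standard deviation is an assumption, and all function names here are hypothetical. The selector `top_variance_genes` is a deliberately simple stand-in for the GA/SVM search, used only so that the example runs on its own.

```python
import random
from collections import Counter


def perturb(measurements, variances, gain, rng):
    """Add zero-mean Gaussian noise to every measurement.

    measurements[i][j] is the expression of gene j for patient i.
    variances[i][j] is the assumed noise variance of that measurement.
    gain scales every noise standard deviation (an assumption).
    """
    return [
        [x + gain * (v ** 0.5) * rng.gauss(0.0, 1.0) for x, v in zip(row, var_row)]
        for row, var_row in zip(measurements, variances)
    ]


def top_variance_genes(data, k=2):
    """Stand-in selector (NOT the GA/SVM): the k genes with largest variance."""
    n = len(data)
    scores = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        mean = sum(col) / n
        scores.append((sum((c - mean) ** 2 for c in col) / n, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def count_repeated_sets(measurements, variances, gain, select_gene_set, runs, seed=0):
    """Count how often each gene set is selected across noisy replicates."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        noisy = perturb(measurements, variances, gain, rng)
        counts[frozenset(select_gene_set(noisy))] += 1
    return counts


# Toy data: 3 patients, 3 genes, unit noise variance for every measurement.
data = [[0.0, 1.0, 5.0], [0.0, 2.0, -5.0], [0.0, 3.0, 5.0]]
unit_var = [[1.0] * 3 for _ in data]
counts = count_repeated_sets(data, unit_var, gain=0.5,
                             select_gene_set=top_variance_genes, runs=20)
```

In this sketch, raising `gain` would be expected to spread the counts over more distinct sets, so that only sets supported by a strong signal keep recurring; the abstract describes adjusting the gain until a "modest" number of sets is repeatedly found, but does not state the stopping criterion.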

