Abstract:
Background: Classifying biological samples from gene expression data is a basic building block for several problems
in bioinformatics, such as diagnosing cancer and other diseases and planning appropriate treatment.
A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene
(feature) selection plays a major role in reducing the complexity of handling such data.
Results: This paper explores the use of biological knowledge acquired from the Gene Ontology database to select
a suitable subset of genes that then participate in the clustering of samples. The proposed feature selection
technique is unsupervised, as it does not use any class label information during gene selection. Finally, a
multi-objective clustering approach is applied to cluster the available samples in the reduced gene space.
Conclusions: The reported results show that incorporating biological knowledge into gene selection not only reduces
the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification.
The reduced gene space is validated using biological significance tests. To demonstrate the superiority of the
proposed gene-selection-based sample clustering technique, a thorough comparative analysis against
state-of-the-art techniques has also been performed.