Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype
Jhoirene Clemente, Jan Michael Yap and Henry Adorna
Jhoirene Clemente. Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype. (Under the direction of Jan Michael Yap.)

In this research, we perform cluster analysis of gene expression profiles extracted from 33 young breast cancer patients who developed distant metastasis in less than five years. The analysis compares two groupings of the gene transcripts. The first is obtained with Pearson’s correlation, which partitions the set of gene transcripts according to whether they are a) directly affecting, b) independently affecting, or c) inversely affecting our trait of interest. The second is obtained with our algorithm of choice, the standard K-Means algorithm, run under different distance measures (Euclidean, squared Euclidean and Manhattan). The analysis includes cluster validation through visualization using vector fusion, and uses the Adjusted Rand Index to compare the K-Means result with the Pearson’s correlation result. The Adjusted Rand Index showed a low level of agreement between the two cluster results; K-Means clustering is therefore not a valid substitute for Pearson’s correlation in identifying significantly correlated gene transcripts. However, compared with random clustering, which served as the null model, K-Means clustering performed without the phenotypic value of the samples still produces a significant clustering of the identified significantly correlated genes.
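The comparison step described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The function names and the cutoff of 0.5 used to split transcripts into the three correlation groups are hypothetical, since the abstract does not state how the partition boundaries were chosen. The Adjusted Rand Index follows the standard pair-counting definition: 1 means two partitions are identical, and values near 0 mean agreement is no better than chance.

```python
from collections import Counter
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_label(r, threshold=0.5):
    """Assign a transcript to one of three groups by its correlation with
    the phenotype. The threshold is a hypothetical value for illustration."""
    if r >= threshold:
        return "direct"
    if r <= -threshold:
        return "inverse"
    return "independent"


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial (one cluster or all singletons)
    return (together_both - expected) / (max_index - expected)
```

In use, each transcript's expression profile across the samples would be correlated with the phenotype vector and labeled with `correlation_label`, and the resulting labels would be passed to `adjusted_rand_index` together with the cluster assignments produced by K-Means.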