Clustering Genes

Now that you have a manageable list of genes, you can look for relationships between the profiles using several clustering techniques from the Statistics Toolbox.

  1. For hierarchical clustering, the function pdist calculates the pairwise distances between profiles, and the function linkage creates the hierarchical cluster tree.

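    % Correlation distance between every pair of profiles (rows)
    corrDist = pdist(yeastvalues, 'corr');
    % Build the cluster tree using average linkage
    clusterTree = linkage(corrDist, 'average');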
    
  2. The function cluster calculates the clusters based on either a cutoff distance or a maximum number of clusters. In this case, the 'maxclust' option is used to identify 16 distinct clusters.

    clusters = cluster(clusterTree, 'maxclust', 16);
    
  3. The profiles of the genes in these clusters can be plotted together using a simple loop and the function subplot.

    figure
    for c = 1:16
        subplot(4,4,c);
        plot(times,yeastvalues((clusters == c),:)');
        axis tight
    end
    suptitle('Hierarchical Clustering of Profiles');
    

    MATLAB plots the figure.

  4. The Statistics Toolbox also has a K-means clustering function. Again, sixteen clusters are found, but because the algorithm is different, these are not necessarily the same clusters as those found by hierarchical clustering.

    [cidx, ctrs] = kmeans(yeastvalues, 16, ...
                          'dist','corr', ...
                          'rep',5, ...
                          'disp','final');
    figure
    for c = 1:16
        subplot(4,4,c);
        plot(times,yeastvalues((cidx == c),:)');
        axis tight
    end
    suptitle('K-Means Clustering of Profiles');
    

    MATLAB displays the final result of each of the five replicates:

    13 iterations, total sum of distances = 11.4042
    14 iterations, total sum of distances = 8.62674
    26 iterations, total sum of distances = 8.86066
    22 iterations, total sum of distances = 9.77676
    26 iterations, total sum of distances = 9.01035
    

  5. Instead of plotting all of the profiles, you can plot just the centroids.

    figure
    for c = 1:16
        subplot(4,4,c);
        plot(times,ctrs(c,:)');
        axis tight
        axis off    % turn off the axis
    end
    suptitle('K-Means Clustering of Profiles');
    

    MATLAB plots the figure.

  6. You can use the function clustergram to perform hierarchical clustering on the data and display the result as a heat map with a dendrogram.

    figure
    clustergram(yeastvalues(:,2:end),'RowLabels',genes,...
                                     'ColumnLabels',times(2:end))
    

    MATLAB plots the figure.
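
Step 2 mentions that cluster can cut the tree at a distance threshold instead of asking for a fixed number of clusters. The following is a minimal sketch of that alternative and is not part of the original demonstration. The random matrix data is a hypothetical stand-in for yeastvalues, and the cutoff of 0.5 is an arbitrary value chosen only for illustration.

    rng(0);                                  % fixed seed so the sketch is repeatable
    data = randn(40, 7);                     % hypothetical stand-in for yeastvalues
    corrDist = pdist(data, 'correlation');
    clusterTree = linkage(corrDist, 'average');
    % Group profiles whose merge height in the tree is below the cutoff.
    cutoffClusters = cluster(clusterTree, 'cutoff', 0.5, ...
                             'criterion', 'distance');
    numClusters = max(cutoffClusters)        % how many clusters this cutoff yields

With a cutoff, the number of clusters depends on the data, so it is worth checking numClusters before laying out a fixed grid of subplots.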

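The five lines of output in step 4 come from the five replicates requested with 'rep',5. kmeans keeps the replicate with the lowest total sum of distances. The sketch below, which assumes synthetic data in place of yeastvalues (the matrix data and the choice of 4 clusters are illustrative only), requests the third output, sumd, to recover that total.

    rng(0);                                  % fixed seed so the sketch is repeatable
    data = randn(60, 7);                     % hypothetical stand-in for yeastvalues
    [cidx, ctrs, sumd] = kmeans(data, 4, ...
                                'Distance', 'correlation', ...
                                'Replicates', 5);
    % sumd holds the within-cluster sum of distances for each cluster.
    bestTotal = sum(sumd)                    % total for the replicate that was kept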

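Step 4 notes that K-means and hierarchical clustering need not produce the same groups. One way to check is to cross-tabulate the two assignment vectors. The sketch below is an illustration under assumptions rather than part of the original demonstration: data is synthetic, and 4 clusters are used to keep the table small. With the variables from the steps above, the equivalent call would be crosstab(clusters, cidx).

    rng(0);                                  % fixed seed so the sketch is repeatable
    data = randn(60, 7);                     % hypothetical stand-in for yeastvalues
    tree = linkage(pdist(data, 'correlation'), 'average');
    hierIdx = cluster(tree, 'maxclust', 4);
    kmIdx = kmeans(data, 4, 'Distance', 'correlation', 'Replicates', 5);
    % Rows: hierarchical clusters; columns: K-means clusters.
    overlap = crosstab(hierIdx, kmIdx)

Each entry counts the profiles that fall in a given pair of clusters. A row whose counts are spread over several columns marks a hierarchical cluster that K-means divided differently.
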
© 1994-2005 The MathWorks, Inc.