Bioinformatics Toolbox
Principal component analysis (PCA) is a technique for reducing the dimensionality of large data sets, such as those from microarray analysis. PCA can also be used to find signals in noisy data.
You can use the function princomp in the Statistics Toolbox to calculate the principal components of a data set.
[pc, zscores, pcvars] = princomp(yeastvalues)
MATLAB displays
pc =
Columns 1 through 4
-0.0245 -0.3033 -0.1710 -0.2831
0.0186 -0.5309 -0.3843 -0.5419
0.0713 -0.1970 0.2493 0.4042
0.2254 -0.2941 0.1667 0.1705
0.2950 -0.6422 0.1415 0.3358
0.6596 0.1788 0.5155 -0.5032
0.6490 0.2377 -0.6689 0.2601
Columns 5 through 7
-0.1155 0.4034 0.7887
-0.2384 -0.2903 -0.3679
-0.7452 -0.3657 0.2035
-0.2385 0.7520 -0.4283
0.5592 -0.2110 0.1032
-0.0194 -0.0961 0.0667
-0.0673 -0.0039 0.0521
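To make the three outputs more concrete, the following is a minimal sketch of how principal components can be computed from a singular value decomposition of the column-centered data. It is an illustration under stated assumptions, not necessarily how princomp is implemented. The small matrix X is made-up data standing in for yeastvalues, and the names X, Xc, scores, and n are hypothetical. The sign of each column of pc is arbitrary, so it may be flipped relative to what another routine returns.

```matlab
% Made-up data standing in for yeastvalues:
% rows are observations (genes), columns are variables (time points).
X = [ 2 0 1
      4 1 3
      6 1 2
      8 3 5
     10 2 4];

n  = size(X, 1);
Xc = X - repmat(mean(X, 1), n, 1);   % subtract each column mean

[U, S, V] = svd(Xc, 0);              % economy-size SVD of the centered data

pc     = V;                          % columns are the principal component directions
scores = Xc * V;                     % data expressed in the principal component basis
pcvars = diag(S).^2 / (n - 1);       % variance along each principal component

disp(pcvars')
```

The variances in pcvars come out in decreasing order because svd returns the singular values sorted from largest to smallest, and their sum equals the total variance of the columns of X.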
You can use the function cumsum to see the cumulative percentage of the total variance accounted for by the principal components.
cumsum(pcvars./sum(pcvars) * 100)
MATLAB displays
ans = 78.3719 89.2140 93.4357 96.0831 98.3283 99.3203 100.0000
This shows that almost 90% of the variance is accounted for by the first two principal components.
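The same cumulative percentages can be used to choose how many components to keep. The sketch below copies the values displayed above into a vector and finds the first component at which a chosen threshold is reached. The names cumpct, pct, threshold, and nkeep are hypothetical, and the 90% threshold is an arbitrary choice for illustration.

```matlab
% Cumulative percentages copied from the output shown above.
cumpct = [78.3719 89.2140 93.4357 96.0831 98.3283 99.3203 100.0000];

% Percentage of variance contributed by each individual component.
pct = diff([0 cumpct]);

% Smallest number of components whose cumulative percentage
% reaches the threshold (assumed value, chosen for illustration).
threshold = 90;
nkeep = find(cumpct >= threshold, 1);   % nkeep is 3 for these values

disp(pct)
disp(nkeep)
```

With these numbers, two components fall just short of 90%, so a strict 90% cutoff would keep three.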
A scatter plot of the scores of the first two principal components shows that there are two distinct regions. This is not unexpected, because the filtering process removed many of the genes with low variance or low information. These genes would have appeared in the middle of the scatter plot.
figure
scatter(zscores(:,1),zscores(:,2));
xlabel('First Principal Component');
ylabel('Second Principal Component');
title('Principal Component Scatter Plot');
MATLAB plots the figure.

The function gname from the Statistics Toolbox can be used to identify genes on a scatter plot. You can select as many points as you like on the scatter plot.
gname(genes);
When you have finished selecting points, press Enter.
An alternative way to create a scatter plot is with the function gscatter from the Statistics Toolbox. gscatter creates a grouped scatter plot where points from each group have a different color or marker. You can use clusterdata, or any other clustering function, to group the points.
figure
pcclusters = clusterdata(zscores(:,1:2),6);
gscatter(zscores(:,1),zscores(:,2),pcclusters)
xlabel('First Principal Component');
ylabel('Second Principal Component');
title('Principal Component Scatter Plot with Colored Clusters');
gname(genes) % Press Enter when you finish selecting genes.
MATLAB plots the figure.
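For readers who want more control over the grouping step, clusterdata can be unpacked into its three stages: pairwise distances, a hierarchical linkage, and a cut of the resulting tree. The sketch below assumes the documented defaults of Euclidean distance and single linkage, and assumes that an integer cutoff of 2 or more is treated as a maximum number of clusters. The small matrix X is made-up data standing in for zscores(:,1:2), and the names X, Y, Z, and T are hypothetical.

```matlab
% Made-up two-dimensional points forming two well-separated groups.
X = [0.0 0.0
     0.1 0.0
     0.0 0.1
     5.0 5.0
     5.1 5.0
     5.0 5.1];

Y = pdist(X);                    % pairwise distances (Euclidean by default)
Z = linkage(Y);                  % hierarchical tree (single linkage by default)
T = cluster(Z, 'maxclust', 2);   % cut the tree into at most 2 clusters

disp(T')
```

Splitting the call this way makes it possible to try a different distance metric in pdist or a different method in linkage before passing the cluster indices to gscatter.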

© 1994-2005 The MathWorks, Inc.