Scatter Plots of Microarray Data

The microarray data structure has two columns labeled 'F635 Median - B635' and 'F532 Median - B532'. These columns hold the differences between the median foreground and the median background for the 635 nm channel and the 532 nm channel, respectively. These differences give a measure of the actual expression levels. However, the data should first be normalized to remove spatial bias in the background, so be careful about using these values without further normalization. This example does not perform that spatial normalization.

  1. Rather than working with data in a larger structure, it is often easier to extract the column numbers and data into separate variables.

    cy5DataCol = find(strcmp(pd.ColumnNames,'F635 Median - B635'))
    cy3DataCol = find(strcmp(pd.ColumnNames,'F532 Median - B532'))
    cy5Data = pd.Data(:,cy5DataCol);
    cy3Data = pd.Data(:,cy3DataCol);
    

    MATLAB displays

    cy5DataCol =
        34
    
    cy3DataCol =
        35
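
    The lookup pattern in this step can be tried on a small stand-in structure. The following is a minimal sketch: the structure toy, its column names other than the two named above, and all of its values are invented for illustration and are not part of the example data set. It also adds a guard for a missing column name, which the step above does not include.

```matlab
% Toy stand-in for a microarray structure. The field names mirror the ones
% used in this step; the contents are invented for illustration only.
toy.ColumnNames = {'X'; 'Y'; 'F635 Median - B635'; 'F532 Median - B532'};
toy.Data = [1 2 100 50; 3 4 200 80; 5 6 0 20];

% strcmp returns a logical vector; find converts it to a column index.
cy5Col = find(strcmp(toy.ColumnNames, 'F635 Median - B635'));
cy3Col = find(strcmp(toy.ColumnNames, 'F532 Median - B532'));

% Stop early if a column name is misspelled or missing.
if isempty(cy5Col) || isempty(cy3Col)
    error('Toy:MissingColumn', 'Expected column name not found.');
end

% Extract each channel as a column vector.
cy5Toy = toy.Data(:, cy5Col);
cy3Toy = toy.Data(:, cy3Col);
```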
    
  2. A simple way to compare the two channels is with a loglog plot, which the function maloglog creates. Points above the diagonal in this plot correspond to genes that have higher expression levels in the A1 voxel than in the brain as a whole.

    figure
    maloglog(cy5Data,cy3Data)
    xlabel('F635 Median - B635 (Control)');
    ylabel('F532 Median - B532 (Voxel A1)');
    

    MATLAB displays the following messages and plots the data.

    Warning: Zero values are ignored
    (Type "warning off Bioinfo:MaloglogZeroValues" to suppress
     this warning.)
    Warning: Negative values are ignored.
    (Type "warning off Bioinfo:MaloglogNegativeValues" to suppress
     this warning.)
    

    Notice that this function gives some warnings about negative and zero elements. This is because some of the values in the 'F635 Median - B635' and 'F532 Median - B532' columns are zero or even less than zero. Spots where this happened might be bad spots or spots that failed to hybridize. Points with positive, but very small, differences between foreground and background should also be considered to be bad spots.
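
    Before suppressing the warnings, it can help to count how many spots trigger them. The following is a minimal sketch that uses invented vectors (cy5Toy and cy3Toy are hypothetical stand-ins for the two channels); the same comparisons would apply to cy5Data and cy3Data.

```matlab
% Invented background-subtracted intensities for five spots.
cy5Toy = [120; 0; -15; 3400; 8];
cy3Toy = [95; 40; 60; 0; 12];

% A spot is counted if either channel is zero (or negative).
numZero     = sum(cy5Toy == 0 | cy3Toy == 0);
numNegative = sum(cy5Toy < 0  | cy3Toy < 0);

fprintf('%d spots with a zero value, %d with a negative value\n', ...
        numZero, numNegative);
```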

  3. Disable the display of warnings by using the warning command. Although warnings can be distracting, it is good practice to investigate why they occurred rather than simply ignoring them. There might be a systematic reason why the affected spots are bad.

    warnState = warning;          % First save the current warning state.
                                  % Now turn off the two warnings.
    warning('off','Bioinfo:MaloglogZeroValues');
    warning('off','Bioinfo:MaloglogNegativeValues');
    figure
    maloglog(cy5Data,cy3Data)      % Create the loglog plot
    warning(warnState);            % Reset the warning state.
    xlabel('F635 Median - B635 (Control)');
    ylabel('F532 Median - B532 (Voxel A1)');
    

    MATLAB plots the image.
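
    The save, disable, and restore pattern can also be applied to a single warning identifier. The following is a minimal sketch, not the method used in the step above: the identifier Toy:ExampleWarning is hypothetical, and the sketch relies on the warning function returning the previous state of an identifier when called with an output argument.

```matlab
toyID = 'Toy:ExampleWarning';      % hypothetical identifier for illustration

warning('on', toyID);              % start from a known state
saved = warning('off', toyID);     % disable; saved holds the previous state

% This warning is not displayed because its identifier is disabled.
warning(toyID, 'This message is suppressed.');

warning(saved);                    % restore the previous state
restored = warning('query', toyID);
```

    Restoring from the saved structure leaves all other warning settings untouched.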

  4. An alternative to simply ignoring or disabling the warnings is to remove the bad spots from the data set. You can do this by finding points where either the red or green channel has values less than or equal to a threshold value. For example, use a threshold value of 10.

    threshold = 10;
    badPoints = (cy5Data <= threshold) | (cy3Data <= threshold);
    

    No plot is produced in this step. badPoints is a logical vector that marks the spots to remove.

  5. You can then remove these points and redraw the loglog plot.

    cy5Data(badPoints) = []; cy3Data(badPoints) = [];
    figure
    maloglog(cy5Data,cy3Data)
    xlabel('F635 Median - B635 (Control)');
    ylabel('F532 Median - B532 (Voxel A1)');
    
    

    MATLAB plots the image.

    This plot shows the distribution of points but does not give any indication about which genes correspond to which points.
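
    The thresholding and removal in the last two steps can be checked on a small example. The following is a minimal sketch with invented values and hypothetical gene names. Unlike the step above, it keeps the original vectors and builds filtered copies, so the same mask can be reused for the labels.

```matlab
threshold = 10;

% Invented intensities and labels for six spots.
cy5Toy = [120; 0; -15; 3400; 8; 560];
cy3Toy = [95; 40; 60; 2100; 12; 610];
idsToy = {'geneA'; 'geneB'; 'geneC'; 'geneD'; 'geneE'; 'geneF'};

% A spot is bad if either channel is at or below the threshold.
badToy = (cy5Toy <= threshold) | (cy3Toy <= threshold);

% Index with the negated mask so data and labels stay aligned.
keptCy5 = cy5Toy(~badToy);
keptCy3 = cy3Toy(~badToy);
keptIDs = idsToy(~badToy);

fprintf('Kept %d of %d spots\n', numel(keptCy5), numel(cy5Toy));
```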

  6. Add gene labels to the plot. Because some of the data points have been removed, the corresponding gene IDs must also be removed before you can use them as labels. The simplest way to do that is to index the IDs with the negated logical mask, wt.IDs(~badPoints).

    maloglog(cy5Data,cy3Data,'labels',wt.IDs(~badPoints),...
             'factorlines',2)
    xlabel('F635 Median - B635 (Control)');
    ylabel('F532 Median - B532 (Voxel A1)');
    

    MATLAB plots the image.
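
    The points that fall outside the factor lines can also be listed numerically. The following is a minimal sketch with invented values and hypothetical gene names. It assumes that a factor of 2 corresponds to the lines y = 2x and y = x/2 in the loglog plot; check the maloglog reference page for the exact definition.

```matlab
% Invented intensities and labels for four spots.
cy5Toy = [100; 400; 250; 90];      % x axis
cy3Toy = [110; 150; 600; 30];      % y axis
idsToy = {'geneA'; 'geneB'; 'geneC'; 'geneD'};

factor = 2;
ratio  = cy3Toy ./ cy5Toy;         % y over x

% Above the upper line (y >= 2x) or below the lower line (y <= x/2).
upToy   = idsToy(ratio >= factor);
downToy = idsToy(ratio <= 1/factor);

fprintf('Above the upper factor line: %s\n', upToy{:});
fprintf('Below the lower factor line: %s\n', downToy{:});
```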

  7. Try using the mouse to click some of the outlier points.

    You will see the gene ID associated with the point. Most of the outliers are below the y = x line. In fact, most of the points are below this line. Ideally the points should be evenly distributed on either side of this line.

  8. Normalize the points to evenly distribute them on either side of the line. Use the function mameannorm to perform global mean normalization.

    normcy5 = mameannorm(cy5Data);
    normcy3 = mameannorm(cy3Data);
    

    If you plot the normalized data, you will see that the points are more evenly distributed about the y = x line.

    figure
    maloglog(normcy5,normcy3,'labels',wt.IDs(~badPoints),...
             'factorlines',2)
    xlabel('F635 Median - B635 (Control)');
    ylabel('F532 Median - B532 (Voxel A1)');
    

    MATLAB plots the image.
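
    Global mean normalization can be illustrated without the toolbox function. The following is a minimal sketch with invented values. It assumes that the normalization divides each channel by its own mean, which is one common definition; see the mameannorm reference page for the function's exact behavior, including how it treats missing values.

```matlab
% Invented intensities for four spots in each channel.
cy5Toy = [120; 3400; 560; 920];
cy3Toy = [95; 2100; 610; 395];

% Divide each channel by its mean so that both channels have mean 1.
normCy5Toy = cy5Toy ./ mean(cy5Toy);
normCy3Toy = cy3Toy ./ mean(cy3Toy);

fprintf('Channel means after scaling: %g and %g\n', ...
        mean(normCy5Toy), mean(normCy3Toy));
```

    Scaling both channels to a common mean shifts the cloud of points toward the y = x line without changing the relative ordering of spots within a channel.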

  9. Use the function mairplot to create an Intensity vs. Ratio plot for the normalized data. You call this function in the same way as the function maloglog.

    figure
    mairplot(normcy5,normcy3,'labels',wt.IDs(~badPoints),...
             'factorlines',2)
    

    MATLAB plots the image.
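
    The quantities behind an Intensity vs. Ratio plot can be computed directly. The following is a minimal sketch with invented values. It assumes that intensity is the base-10 logarithm of the product of the two channels and that the ratio is the base-2 logarithm of the first channel divided by the second; the axis conventions that mairplot actually uses should be confirmed on its reference page.

```matlab
% Invented normalized intensities for three spots.
xToy = [100; 400; 250];
yToy = [200; 100; 250];

% Assumed definitions (see the note above).
intensityToy = log10(xToy .* yToy);   % overall brightness of the spot
logRatioToy  = log2(xToy ./ yToy);    % 0 means equal expression

disp([intensityToy logRatioToy]);
```

    On a base-2 ratio axis, a twofold change in either direction sits at +1 or -1, so factor lines for a factor of 2 would appear as horizontal lines at those values under this convention.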

  10. You can click the points in this plot to see the name of the gene associated with each point.


© 1994-2005 The MathWorks, Inc.