Determining Nucleotide Composition

Sections of a DNA sequence with a high percentage of A+T nucleotides usually indicate intergenic parts of the sequence, while a low A+T percentage and a higher G+C percentage indicate possible genes. A region with high CG dinucleotide content is often located just before a gene.
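
As a rough illustration of these quantities, the following sketch computes the A+T fraction, the G+C fraction, and the number of CG dinucleotides using only base MATLAB operations. The sequence seq is a made-up toy string, not data from this example, and the code assumes an uppercase sequence containing only A, C, G, and T.

```matlab
% Hypothetical toy sequence (an assumption for illustration only).
seq = 'ATGCGCGTATATTAGC';

% Count each base by comparing the character array with a single letter.
nA = sum(seq == 'A');
nC = sum(seq == 'C');
nG = sum(seq == 'G');
nT = sum(seq == 'T');

% Fractions of A+T and G+C over the whole sequence.
atFraction = (nA + nT) / numel(seq);
gcFraction = (nG + nC) / numel(seq);

% Number of CG dinucleotides; strfind returns every start position.
nCG = numel(strfind(seq, 'CG'));

fprintf('A+T: %.2f  G+C: %.2f  CG dimers: %d\n', atFraction, gcFraction, nCG);
```

In practice you would apply the same calculation to a sliding window along the sequence, which is closer to what a density plot shows.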

After you read a sequence into MATLAB, you can use the sequence statistics functions to determine whether your sequence has the characteristics of a protein-coding region. This procedure uses the human mitochondrial genome, stored in the variable mitochondria, as an example. See Getting Sequence Information into MATLAB.

  1. Plot monomer densities and combined monomer densities in a graph. In the MATLAB Command Window, type

    ntdensity(mitochondria)
    

    This graph shows that the genome is A+T rich.

  2. Count the nucleotides using the function basecount.

    basecount(mitochondria)
    

    MATLAB lists the nucleotide counts for the 5'-3' strand.

    ans = 
        A: 5113
        C: 5192
        G: 2180
        T: 4086
    
  3. Count the nucleotides in the reverse complement of a sequence using the function seqrcomplement.

    basecount(seqrcomplement(mitochondria))
    

    As expected, the nucleotide counts on the reverse complement strand are complementary to those on the 5'-3' strand: the A and T counts are swapped, and so are the C and G counts.

    ans = 
        A: 4086
        C: 2180
        G: 5192
        T: 5113
    
  4. Use the function basecount with the chart option to visualize the nucleotide distribution.

    basecount(mitochondria,'chart','pie');
    

    MATLAB draws a pie chart in a figure window.

  5. Count the dimers in a sequence and display the information in a bar chart.

    dimercount(mitochondria,'chart','bar')
    

    MATLAB lists the dimer counts and draws a bar chart.
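
The counting steps above can be sketched with base MATLAB alone, which may help clarify what the results mean. This is a minimal illustration, not the toolbox implementation of basecount, seqrcomplement, or dimercount: the toy sequence seq and all variable names are assumptions, and the code handles only uppercase A, C, G, and T with no ambiguous symbols.

```matlab
% Hypothetical toy sequence (an assumption for illustration only).
seq = 'ATGCGCGTATATTAGC';
bases = 'ACGT';

% Base counts on the given strand, in the order A, C, G, T.
fwd = zeros(1, 4);
for k = 1:4
    fwd(k) = sum(seq == bases(k));
end

% Reverse complement: reverse the string, then swap A<->T and C<->G.
comp = 'TGCA';                            % complement of each entry in bases
[~, idx] = ismember(fliplr(seq), bases);  % position of each symbol in bases
rcSeq = comp(idx);

% Base counts on the reverse complement strand.
rc = zeros(1, 4);
for k = 1:4
    rc(k) = sum(rcSeq == bases(k));
end

% Dimer counts over all overlapping positions, in the order AA, AC, ..., TT.
dimers = cell(1, 16);
dimerCounts = zeros(1, 16);
n = 0;
for i = 1:4
    for j = 1:4
        n = n + 1;
        dimers{n} = [bases(i) bases(j)];
        dimerCounts(n) = numel(strfind(seq, dimers{n}));
    end
end

disp([fwd; rc]);      % first row: given strand, second row: reverse complement
disp(dimerCounts);
```

The second row of the displayed matrix is the first row read in reverse order, which is the same pattern seen in steps 2 and 3. Applying the same arithmetic to the counts from step 2 gives A+T = 5113 + 4086 = 9199 out of 16571 nucleotides, or about 55.5%, consistent with the A+T-rich density plot in step 1.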


© 1994-2005 The MathWorks, Inc.