Bioinformatics Toolbox
Sections of a DNA sequence with a high percentage of A+T nucleotides usually indicate intergenic parts of the sequence, while low A+T and higher G+C nucleotide percentages indicate possible genes. Regions with high CG dinucleotide content are often located just before a gene.
After you read a sequence into MATLAB, you can use the sequence statistics functions to determine if your sequence has the characteristics of a protein-coding region. This procedure uses the human mitochondrial genome as an example. See Getting Sequence Information into MATLAB.
Plot monomer densities and combined monomer densities in a graph. In the MATLAB Command Window, type
ntdensity(mitochondria)
This graph shows that the genome is A+T rich.
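The density plot depends on the window length used for averaging. The following is a minimal sketch, assuming the 'Window' option of ntdensity and a made-up toy sequence (not the mitochondrial genome), that shows how to set the window and capture the densities in a variable.

```matlab
% Toy sequence (assumed for illustration): an A+T-rich half
% followed by a G+C-rich half.
seq = [repmat('ATTA', 1, 250), repmat('GCCG', 1, 250)];

% Use a 100-base window instead of the default window length.
density = ntdensity(seq, 'Window', 100);

% density is a structure with one field per nucleotide.
disp(fieldnames(density))
```

A shorter window shows more local variation, while a longer window smooths the curves.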

Count the nucleotides using the function basecount.
basecount(mitochondria)
A list of nucleotide counts is shown for the 5'-3' strand.
ans =
A: 5113
C: 5192
G: 2180
T: 4086
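The counts above can be turned into percentages to check the A+T content numerically. This sketch copies the counts into a structure by hand so that it runs on its own; the variable names are illustrative.

```matlab
% Counts copied from the basecount output above. In a live session,
% counts = basecount(mitochondria) gives a structure with these fields.
counts = struct('A', 5113, 'C', 5192, 'G', 2180, 'T', 4086);

total = counts.A + counts.C + counts.G + counts.T;

% Percentage of A+T and of G+C nucleotides.
atPercent = 100 * (counts.A + counts.T) / total;
gcPercent = 100 * (counts.G + counts.C) / total;

fprintf('A+T: %.1f%%   G+C: %.1f%%\n', atPercent, gcPercent);
% prints: A+T: 55.5%   G+C: 44.5%
```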
Count the nucleotides in the reverse complement of a sequence using the function seqrcomplement.
basecount(seqrcomplement(mitochondria))
As expected, the nucleotide counts on the reverse complement strand are complementary to those on the 5'-3' strand: the A and T counts are swapped, as are the C and G counts.
ans =
A: 4086
C: 2180
G: 5192
T: 5113
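The swap between the two strands can be checked programmatically. This is a small sketch on a made-up toy sequence (an assumption for illustration), using only basecount and seqrcomplement.

```matlab
% Toy sequence, assumed for illustration.
seq = 'ATGCCGTTA';

fwd = basecount(seq);                  % counts on the given strand
rev = basecount(seqrcomplement(seq));  % counts on the reverse complement

% A pairs with T and C pairs with G, so the counts swap within each pair.
isSwapped = isequal([fwd.A fwd.C fwd.G fwd.T], ...
                    [rev.T rev.G rev.C rev.A]);
disp(isSwapped)
```

The same comparison should hold for any sequence made up of only A, C, G, and T.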
Use the function basecount with the chart option to visualize the nucleotide distribution.
basecount(mitochondria,'chart','pie');
MATLAB draws a pie chart in a figure window.

Count the dimers in a sequence and display the information in a bar chart.
dimercount(mitochondria,'chart','bar')
MATLAB lists the dimer counts and draws a bar chart.
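Because high CG dinucleotide content can mark the region before a gene, it may be useful to pull out the CG count on its own. This sketch assumes that dimercount returns a structure with one field per dimer, and it uses a made-up toy sequence rather than the mitochondrial genome.

```matlab
% Toy sequence, assumed for illustration.
seq = 'ACGCGTTACG';

% Structure with one field per dimer (AA, AC, ..., TT).
dimers = dimercount(seq);

% Count of the CG dimer and total number of dimers counted.
cgCount = dimers.CG;
totalDimers = sum(cell2mat(struct2cell(dimers)));

fprintf('CG dimers: %d of %d\n', cgCount, totalDimers);
```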

© 1994-2005 The MathWorks, Inc.