Locating Protein Coding Sequences

A nucleotide sequence includes regulatory sequences before and after the protein coding section. By analyzing this sequence, you can determine the nucleotides that code for the amino acids in the final protein.
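
To make the idea of a reading frame concrete, the following is a minimal sketch in base MATLAB that scans the three forward reading frames of a short sequence for a start codon followed by an in-frame stop codon. The sequence `seq` and the variable names are hypothetical and chosen for illustration only. This is a simplified scan and is not meant to reproduce how the toolbox functions work internally.

```matlab
% Toy sequence (hypothetical, not taken from HEXA).
seq = 'CCATGAAATGACC';
stopCodons = {'TAA', 'TAG', 'TGA'};

% Each row of found is [frame, startPosition, stopPosition].
found = zeros(0, 3);

for frame = 1:3
    inORF = false;
    % Step through the sequence one codon (three nucleotides) at a time.
    for pos = frame:3:(length(seq) - 2)
        codon = seq(pos:pos+2);
        if ~inORF && strcmp(codon, 'ATG')
            % A start codon opens a candidate ORF.
            inORF = true;
            startPos = pos;
        elseif inORF && any(strcmp(codon, stopCodons))
            % The first in-frame stop codon closes the ORF.
            found = [found; frame, startPos, pos + 2];
            inORF = false;
        end
    end
end

disp(found)
```

For this toy sequence, the only complete ORF is on the third reading frame, running from position 3 to position 11 (ATG AAA TGA).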

After you have a list of genes you are interested in studying, you can determine the protein coding sequences. This procedure uses the human gene HEXA and the mouse gene HEXA as examples.

  1. If you did not retrieve gene data from the Web, you can load example data from a MAT-file included with the Bioinformatics Toolbox. In the MATLAB Command Window, type

    load hexosaminidase
    

    MATLAB loads the structures humanHEXA and mouseHEXA into the MATLAB workspace.

  2. Look for open reading frames in the human gene. For example, for the human gene HEXA, type

    humanORFs = seqshoworfs(humanHEXA.Sequence)
    

    seqshoworfs creates the output structure humanORFs. This structure gives the positions of the start and stop codons for all open reading frames (ORFs) on each reading frame.

    humanORFs = 
    
    1x3 struct array with fields:
        Start
        Stop
    

    The Help browser also opens and lists the three reading frames, with the ORFs colored blue, red, and green. Notice that the longest ORF is on the third reading frame.

  3. Locate open reading frames (ORFs) on the mouse gene. Type

    mouseORFs = seqshoworfs(mouseHEXA.Sequence)
    

    seqshoworfs creates the structure mouseORFs.

    mouseORFs = 
    
    1x3 struct array with fields:
        Start
        Stop
    

    The mouse gene shows the longest ORF on the first reading frame.
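
Instead of reading the longest ORF off the colored listing, you can compare the Start and Stop fields directly. The following is a minimal sketch that runs in base MATLAB; the structure `orfs` is a hypothetical stand-in with made-up positions and the same field names as humanORFs and mouseORFs, so substitute one of those structures when the toolbox data is loaded. Pairing each start with the stop at the same index, and ignoring a trailing start without a stop, is an assumption about the layout of the fields.

```matlab
% Hypothetical stand-in for the 1x3 structure returned in the steps above.
% The positions are invented for illustration.
orfs = struct('Start', {[10 200], [35], [5 120 400]}, ...
              'Stop',  {[60 290], [80], [50 390 460]});

% Longest start-to-stop distance found on each reading frame.
bestLen = zeros(1, numel(orfs));
for frame = 1:numel(orfs)
    % Assumption: Start(k) pairs with Stop(k); skip any unmatched start.
    n = min(numel(orfs(frame).Start), numel(orfs(frame).Stop));
    if n > 0
        lens = orfs(frame).Stop(1:n) - orfs(frame).Start(1:n);
        bestLen(frame) = max(lens);
    end
end

% Reading frame that holds the longest ORF overall.
[longest, bestFrame] = max(bestLen);
fprintf('Longest ORF spans %d positions on reading frame %d\n', longest, bestFrame);
```

With the made-up positions above, the longest ORF is on the third reading frame. Running the same loop on humanORFs and mouseORFs gives a way to check the reading frames reported in steps 2 and 3.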


© 1994-2005 The MathWorks, Inc.