Comparing Amino Acid Sequences

You could use alignment functions to look for similarities between two nucleotide sequences, but alignment functions return more biologically meaningful results when you are using amino acid sequences.

After you have located the open reading frames on your nucleotide sequences, you can convert the protein coding sections of the nucleotide sequences to their corresponding amino acid sequences, and then you can compare them for similarities.

  1. Using the identified open reading frames, convert the DNA sequence to the amino acid sequences. Type

    mouseProtein = nt2aa(mouseHEXA.Sequence)
    

    Remember that the human HEXA gene was on the third reading frame, so you need to indicate which frame to use.

    humanProtein = nt2aa(humanHEXA.Sequence,'frame',3)
    
  2. Draw a dot plot comparing the human and mouse amino acid sequences. Type

    seqdotplot(mouseProtein,humanProtein,4,3)
    ylabel('Mouse hexosaminidase A (alpha subunit)')
    xlabel('Human hexosaminidase A (alpha subunit)')
    

    Dot plots are one of the easiest ways to look for similarity between sequences. The diagonal line shown below indicates that there may be a good alignment between the two sequences.

  3. Globally align the two amino acid sequences, using the Needleman-Wunsch algorithm. Type

    [GlobalScore, GlobalAlignment = nwalign(humanProtein,
                                            mouseProtein)
    showalignment(GlobalAlignment)
    

    showalignment displays the global alignment of the two sequences in the Help browser. Notice that the calculated identity between the two sequences is 64.5 %.

    The alignment is very good for the first 550 nucleotides, after which the two sequences appear to be unrelated. Notice that there is a stop (*) in the sequence at this point. If you shorten the sequence to include only the amino acids that are in the protein (after the first methionine and before the first stop) you might get a better alignment.

  4. Trim the sequence from the first start amino acid (usually M) to the first stop (first *) and then try alignment again. Find the indices for the stops in the sequences.

    humanStops = find(humanProtein == '*')
    
    humanStops =
       538   550   652   661   669
    
    
    mouseStops = find(mouseProtein =='*')
    
    mouseStops =
    
       539   557   574   606
    

    Looking at the amino acid sequence for humanProtein, the first M is at position 9, while the first M for the mouse protein is at 11.

  5. Truncate the sequence to include only amino acids in the protein and the stop.

    humanProteinORF = humanProtein(9:humanStops(1));
    
    humanProteinORF =
    MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDV
    SSAAQPGCSVLDEAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVV
    TPGCNQLPTLESVENYTLTINDDQCLLLSETVWGALRGLETFSQLVWKSA
    EGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLSSILDTLDVMAYNKLNV
    FHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRG
    IRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEF
    MSTFFLEVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQ
    LESFYIQTLLDIVSSYGKGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNY
    MKELELVTKAGFRALLSAPWYLNRISYGPDWKDFYVVEPLAFEGTPEQKA
    LVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKLTSDLTFAYERL
    SHFRCELLRRGVQAQPLNVGFCEQEFEQT*
    
    
    mouseProteinORF = mouseProtein(11:mouseStops(1))
    
    mouseProteinORF =
    MAGCRLWVSLLLAAALACLATALWPWPQYIQTYHRRYTLYPNNFQFRYHV
    SSAAQAGCVVLDEAFRRYRNLLFGSGSWPRPSFSNKQQTLGKNILVVSVV
    TAECNEFPNLESVENYTLTINDDQCLLASETVWGALRGLETFSQLVWKSA
    EGTFFINKTKIKDFPRFPHRGVLLDTSRHYLPLSSILDTLDVMAYNKFNV
    FHWHLVDDSSFPYESFTFPELTRKGSFNPVTHIYTAQDVKEVIEYARLRG
    IRVLAEFDTPGHTLSWGPGAPGLLTPCYSGSHLSGTFGPVNPSLNSTYDF
    MSTLFLEISSVFPDFYLHLGGDEVDFTCWKSNPNIQAFMKKKGFTDFKQL
    ESFYIQTLLDIVSDYDKGYVVWQEVFDNKVKVRPDTIIQVWREEMPVEYM
    LEMQDITRAGFRALLSAPWYLNRVKYGPDWKDMYKVEPLAFHGTPEQKAL
    VIGGEACMWGEYVDSTNLVPRLWPRAGAVAERLWSSNLTTNIDFAFKRLS
    HFRCELVRRGIQAQPISVGCCEQEFEQT*
    
  6. Globally align the trimmed amino acid sequences. Type

    [Score, Alignment] = nwalign(humanProteinORF,
        mouseProteinORF);
    showalignment(Alignment)
    

    showalignment displays the results for the second global alignment. Notice that the percent identity for the untrimmed sequences is 54% and with trimmed sequences 83.3 percent.

  7. Another way to truncate an amino acid sequence to only those amino acids in the protein is to first truncate the nucleotide sequence with indices from the function seqshoworfs. Remember that the ORF for the human HEXA gene was on the third reading frame, and the ORF for the mouse HEXA was on the first reading frame.

    humanORFs = seqshoworfs(humanHEXA.Sequence);
    mouseORFs = seqshoworfs(humanHEXA.Sequence);
    
    humanPORF = nt2aa(humanHEXA.Sequence(humanORFs(3).Start(1):
        humanORFs(3).Stop(1)))
    mousePORF = nt2aa(mouseHEXA.Sequence(mouseORFs(1).Start(1):
        mouseORFs(1).Stop(1)))
    [Scale, Alignment] = nwalign(humanPORF, mousePORF)
    

    Show the alignment in the Help browser.

    showalignment(Alignment)
    

    The result from first truncating a nucleotide sequence before converting to an amino acid sequence is the same as the result from truncating the amino acid sequence after conversion. See the result in step 6.

    An alternative method to working with subsequences is to use a local alignment function with the nontruncated sequences.

  8. Locally align the two amino acid sequences using a Smith-Waterman algorithm. Type

    [LocalScore, LocalAlignment = swalign(humanProtein,
        mouseProtein)
    
    LocalScore =
            1057
    
    LocalAlignment
    RGDQR-AMTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYV . . .
    ||  | ||:: ||| |||||||:|  ||||||||| :||  :||: . . .
    RGAGRWAMAGCRLWVSLLLAAALACLATALWPWPQYIQTYHRRYT . . .
    

    swalign displays the local alignment of two sequences in the Help browser.

  9. Show the alignment in color.

    showalignment(LocalAlignment)
    


© 1994-2005 The MathWorks, Inc.