| Bioinformatics Toolbox | ![]() |
You could use alignment functions to look for similarities between two nucleotide sequences, but alignment functions return more biologically meaningful results when you are using amino acid sequences.
After you have located the open reading frames on your nucleotide sequences, you can convert the protein coding sections of the nucleotide sequences to their corresponding amino acid sequences, and then you can compare them for similarities.
Using the identified open reading frames, convert the DNA sequence to the amino acid sequences. Type
mouseProtein = nt2aa(mouseHEXA.Sequence)
Remember that the human HEXA gene was on the third reading frame, so you need to indicate which frame to use.
humanProtein = nt2aa(humanHEXA.Sequence,'frame',3)
Draw a dot plot comparing the human and mouse amino acid sequences. Type
seqdotplot(mouseProtein,humanProtein,4,3)
ylabel('Mouse hexosaminidase A (alpha subunit)')
xlabel('Human hexosaminidase A (alpha subunit)')
Dot plots are one of the easiest ways to look for similarity between sequences. The diagonal line shown below indicates that there may be a good alignment between the two sequences.

Globally align the two amino acid sequences, using the Needleman-Wunsch algorithm. Type
[GlobalScore, GlobalAlignment = nwalign(humanProtein,
mouseProtein)
showalignment(GlobalAlignment)
showalignment displays the global alignment of the two sequences in the Help browser. Notice that the calculated identity between the two sequences is 64.5 %.

The alignment is very good for the first 550 nucleotides, after which the two sequences appear to be unrelated. Notice that there is a stop (*) in the sequence at this point. If you shorten the sequence to include only the amino acids that are in the protein (after the first methionine and before the first stop) you might get a better alignment.
Trim the sequence from the first start amino acid (usually M) to the first stop (first *) and then try alignment again. Find the indices for the stops in the sequences.
humanStops = find(humanProtein == '*') humanStops = 538 550 652 661 669 mouseStops = find(mouseProtein =='*') mouseStops = 539 557 574 606
Looking at the amino acid sequence for humanProtein, the first M is at position 9, while the first M for the mouse protein is at 11.
Truncate the sequence to include only amino acids in the protein and the stop.
humanProteinORF = humanProtein(9:humanStops(1)); humanProteinORF = MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDV SSAAQPGCSVLDEAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVV TPGCNQLPTLESVENYTLTINDDQCLLLSETVWGALRGLETFSQLVWKSA EGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLSSILDTLDVMAYNKLNV FHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRG IRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEF MSTFFLEVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQ LESFYIQTLLDIVSSYGKGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNY MKELELVTKAGFRALLSAPWYLNRISYGPDWKDFYVVEPLAFEGTPEQKA LVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKLTSDLTFAYERL SHFRCELLRRGVQAQPLNVGFCEQEFEQT* mouseProteinORF = mouseProtein(11:mouseStops(1)) mouseProteinORF = MAGCRLWVSLLLAAALACLATALWPWPQYIQTYHRRYTLYPNNFQFRYHV SSAAQAGCVVLDEAFRRYRNLLFGSGSWPRPSFSNKQQTLGKNILVVSVV TAECNEFPNLESVENYTLTINDDQCLLASETVWGALRGLETFSQLVWKSA EGTFFINKTKIKDFPRFPHRGVLLDTSRHYLPLSSILDTLDVMAYNKFNV FHWHLVDDSSFPYESFTFPELTRKGSFNPVTHIYTAQDVKEVIEYARLRG IRVLAEFDTPGHTLSWGPGAPGLLTPCYSGSHLSGTFGPVNPSLNSTYDF MSTLFLEISSVFPDFYLHLGGDEVDFTCWKSNPNIQAFMKKKGFTDFKQL ESFYIQTLLDIVSDYDKGYVVWQEVFDNKVKVRPDTIIQVWREEMPVEYM LEMQDITRAGFRALLSAPWYLNRVKYGPDWKDMYKVEPLAFHGTPEQKAL VIGGEACMWGEYVDSTNLVPRLWPRAGAVAERLWSSNLTTNIDFAFKRLS HFRCELVRRGIQAQPISVGCCEQEFEQT*
Globally align the trimmed amino acid sequences. Type
[Score, Alignment] = nwalign(humanProteinORF,
mouseProteinORF);
showalignment(Alignment)
showalignment displays the results for the second global alignment. Notice that the percent identity for the untrimmed sequences is 54% and with trimmed sequences 83.3 percent.

Another way to truncate an amino acid sequence to only those amino acids in the protein is to first truncate the nucleotide sequence with indices from the function seqshoworfs. Remember that the ORF for the human HEXA gene was on the third reading frame, and the ORF for the mouse HEXA was on the first reading frame.
humanORFs = seqshoworfs(humanHEXA.Sequence);
mouseORFs = seqshoworfs(humanHEXA.Sequence);
humanPORF = nt2aa(humanHEXA.Sequence(humanORFs(3).Start(1):
humanORFs(3).Stop(1)))
mousePORF = nt2aa(mouseHEXA.Sequence(mouseORFs(1).Start(1):
mouseORFs(1).Stop(1)))
[Scale, Alignment] = nwalign(humanPORF, mousePORF)
Show the alignment in the Help browser.
showalignment(Alignment)
The result from first truncating a nucleotide sequence before converting to an amino acid sequence is the same as the result from truncating the amino acid sequence after conversion. See the result in step 6.
An alternative method to working with subsequences is to use a local alignment function with the nontruncated sequences.
Locally align the two amino acid sequences using a Smith-Waterman algorithm. Type
[LocalScore, LocalAlignment = swalign(humanProtein,
mouseProtein)
LocalScore =
1057
LocalAlignment
RGDQR-AMTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYV . . .
|| | ||:: ||| |||||||:| ||||||||| :|| :||: . . .
RGAGRWAMAGCRLWVSLLLAAALACLATALWPWPQYIQTYHRRYT . . .
swalign displays the local alignment of two sequences in the Help browser.
Show the alignment in color.
showalignment(LocalAlignment)

| Locating Protein Coding Sequences | Microarray Analysis | ![]() |
© 1994-2005 The MathWorks, Inc.