Amino Acid Conversion and Composition

Determining the relative amino acid composition of a protein will give you a characteristic profile for the protein. Often, this profile is enough information to identify a protein. Using the amino acid composition, atomic composition, and molecular weight, you can also search public databases for similar proteins.

After you locate an open reading frame (ORF) in a gene, you can convert it to an amino acid sequence and determine its amino acid composition. This procedure uses the human mitochondrial genome as an example. See Open Reading Frames.

  1. Convert a nucleotide sequence to an amino acid sequence. In this example only the protein-coding sequence between the start and stop codons is converted.

    ND2AASeq = nt2aa(ND2Seq,'geneticcode','Vertebrate Mitochondrial');
    

    The sequence is converted using the Vertebrate Mitochondrial genetic code. Because the property AlternativeStartCodons is set to 'true' by default, the first codon att is converted to M instead of I.

    MNPLAQPVIYSTIFAGTLITALSSHWFFTWVGLEMNMLAFIPVLTKKMNP
    RSTEAAIKYFLTQATASMILLMAILFNNMLSGQWTMTNTTNQYSSLMIMM
    AMAMKLGMAPFHFWVPEVTQGTPLTSGLLLLTWQKLAPISIMYQISPSLN
    VSLLLTLSILSIMAGSWGGLNQTQLRKILAYSSITHMGWMMAVLPYNPNM
    TILNLTIYIILTTTAFLLLNLNSSTTTLLLSRTWNKLTWLTPLIPSTLLS
    LGGLPPLTGFLPKWAIIEEFTKNNSLIIPTIMATITLLNLYFYLRLIYST
    SITLLPMSNNVKMKWQFEHTKPTPFLPTLIALTTLLLPISPFMLMIL
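
    The effect of AlternativeStartCodons can be checked on a short input. The following is a minimal sketch, assuming the Bioinformatics Toolbox is installed. The nine-nucleotide string is an illustrative stand-in for the start of a coding sequence and is not read from ND2Seq.

    ```matlab
    % Hypothetical stand-in: three codons (att aat ccc), not read from ND2Seq.
    codons = 'attaatccc';

    % Default behavior: an alternative start codon in first position becomes M.
    withAltStart = nt2aa(codons, 'GeneticCode', 'Vertebrate Mitochondrial');

    % Turn the option off to translate the first codon like any internal codon.
    withoutAltStart = nt2aa(codons, 'GeneticCode', 'Vertebrate Mitochondrial', ...
                            'AlternativeStartCodons', false);

    disp(withAltStart)
    disp(withoutAltStart)
    ```

    With the default setting the first residue should be M; with the option turned off it should be I, as described above.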
    
  2. Compare your conversion with the published conversion in GenPept.

    ND2protein = getgenpept('NP_536844','sequenceonly',true)
    

    MATLAB gets the published conversion from the NCBI database and reads it into the MATLAB workspace.
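
    The comparison itself can be done with ordinary string functions. The following is a minimal sketch that uses short stand-in strings so that it runs on its own; in this procedure the two inputs would be ND2AASeq and ND2protein. The variable names translated, published, sameSeq, and mismatchPos are illustrative.

    ```matlab
    % Stand-in sequences so the sketch runs on its own; in the procedure these
    % would be ND2AASeq (step 1) and ND2protein (step 2).
    translated = 'MNPLAQPVIY';
    published  = 'MNPLAQPVIY';

    % strcmp is true only when the length and every residue agree.
    sameSeq = strcmp(translated, published);

    if sameSeq
        disp('Sequences are identical.')
    elseif numel(translated) == numel(published)
        % Equal lengths: list the positions where the residues differ.
        mismatchPos = find(translated ~= published);
        fprintf('Mismatch at position %d\n', mismatchPos);
    else
        fprintf('Lengths differ: %d vs %d residues\n', ...
                numel(translated), numel(published));
    end
    ```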

  3. Count the amino acids in the protein sequence.

    aacount(ND2AASeq, 'chart','bar')
    

    MATLAB draws a bar graph. Notice the high content of leucine, threonine, and isoleucine, and the lack of cysteine and aspartic acid.
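
    The counts behind the chart can also be inspected numerically. The following is a minimal sketch, assuming the Bioinformatics Toolbox is installed and that aacount returns one field per standard amino acid letter for this input. To stay self-contained it uses only the first 50 residues of the sequence from step 1, so its percentages describe that fragment and not the full protein.

    ```matlab
    % First 50 residues of the translated sequence shown in step 1.
    fragment = 'MNPLAQPVIYSTIFAGTLITALSSHWFFTWVGLEMNMLAFIPVLTKKMNP';

    % Without the 'chart' option, aacount returns a structure of counts,
    % one field per amino acid letter.
    counts = aacount(fragment);

    residues = fieldnames(counts);
    n = cell2mat(struct2cell(counts));   % counts as a column vector

    % Relative composition in percent, sorted from most to least frequent.
    pct = 100 * n / sum(n);
    [pctSorted, order] = sort(pct, 'descend');
    for k = 1:3
        fprintf('%s: %.1f%%\n', residues{order(k)}, pctSorted(k));
    end

    % Residues that do not occur in the fragment at all.
    absent = residues(n == 0)';
    disp(absent)
    ```

    Passing ND2AASeq instead of the fragment gives the relative composition of the whole protein.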

  4. Determine the atomic composition and molecular weight of the protein.

    atomiccomp(ND2AASeq)
    molweight(ND2AASeq)
    

    MATLAB displays the following.

    ans = 
        C: 1818
        H: 3574
        N: 420
        O: 817
        S: 25
    ans =
      3.8960e+004
    

    If this sequence were unknown, you could use this information to help identify the protein by comparing its atomic composition and molecular weight with those of other proteins in a database.
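
    To reuse these values instead of reading them from ans, the outputs can be captured in variables. The following is a minimal sketch, assuming the Bioinformatics Toolbox is installed. It uses the first 50 residues of the sequence from step 1, so the numbers it prints differ from the full-protein output shown above. The variable names comp and mw are illustrative.

    ```matlab
    % First 50 residues of the translated sequence shown in step 1.
    fragment = 'MNPLAQPVIYSTIFAGTLITALSSHWFFTWVGLEMNMLAFIPVLTKKMNP';

    comp = atomiccomp(fragment);   % structure with fields C, H, N, O, S
    mw   = molweight(fragment);    % scalar molecular weight

    % Print the composition as a compact formula, then the weight.
    fprintf('C%d H%d N%d O%d S%d\n', comp.C, comp.H, comp.N, comp.O, comp.S);
    fprintf('Molecular weight: %.1f (%.1f per residue)\n', ...
            mw, mw / numel(fragment));
    ```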


© 1994-2005 The MathWorks, Inc.