Getting Sequence Information into MATLAB

Many public databases for nucleotide sequences are accessible from the Web. The MATLAB Command Window provides an integrated environment for bringing sequence information into MATLAB.

The consensus sequence for the human mitochondrial genome has the GenBank accession number NC_001807. Because the complete GenBank entry is quite large and you might be interested only in the sequence, you can retrieve just the sequence information.

  1. Get sequence information from a Web database. For example, to get sequence information for the human mitochondrial genome, in the MATLAB Command Window, type

    mitochondria = getgenbank('NC_001807','SequenceOnly',true);
    

    MATLAB gets the nucleotide sequence from the GenBank database and creates a character array.

    mitochondria = 
    gatcacaggtctatcaccctattaaccactcacgggagctctccatgcat
    ttggtattttcgtctggggggtgtgcacgcgatagcattgcgagacgctg
    gagccggagcaccctatgtcgcagtatctgtctttgattcctgcctcatt
    ctattatttatcgcacctacgttcaatattacaggcgaacatacctacta
    aagt . . . 
    
  2. If you don't have a Web connection, you can load the data from a MAT-file included with the Bioinformatics Toolbox, using the command

    load mitochondria
    

    MATLAB loads the sequence mitochondria into the MATLAB workspace.

  3. Get information about the sequence. Type

    whos mitochondria
    

    MATLAB displays information about the size of the sequence.

    Name               Size                   Bytes  Class
    mitochondria       1x16571                33142  char array
    
    Grand total is 16571 elements using 33142 bytes
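
The three steps above can also be combined in a short script. The following is a minimal sketch rather than part of the original procedure: it assumes the Bioinformatics Toolbox is installed, and the variable names seqLength and seqClass are introduced here only for illustration. It tries the Web retrieval first, falls back to the bundled MAT-file if that fails, and then queries the size and class that whos reports.

```matlab
% Sketch: get the sequence from GenBank, or from the bundled MAT-file
% if the Web retrieval fails (for example, when there is no connection).
try
    % Step 1: retrieve only the nucleotide sequence from GenBank.
    mitochondria = getgenbank('NC_001807', 'SequenceOnly', true);
catch
    % Step 2: load the copy included with the Bioinformatics Toolbox.
    load mitochondria
end

% Step 3: programmatic counterparts of the whos listing.
seqLength = length(mitochondria);   % number of nucleotides in the sequence
seqClass  = class(mitochondria);    % class name of the variable
fprintf('%d bases stored as %s\n', seqLength, seqClass);
```

The reported length depends on which source supplied the sequence, so it might not match the whos listing above exactly.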
    


© 1994-2005 The MathWorks, Inc.