getgenbank

Retrieve sequence information from GenBank database

Syntax

Data = getgenbank('AccessionNumber', 'PropertyName', PropertyValue...)

getgenbank(..., 'ToFile', ToFileValue)
getgenbank(..., 'FileFormat', FileFormatValue)
getgenbank(..., 'SequenceOnly', SequenceOnlyValue)

Arguments

AccessionNumber

Unique identifier for a sequence record. Enter a unique combination of letters and numbers.

ToFile

Property to specify the location and filename for saving data. Enter either a filename or a path and filename supported by your system (ASCII text file).

FileFormat

Property to select the format for the file specified with the ToFile property. Enter either 'GenBank' or 'FASTA'.

SequenceOnly

Property to control getting the sequence only. Enter either true or false.

Description

getgenbank retrieves nucleotide and amino acid sequence information from the GenBank database. This database is maintained by the National Center for Biotechnology Information (NCBI). For more details about the GenBank database, see

http://www.ncbi.nlm.nih.gov/Genbank/

Data = getgenbank('AccessionNumber', 'PropertyName', PropertyValue...) searches for the accession number in the GenBank database and returns a MATLAB structure containing information for the sequence. If an error occurs while retrieving the GenBank formatted information, then an attempt is made to retrieve the FASTA formatted data.

getgenbank(..., 'ToFile', ToFileValue) saves the data returned from GenBank in a file. If you do not give a location or path to the file, the file is stored in the MATLAB current directory. Read a GenBank formatted file back into MATLAB using the function genbankread.
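
As a minimal sketch of the save-and-reload round trip described above, the following saves a record to a file and reads it back with genbankread. The filename humins_receptor.txt is a hypothetical choice for illustration, and the call assumes a working connection to the NCBI servers.

```matlab
% Retrieve the record for M10051 and save it to a text file.
% 'humins_receptor.txt' is a hypothetical filename; with no path given,
% the file is written to the MATLAB current directory.
Data = getgenbank('M10051', 'ToFile', 'humins_receptor.txt');

% Read the saved GenBank formatted file back into a MATLAB structure.
S = genbankread('humins_receptor.txt');
```

Reading from the saved file avoids a second query to the database when the same record is needed again.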

getgenbank(..., 'FileFormat', FileFormatValue) returns the sequence in the specified format (FileFormatValue).
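
One possible use, sketched under the assumption that FileFormat is combined with ToFile as the Arguments section describes, is to save the record in FASTA format and read it back with the Bioinformatics Toolbox function fastaread. The filename humins_receptor.fasta is hypothetical.

```matlab
% Save the record in FASTA format rather than GenBank format.
% 'humins_receptor.fasta' is a hypothetical filename chosen for illustration.
Data = getgenbank('M10051', 'ToFile', 'humins_receptor.fasta', ...
                  'FileFormat', 'FASTA');

% Read the FASTA file back; fastaread returns the header line and the sequence.
[header, seq] = fastaread('humins_receptor.fasta');
```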

getgenbank(..., 'SequenceOnly', SequenceOnlyValue) returns only the sequence as a character array when SequenceOnlyValue is true. When the SequenceOnly and ToFile properties are used together, the output file is in the FASTA format.
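
A brief sketch of the SequenceOnly property, assuming network access to NCBI; the variable name seq is illustrative only.

```matlab
% Retrieve only the sequence for M10051, without the annotation fields.
seq = getgenbank('M10051', 'SequenceOnly', true);

% The result should be a character array rather than a structure.
isSequenceChar = ischar(seq);

% Number of characters in the retrieved sequence.
seqLength = length(seq);
```

This form is convenient when the sequence is passed directly to another function and the annotation fields are not needed.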

getgenbank(...) displays the information on the screen without returning data to a variable. The displayed information includes hyperlinks to the URLs used to search for and retrieve the data.

Examples

Retrieve the sequence from chromosome 19 that codes for the human insulin receptor and store it in structure S.

S = getgenbank('M10051')

S = 

                LocusName: 'HUMINSR'
      LocusSequenceLength: '4723'
     LocusNumberofStrands: ''
            LocusTopology: 'linear'
        LocusMoleculeType: 'mRNA'
     LocusGenBankDivision: 'PRI'
    LocusModificationDate: '06-JAN-1995'
               Definition: 'Human insulin receptor mRNA, complete cds.'
                Accession: 'M10051'
                  Version: 'M10051.1'
                       GI: '186439'
                 Keywords: 'insulin receptor; tyrosine kinase.'
                  Segment: []
                   Source: 'Homo sapiens (human)'
           SourceOrganism: [3x65 char]
                Reference: {[1x1 struct]}
                  Comment: [14x67 char]
                 Features: [51x74 char]
                      CDS: [139 4287]
                 Sequence: [1x4723 char]
                SearchURL: [1x105 char]
              RetrieveURL: [1x95 char]
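
The fields of the returned structure can then be used directly. As a hedged sketch, the following extracts the coding region, assuming that S.CDS holds the start and stop positions as a two-element numeric vector, as in the output shown above. Other releases may lay out the CDS field differently, so check the field before indexing with it.

```matlab
% Retrieve the record (requires a connection to NCBI).
S = getgenbank('M10051');

% Assumption: S.CDS is a [start stop] numeric pair, as displayed above.
cdsStart = S.CDS(1);
cdsStop  = S.CDS(2);

% Extract the coding region from the full sequence.
codingSeq = S.Sequence(cdsStart:cdsStop);

% Number of codons in the coding region, stop codon included.
numCodons = length(codingSeq) / 3;
```

With the values displayed above (CDS of [139 4287]), the coding region spans 4149 nucleotides, or 1383 codons.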

See Also

Bioinformatics Toolbox functions genbankread, getembl, getgenpept, getpdb, getpir


© 1994-2005 The MathWorks, Inc.