int2nt

Convert nucleotide sequence from integer to letter representation

Syntax

SeqChar = int2nt(SeqInt,
                 'PropertyName', PropertyValue...)

int2nt(..., 'Alphabet', AlphabetValue)
int2nt(..., 'Unknown', UnknownValue)
int2nt(..., 'Case', CaseValue)

Arguments

SeqInt

Nucleotide sequence represented by integers. Enter a vector of integers from the table Mapping Nucleotide Integers to Letters below. The array does not have to be of type integer, but it does have to contain only integer numbers. Integers are arbitrarily assigned to IUB/IUPAC letters.

Alphabet

Property to select the nucleotide alphabet. Enter either 'DNA' or 'RNA'.

Unknown

Property to select the character that represents unknown values. Enter a character to map integers 17 or greater to an unknown character (16 is reserved for the gap character in the table below). The character must not be one of the nucleotide characters A, T, C, G or the ambiguous nucleotide characters N, R, Y, K, M, S, W, B, D, H, or V. The default character is *.

Case

Property to select the letter case for the nucleotide sequence. Enter either 'upper' or 'lower'. The default value is 'upper'.

Mapping Nucleotide Integers to Letters

Nucleotide Base                     Integer   Letter

Adenosine                           1         A
Cytosine                            2         C
Guanine                             3         G
Thymidine (Alphabet = 'DNA')        4         T
Uridine (Alphabet = 'RNA')          4         U
A, T, G, C (any)                    5         N
A, G (purine)                       6         R
T, C (pyrimidine)                   7         Y
G, T (keto)                         8         K
A, C (amino)                        9         M
G, C (strong)                       10        S
A, T (weak)                         11        W
T, G, C                             12        B
A, T, G                             13        D
A, T, C                             14        H
A, G, C                             15        V
Gap of indeterminate length         16        -
Unknown (default)                   0         *

Description

int2nt(SeqInt, 'PropertyName', PropertyValue...) converts a 1-by-N array of integers to a character string using the table Mapping Nucleotide Integers to Letters above.
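As a minimal sketch of how the mapping table applies beyond the four standard bases, the call below mixes ambiguity codes, the gap value, and 0. It assumes the Bioinformatics Toolbox is on the path; the variable names are illustrative only, and the expected result in the comment is read directly off the table rather than taken from a recorded session.

```matlab
% Integers chosen from the mapping table:
% 1 = A, 6 = R (purine), 7 = Y (pyrimidine), 5 = N (any), 16 = gap, 0 = unknown
seqInt = [1 6 7 5 16 0];

% Convert with default properties (DNA alphabet, default unknown character).
seqChar = int2nt(seqInt)   % per the table, this should read ARYN-*
```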

int2nt(..., 'Alphabet', AlphabetValue) defines the nucleotide alphabet to use. The default value is 'DNA', which uses the symbols A, T, C, and G. If Alphabet is set to 'RNA', the symbols A, C, U, G are used instead.
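The effect of the Alphabet property can be sketched by converting the same integers twice. This assumes the Bioinformatics Toolbox is installed; the variable names are illustrative. Only the integer 4 changes meaning between the two alphabets (T for DNA, U for RNA).

```matlab
seqInt = [1 2 4 3];

% Default alphabet is 'DNA', so 4 maps to T.
dnaSeq = int2nt(seqInt)                      % ACTG

% With the 'RNA' alphabet, 4 maps to U instead.
rnaSeq = int2nt(seqInt, 'Alphabet', 'RNA')   % ACUG
```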

int2nt(..., 'Unknown', UnknownValue) defines the character to represent an unknown nucleotide base. The default character is '*'.

int2nt(..., 'Case', CaseValue) sets the output case of the nucleotide string. The default is uppercase.
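A short sketch of the Case property, assuming the Bioinformatics Toolbox is available. The variable names are illustrative, and the expected strings in the comments follow from the mapping table combined with the requested case.

```matlab
seqInt = [1 2 4 3];

% Uppercase output is the default.
upperSeq = int2nt(seqInt)                    % ACTG

% Request lowercase letters explicitly.
lowerSeq = int2nt(seqInt, 'Case', 'lower')   % actg
```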

Examples

Enter a sequence of integers as a MATLAB vector (space or comma-separated list with square brackets).

s = int2nt([1 2 4 3 2 4 1 3 2])

s =
   ACTGCTAGC

Define a symbol for unknown numbers 17 and greater.

si = [1 2 4 20 2 4 40 3 2];
s = int2nt(si, 'unknown', '#')

s =
   ACT#CT#GC

See Also

Bioinformatics Toolbox functions aa2int, baselookup, int2aa, nt2int


© 1994-2005 The MathWorks, Inc.