nt2int

Convert nucleotide sequence from letter to integer representation

Syntax

SeqInt = nt2int(SeqNT, 'PropertyName', PropertyValue)

nt2int(..., 'Unknown', UnknownValue)
nt2int(..., 'ACGTOnly', ACGTOnlyValue)

Arguments

SeqNT

Nucleotide sequence represented with letters. Enter a character string made up of letters from the table Mapping Nucleotide Letters to Integers below. Integers are arbitrarily assigned to IUB/IUPAC letters. If the property ACGTOnly is true, enter only the characters A, C, G, T, and U.

UnknownValue

Property to select the integer for unknown characters. Enter an integer. Maximum value is 255. Default value is 0.

ACGTOnlyValue

Property to control the use of ambiguous nucleotides. Enter either true or false. Default value is false.

Mapping Nucleotide Letters to Integers

Base                           Letter   Integer

Adenosine                      A        1
Cytidine                       C        2
Guanine                        G        3
Thymidine                      T        4
Uridine                        U        4
A, T, G, C (any)               N        5
A, G (purine)                  R        6
T, C (pyrimidine)              Y        7
G, T (keto)                    K        8
A, C (amino)                   M        9
G, C (strong)                  S        10
A, T (weak)                    W        11
T, G, C                        B        12
A, T, G                        D        13
A, T, C                        H        14
A, G, C                        V        15
Gap of indeterminate length    -        16
Unknown (default)              *        0

Description

nt2int(SeqNT, 'PropertyName', PropertyValue) converts a character string of nucleotides to a 1-by-N array of integers using the table Mapping Nucleotide Letters to Integers above. By default, unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.
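As a quick illustration of the default behavior described above, the following sketch converts a sequence that contains a gap and a letter outside the table. The input string is made up for illustration, and Z is assumed to count as unknown only because it does not appear in the table.

```matlab
% Illustrative input: '-' is a gap, 'Z' is assumed to be an unknown character
s = nt2int('AC-GTZ');

% Per the description above, the gap should map to 16 and the
% unknown letter to the default value 0:
%   s = 1  2  16  3  4  0
```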

nt2int(SeqNT,'Unknown',UnknownValue) defines the number used to represent unknown nucleotides. The default value is 0.
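A minimal sketch of the Unknown property follows. The sequence and the value 99 are arbitrary choices for illustration, and Z is again assumed to be unknown because it is not listed in the table.

```matlab
% Replace the default unknown code (0) with 99; any integer up to 255 is allowed
s = nt2int('ACZT', 'Unknown', 99);

% Based on the description above, the expected result is:
%   s = 1  2  99  4
```

Choosing a value outside the range 1 to 16 keeps unknown characters distinguishable from every code in the table.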

nt2int(SeqNT,'ACGTOnly', ACGTOnlyValue) controls how ambiguous nucleotides are handled. If ACGTOnly is true, the ambiguous nucleotide characters (N, R, Y, K, M, S, W, B, D, H, and V) are represented by the unknown nucleotide number.
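The sketch below shows how ACGTOnly might be used, first on its own and then together with Unknown. The sequence is illustrative, and passing both property pairs in one call is an assumption based on the general 'PropertyName', PropertyValue syntax.

```matlab
% Illustrative sequence with three ambiguous letters (R, Y, N)
seq = 'ACGTRYN';

% With ACGTOnly set to true, ambiguous letters take the unknown number (default 0)
strict = nt2int(seq, 'ACGTOnly', true);
%   strict = 1  2  3  4  0  0  0

% Assumed combination of both properties: flag ambiguous letters with 255 instead
flagged = nt2int(seq, 'ACGTOnly', true, 'Unknown', 255);
%   flagged = 1  2  3  4  255  255  255
```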

Examples

Convert a nucleotide sequence with letters to integers.

s = nt2int('ACTGCTAGC') 

s = 
     1    2    4    3    2    4    1    3    2
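To check a conversion, the integers can be mapped back to letters with int2nt, which is listed under See Also. This is a small sketch of that round trip, not part of the original example.

```matlab
% Convert letters to integers, then back to letters
s = nt2int('ACTGCTAGC');
seq = int2nt(s);

% The round trip should reproduce the original sequence:
%   seq = ACTGCTAGC
```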

See Also

Bioinformatics Toolbox functions: aa2int, baselookup, int2aa, int2nt


© 1994-2005 The MathWorks, Inc.