seqshowwords

Graphically display the occurrences of a word in a sequence

Syntax

seqshowwords(Seq, Word)
seqshowwords(Seq, Word, 'PropertyName', PropertyValue)

seqshowwords(...,'Color', ColorValue)
seqshowwords(...,'Columns', ColumnsValue)

Arguments

Seq

Enter either a nucleotide or amino acid sequence. You can also enter a structure with the field Sequence.

Word

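Enter a short character sequence. Word can contain ambiguous nucleotide or amino acid symbols, or a regular expression (see Examples).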

ColorValue

Color of the highlighted characters. Enter a 1-by-3 RGB vector specifying the intensity (0–255) of the red, green, and blue components, or enter one of the following characters: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), or 'y' (yellow).

The default color is red ('r').

ColumnsValue

Number of characters displayed on each line. The default value is 64.

Description

seqshowwords(Seq, Word) displays Seq with all occurrences of Word highlighted, and returns a structure with the fields Start and Stop, which contain the start and stop positions of each occurrence of Word in Seq.
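As an illustration only, the following Python sketch (not the toolbox implementation) shows how 0-based regular-expression match spans map onto 1-based, inclusive Start and Stop positions like those in the returned structure. The function name find_words is hypothetical, and the sketch assumes Word is a literal string with no ambiguous symbols.

```python
import re

def find_words(seq: str, word: str) -> dict:
    """Hypothetical helper: 1-based, inclusive Start/Stop positions of word in seq."""
    starts, stops = [], []
    # re.finditer scans left to right and returns non-overlapping matches.
    for m in re.finditer(re.escape(word), seq):
        starts.append(m.start() + 1)  # 0-based start -> 1-based start
        stops.append(m.end())         # half-open end equals the 1-based inclusive stop
    return {"Start": starts, "Stop": stops}

print(find_words("ACGTTACGT", "ACGT"))  # {'Start': [1, 6], 'Stop': [4, 9]}
```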

seqshowwords(...,'Color', ColorValue) selects the color used to highlight the words in the output display.

seqshowwords(...,'Columns', ColumnsValue) specifies the number of characters (columns) displayed on each line of the output.
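The six-digit prefix in the example outputs (000001) appears to be the 1-based position of the first character on each line. Assuming that, the effect of ColumnsValue on the layout can be sketched in Python as follows. format_lines is a hypothetical helper, and the sketch omits highlighting entirely.

```python
def format_lines(seq: str, columns: int = 64) -> list[str]:
    """Split seq into lines of at most `columns` characters, each prefixed with
    the zero-padded, 1-based position of its first character (assumed layout)."""
    return [f"{i + 1:06d} {seq[i:i + columns]}" for i in range(0, len(seq), columns)]

for line in format_lines("GCTAGTAACGTATATATAAT", columns=8):
    print(line)
# 000001 GCTAGTAA
# 000009 CGTATATA
# 000017 TAAT
```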

Examples

If Word contains nucleotide or amino acid symbols that represent multiple possible symbols (ambiguous characters), seqshowwords shows all matches. For example, the symbol R represents either G or A (purines), so if Word equals 'ART', seqshowwords counts occurrences of both 'AAT' and 'AGT'. Similarly, B represents C, G, or T (any base except A). This example shows two matches, 'TAGT' and 'TAAT', for the word 'BART'.

seqshowwords('GCTAGTAACGTATATATAAT','BART')

ans = 
    Start: [3 17]
    Stop: [6 20]

000001 GCTAGTAACGTATATATAAT
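One way to picture this behavior is to expand each ambiguous symbol into a regular-expression character class before searching. The Python sketch below uses the standard IUPAC nucleotide ambiguity codes; it handles DNA codes only (not amino acid ambiguity codes), and the names IUPAC_DNA, ambiguous_to_regex, and find_ambiguous are hypothetical, not part of the toolbox.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
# DNA codes only: amino acid ambiguity codes are not handled in this sketch.
IUPAC_DNA = {
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def ambiguous_to_regex(word: str) -> str:
    """Replace each ambiguous symbol with the set of bases it stands for."""
    return "".join(IUPAC_DNA.get(ch, re.escape(ch)) for ch in word.upper())

def find_ambiguous(seq: str, word: str) -> dict:
    """1-based, inclusive Start/Stop positions of non-overlapping matches."""
    matches = list(re.finditer(ambiguous_to_regex(word), seq.upper()))
    return {"Start": [m.start() + 1 for m in matches],
            "Stop": [m.end() for m in matches]}

print(ambiguous_to_regex("BART"))  # [CGT]A[AG]T
print(find_ambiguous("GCTAGTAACGTATATATAAT", "BART"))  # {'Start': [3, 17], 'Stop': [6, 20]}
```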

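seqshowwords does not report overlapping matches: it scans the sequence from left to right, and after a match the search continues from the character that follows it. This example returns three matches: the first occurrence of 'TATA', and two adjacent matches within the 'TATATATA' immediately after 'CG', which together appear as a single highlighted region. The final 'TA' is not highlighted because the 'TA' before it is already part of a matched pattern.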

seqshowwords('GCTATAACGTATATATATA','TATA')

ans = 
    Start: [3 10 14]
    Stop: [6 13 17]

000001 GCTATAACGTATATATATA
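The difference between non-overlapping and overlapping matching can be seen in a short Python sketch, offered as an illustration rather than the toolbox code. Python's re.finditer returns non-overlapping matches scanning left to right, which reproduces the positions above; the overlapping mode, shown only for comparison, wraps the word in a zero-width lookahead. find_spans is a hypothetical helper.

```python
import re

def find_spans(seq: str, word: str, overlapping: bool = False) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, stop) pairs for each match of word."""
    literal = re.escape(word)
    # A zero-width lookahead consumes no characters, so the scan advances one
    # position at a time and overlapping occurrences are also reported.
    pattern = f"(?=({literal}))" if overlapping else f"({literal})"
    return [(m.start(1) + 1, m.end(1)) for m in re.finditer(pattern, seq)]

seq = "GCTATAACGTATATATATA"
print(find_spans(seq, "TATA"))
# [(3, 6), (10, 13), (14, 17)]
print(find_spans(seq, "TATA", overlapping=True))
# [(3, 6), (10, 13), (12, 15), (14, 17), (16, 19)]
```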

To highlight each run of two or more consecutive 'TA' repeats as a single match, use the regular expression 'TA(TA)*TA'.

seqshowwords('GCTATAACGTATATATATA','TA(TA)*TA')

ans = 
    Start: [3 10]
    Stop: [6 19]

000001 GCTATAACGTATATATATA
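Because the * quantifier is greedy, one match absorbs an entire run of 'TA' repeats. The same pattern in Python's re module, used here only to illustrate the regular-expression behavior and not the toolbox internals, yields the same 1-based positions.

```python
import re

seq = "GCTATAACGTATATATATA"
# (TA)* is greedy: it takes as many repeats as possible, then backtracks
# just enough for the final TA to match, so each run becomes one match.
matches = list(re.finditer(r"TA(TA)*TA", seq))
starts = [m.start() + 1 for m in matches]  # 1-based start positions
stops = [m.end() for m in matches]         # 1-based inclusive stop positions
print(starts, stops)  # [3, 10] [6, 19]
```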

See Also

Bioinformatics Toolbox functions palindromes, restrict, seqdisp, seqshoworfs

MATLAB functions findstr, regexp


© 1994-2005 The MathWorks, Inc.