seqprofile

Calculate a sequence profile from a set of multiply aligned sequences

Syntax

Profile = seqprofile(Seqs,
                     'PropertyName', PropertyValue ...)
[Profile, Symbols] = seqprofile(Seqs)

seqprofile(..., 'Alphabet', AlphabetValue)
seqprofile(..., 'Counts', CountsValue)
seqprofile(..., 'Gaps', GapsValue)
seqprofile(..., 'Ambiguous', AmbiguousValue)
seqprofile(..., 'Limits', LimitsValue)

Arguments

Seqs

Set of multiply aligned sequences. Enter an array of strings, a cell array of strings, or an array of structures with the field Sequence.

Alphabet

Sequence alphabet. Enter 'NT' (nucleotides), 'AA' (amino acids), or 'none'. The default alphabet is 'AA'.

When Alphabet is 'none', the symbol list is based on the observed symbols. Every character can be a symbol except for a hyphen (-) and a period (.), which are reserved for gaps.

Counts

Property to control whether the profile contains frequencies (the ratio of counts to total counts) or raw counts. Enter true (counts) or false (frequencies). The default value is false.

Gaps

Property to control counting gaps in a sequence. Enter 'all' (count all gaps), 'noflanks' (count all gaps except those at the flanks of each sequence), or 'none' (do not count gaps). The default value is 'none'.

Ambiguous

Property to control counting ambiguous symbols. Enter 'Count' to add partial counts to the standard symbols.

Limits

Property to restrict the profile to part of the sequences. Enter a 1-by-2 vector with the first and last positions to include in the profile. The default value is [1,SeqLength].

Description

Profile = seqprofile(Seqs, 'PropertyName', PropertyValue ...) returns a matrix (Profile) of size [20 (or 4) x SequenceLength] with the frequency of amino acids (or nucleotides) in every column of the multiple alignment. The order of the rows is returned in the second output, Symbols.
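The computation behind a frequency profile can be sketched in base MATLAB without calling seqprofile. This is a minimal illustration, not the toolbox implementation: the alignment aln is a hypothetical character array with one sequence per row, only A, C, G, and T are counted, and the row order 'ACGT' is an assumption made for the sketch.

```matlab
% Minimal sketch of a nucleotide frequency profile (not seqprofile itself).
% Assumptions: equal-length sequences stored as rows of a char array,
% only A, C, G, T are counted, and the row order 'ACGT' is hypothetical.
aln = ['ACGT'; 'ACGA'; 'AGGT'];   % hypothetical 3-sequence alignment
symbols = 'ACGT';
counts = zeros(numel(symbols), size(aln, 2));
for k = 1:numel(symbols)
    % Number of sequences carrying symbol k in each alignment column
    counts(k, :) = sum(aln == symbols(k), 1);
end
% Frequency = counts divided by the total counts of each column
prof = bsxfun(@rdivide, counts, sum(counts, 1));
disp(prof)
```

Setting the Counts property to true corresponds to returning counts in this sketch instead of prof.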

[Profile, Symbols] = seqprofile(Seqs) returns a unique symbol list (Symbols) where every symbol in the list corresponds to a row in the profile (Profile).
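For Alphabet 'none', a symbol list of this kind can be sketched by collecting the distinct characters that occur in the alignment and dropping the reserved gap characters (- and .). The variable names are hypothetical, and the sorted order produced by unique is an assumption that may differ from the order seqprofile returns.

```matlab
% Sketch of an observed-symbol list, in the spirit of 'Alphabet','none'.
% Assumption: symbols are sorted by unique; seqprofile may order them
% differently. The alignment below is hypothetical.
aln = ['AB-C'; 'AB.D'; 'XBCC'];
observed = unique(aln(:))';                      % distinct characters, sorted
symbols = observed(~ismember(observed, '-.'));   % drop reserved gap characters
counts = zeros(numel(symbols), size(aln, 2));
for k = 1:numel(symbols)
    % Row k of the profile corresponds to symbols(k)
    counts(k, :) = sum(aln == symbols(k), 1);
end
disp(symbols)
disp(counts)
```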

seqprofile(..., 'Alphabet', AlphabetValue) selects a nucleotide alphabet, amino acid alphabet, or no alphabet.

seqprofile(..., 'Counts', CountsValue), when CountsValue is true, returns the counts instead of the frequencies.

seqprofile(..., 'Gaps', GapsValue), when GapsValue is 'all' or 'noflanks', appends a row to the bottom of the profile (Profile) with the count for gaps.
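The difference between 'all' and 'noflanks' can be sketched as follows. This assumes that both - and . mark gaps and that a flanking gap is one in the leading or trailing run of gaps of a sequence. How seqprofile treats a sequence made up entirely of gaps is not stated here, so the sketch simply treats all of its gaps as flanking.

```matlab
% Sketch of a gap-count row for the 'all' and 'noflanks' options.
% Assumptions: '-' and '.' both mark gaps; flanking gaps are the leading
% and trailing gap runs of each sequence. The alignment is hypothetical.
aln = ['--ACG-T-'; 'A-ACGTT.'; '.AAC--TT'];
isGap = (aln == '-') | (aln == '.');
% 'all': count every gap in each column
gapsAll = sum(isGap, 1);
% 'noflanks': mark leading and trailing gap runs, then exclude them
isFlank = false(size(aln));
for r = 1:size(aln, 1)
    firstRes = find(~isGap(r, :), 1, 'first');
    lastRes = find(~isGap(r, :), 1, 'last');
    if isempty(firstRes)
        isFlank(r, :) = true;   % all-gap sequence: assumed to be all flank
    else
        isFlank(r, 1:firstRes-1) = true;
        isFlank(r, lastRes+1:end) = true;
    end
end
gapsNoFlanks = sum(isGap & ~isFlank, 1);
```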

seqprofile(..., 'Ambiguous', AmbiguousValue), when AmbiguousValue is 'count', counts the ambiguous amino acid symbols (B, Z, X) and nucleotide symbols (R, Y, K, M, S, W, B, D, H, V, N) with the standard symbols. For example, the amino acid X adds a 1/20 count to every row, while the amino acid B adds a 1/2 count to each of the D and N rows.
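The partial-count rule can be sketched for a single alignment column. B (D or N), Z (E or Q), and X (any amino acid) follow the standard IUPAC meanings; that seqprofile handles Z the same way, the 20-letter row order, and the sample column are assumptions made for the illustration.

```matlab
% Sketch of spreading ambiguous amino acid symbols over the standard rows.
% Assumptions: B = D or N, Z = E or Q, X = any of the 20 amino acids;
% the row order in aa and the column contents are hypothetical.
aa = 'ARNDCQEGHILKMFPSTWYV';
column = 'DBXZ';                 % residues in one hypothetical column
counts = zeros(1, numel(aa));
for c = column
    switch c
        case 'B'
            idx = ismember(aa, 'DN');
        case 'Z'
            idx = ismember(aa, 'EQ');
        case 'X'
            idx = true(size(aa));
        otherwise
            idx = aa == c;
    end
    % Split one count evenly across every symbol the residue may stand for
    counts(idx) = counts(idx) + 1 / nnz(idx);
end
disp([aa; num2str(counts', '%4.2f')'])
```

Each residue still contributes a total of one count to the column, so the column sum equals the number of sequences.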

seqprofile(..., 'Limits', LimitsValue) specifies the start and end positions for the profile relative to the indices of the multiple alignment.
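A Limits window amounts to selecting a contiguous range of alignment columns before counting. The sketch below assumes that profile column 1 corresponds to alignment column Limits(1); the alignment is synthetic and the variable names are hypothetical.

```matlab
% Sketch of restricting a profile to a window of alignment columns.
% Assumption: the window is limits(1):limits(2), inclusive on both ends.
aln = repmat('ACGT', 3, 20);            % synthetic 3 x 80 alignment
limits = [50 60];
window = aln(:, limits(1):limits(2));   % 11 columns: 50 through 60
symbols = 'ACGT';
counts = zeros(numel(symbols), size(window, 2));
for k = 1:numel(symbols)
    counts(k, :) = sum(window == symbols(k), 1);
end
```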

Examples

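   % Read the multiply aligned sequences from a FASTA file
   seqs = fastaread('pf00002.fa');
   % Profile of alignment columns 50 to 60, with a gap row appended
   [P,S] = seqprofile(seqs,'limits',[50 60],'gaps','all')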

See Also

Bioinformatics Toolbox functions fastaread, multialignread, seqconsensus, seqdisp, seqlogo


© 1994-2005 The MathWorks, Inc.