Bioinformatics Toolbox
Calculate a sequence profile from a set of multiply aligned sequences
Profile = seqprofile(Seqs, 'PropertyName', PropertyValue ...)
[Profile, Symbols] = seqprofile(Seqs)
seqprofile(..., 'Alphabet', AlphabetValue)
seqprofile(..., 'Counts', CountsValue)
seqprofile(..., 'Gaps', GapsValue)
seqprofile(..., 'Ambiguous', AmbiguousValue)
seqprofile(..., 'Limits', LimitsValue)
| Seqs | Set of multiply aligned sequences. Enter an array of strings, cell array of strings, or an array of structures with the field Sequence. |
| Alphabet | Sequence alphabet. Enter 'NT' (nucleotides), 'AA' (amino acids), or 'none'. The default alphabet is 'AA'. When Alphabet is 'none', the symbol list is based on the observed symbols. Every character can be a symbol except for a hyphen (-) and a period (.), which are reserved for gaps. |
| Counts | Property to control whether the profile returns frequencies (ratio of counts/total counts) or counts. Enter either true (counts) or false (frequency). The default value is false. |
| Gaps | Property to control counting gaps in a sequence. Enter 'all' (counts all gaps), 'noflanks' (counts all gaps except those at the flanks of every sequence), or 'none'. The default value is 'none'. |
| Ambiguous | Property to control counting ambiguous symbols. Enter 'Count' to add partial counts to the standard symbols. |
| Limits | Property to specify using part of the sequences. Enter a [1x2] vector with the first position and the last position to include in the profile. The default value is [1,SeqLength]. |
Profile = seqprofile(Seqs, 'PropertyName', PropertyValue ...) returns a matrix (Profile) of size [20 (or 4) x SequenceLength] with the frequency of amino acids (or nucleotides) for every column in the multiple alignment. The order of the rows is given by:
4 nucleotides: A C G T/U
20 amino acids: A R N D C Q E G H I L K M F P S T W Y V
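As a minimal sketch of the row ordering, the following builds a profile from a small, made-up nucleotide alignment (the sequences are illustrative and do not come from the toolbox data). Going by the description above, the result should have one row per symbol in the order A C G T/U and one column per alignment position.

```matlab
% Three aligned nucleotide sequences of equal length (made-up data).
seqs = ['ACGT'; 'ACGA'; 'AGGT'];

% Request a nucleotide profile; rows are expected in the order A C G T/U.
P = seqprofile(seqs, 'Alphabet', 'NT');

% Alignment column 2 contains C, C, G, so the C row should hold 2/3
% and the G row 1/3 in that column.
disp(size(P))
disp(P(:, 2)')
```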
[Profile, Symbols] = seqprofile(Seqs) returns a unique symbol list (Symbols) where every symbol in the list corresponds to a row in the profile (Profile).
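The two-output form can be sketched with Alphabet set to 'none', where the symbol list is built from the observed characters. The input strings here are made up for illustration, and no particular ordering of Symbols is assumed.

```matlab
% Aligned strings over an arbitrary alphabet (made-up data).
seqs = {'HELLO', 'HELPO', 'HALLO'};

% With no predefined alphabet, Symbols lists the characters that were observed.
[P, S] = seqprofile(seqs, 'Alphabet', 'none');

% Each entry of S labels the corresponding row of P.
disp(S)
disp(size(P, 1) == length(S))
```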
seqprofile(..., 'Alphabet', AlphabetValue) selects a nucleotide alphabet, amino acid alphabet, or no alphabet.
seqprofile(..., 'Counts', CountsValue), when Counts is true, returns the counts instead of the frequency.
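To illustrate how counts relate to frequencies, this sketch computes both for the same made-up alignment. It assumes an alignment with no gaps or ambiguous symbols, in which case dividing each column of counts by its total should reproduce the frequency profile.

```matlab
% Made-up nucleotide alignment without gaps or ambiguous symbols.
seqs = ['ACGT'; 'ACGA'; 'AGGT'];

F = seqprofile(seqs, 'Alphabet', 'NT');                  % frequencies (default)
C = seqprofile(seqs, 'Alphabet', 'NT', 'Counts', true);  % raw counts

% Normalize each column of counts by its column total.
F2 = C ./ repmat(sum(C, 1), size(C, 1), 1);

% The difference is expected to be zero up to rounding error.
disp(max(abs(F(:) - F2(:))))
```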
seqprofile(..., 'Gaps', GapsValue), when Gaps is 'all' or 'noflanks', appends a row to the bottom of a profile (Profile) with the count for gaps.
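The effect of the Gaps property can be sketched by comparing profile sizes on a made-up alignment that contains both a flanking gap and internal gaps. Based on the description above, the 'all' and 'noflanks' settings are expected to add one gap row, and they should differ only in how the leading gap of the first sequence is treated.

```matlab
% Made-up alignment: sequence 1 starts with a flanking gap; all three
% sequences also contain an internal gap.
seqs = ['-CG-T'; 'ACG-T'; 'AC-GT'];

Pnone    = seqprofile(seqs, 'Alphabet', 'NT');                      % gaps not counted
Pall     = seqprofile(seqs, 'Alphabet', 'NT', 'Gaps', 'all');       % every gap counted
Pnoflank = seqprofile(seqs, 'Alphabet', 'NT', 'Gaps', 'noflanks');  % flanking gaps skipped

% Compare the number of rows; the last row of Pall and Pnoflank is the gap row.
disp([size(Pnone, 1), size(Pall, 1), size(Pnoflank, 1)])
disp(Pall(end, :))
disp(Pnoflank(end, :))
```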
seqprofile(..., 'Ambiguous', AmbiguousValue), when Ambiguous is 'count', counts the ambiguous amino acid symbols (B Z X) and nucleotide symbols (R Y K M S W B D H V N) with the standard symbols. For example, the amino acid X adds a 1/20 count to every row while the amino acid B counts as 1/2 at the D and N rows.
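The partial-count rule can be checked with a small made-up amino acid alignment whose first column holds B, D, N, and X. Following the rule stated above, the D and N rows of that column would each be expected to receive 1 + 1/2 + 1/20 = 1.55 counts, and every other row 1/20, so the column still sums to 4.

```matlab
% Made-up two-column amino acid alignment; column 1 mixes the ambiguous
% symbols B and X with the standard symbols D and N.
seqs = ['BA'; 'DA'; 'NA'; 'XA'];

C = seqprofile(seqs, 'Alphabet', 'AA', 'Counts', true, 'Ambiguous', 'Count');

% Row order is A R N D C Q E G H I L K M F P S T W Y V,
% so N is row 3 and D is row 4.
disp(C([3 4], 1)')
disp(sum(C(:, 1)))
```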
seqprofile(..., 'Limits', LimitsValue) specifies the start and end positions for the profile relative to the indices of the multiple alignment.
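As a short sketch of Limits, the following restricts the profile to alignment columns 2 through 4 of a made-up alignment. The resulting profile is expected to have three columns, matching the same columns of the full profile.

```matlab
% Made-up nucleotide alignment with six columns.
seqs = ['ACGTAC'; 'ACGAAC'; 'AGGTAC'];

Pfull = seqprofile(seqs, 'Alphabet', 'NT');
Psub  = seqprofile(seqs, 'Alphabet', 'NT', 'Limits', [2 4]);

% Psub should correspond to columns 2 through 4 of Pfull.
disp(size(Psub))
disp(max(max(abs(Psub - Pfull(:, 2:4)))))
```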
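% Read the aligned sequences from a FASTA file.
seqs = fastaread('pf00002.fa');
% Profile alignment columns 50 through 60, counting all gaps in an appended row.
[P,S] = seqprofile(seqs,'limits',[50 60],'gaps','all')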
See also: Bioinformatics Toolbox functions fastaread, multialignread, seqconsensus, seqdisp, seqlogo
© 1994-2005 The MathWorks, Inc.