Bioinformatics Toolbox
Calculate a consensus sequence
CSeq = seqconsensus(Seqs, 'PropertyName', PropertyValue ...)
[CSeq, Score] = seqconsensus(Seqs)
CSeq = seqconsensus(Profile)
seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue)
seqconsensus(..., 'Alphabet', AlphabetValue)
seqconsensus(..., 'Gaps', GapsValue)
seqconsensus(..., 'Ambiguous', AmbiguousValue)
seqconsensus(..., 'Limits', LimitsValue)
| Seqs | Set of multiply aligned amino acid or nucleotide sequences. Enter an array of strings, a cell array of strings, or an array of structures with the field Sequence. |
| Profile | Sequence profile. Enter a profile from the function seqprofile. Profile is a matrix of size [20 (or 4) x Sequence Length] with the frequency or count of amino acids (or nucleotides) for every position. Profile can also have 21 (or 5) rows if gaps are included in the consensus. |
| ScoringMatrix | Scoring matrix. The default value is BLOSUM50 for amino acid sequences or NUC44 for nucleotide sequences. ScoringMatrix can also be a 21x21, 5x5, 20x20, or 4x4 numeric array. For the gap-included cases, gap scores (last row/column) are set to mean(diag(ScoringMatrix)) for a gap matching with another gap, and set to mean(nodiag(ScoringMatrix)) for a gap matching with another symbol. |
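
The gap-score rule described for ScoringMatrix can be sketched in base MATLAB as follows. This is only an illustration under stated assumptions: the 4x4 matrix S is a made-up example (not NUC44), and the off-diagonal mean is computed directly with logical indexing because nodiag is shorthand in the description rather than a function used here.

```matlab
% Hypothetical 4x4 nucleotide scoring matrix (match = 5, mismatch = -4).
% It is an illustrative stand-in, not a matrix loaded from the toolbox.
S = -4 * ones(4) + 9 * eye(4);

gapGap   = mean(diag(S));          % score for a gap matching another gap
offDiag  = S(~eye(size(S)));       % all off-diagonal entries of S
gapOther = mean(offDiag);          % score for a gap matching another symbol

% Extend to 5x5: the last row and column hold the gap scores.
Sgap = gapOther * ones(5);
Sgap(1:4, 1:4) = S;
Sgap(5, 5) = gapGap;
```

With this S, the gap-gap score equals the match score and the gap-symbol score equals the mismatch score, because every diagonal entry is the same and every off-diagonal entry is the same.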
CSeq = seqconsensus(Seqs, 'PropertyName', PropertyValue ...) returns a string with the consensus sequence (CSeq) for a multiply aligned set of sequences (Seqs). The frequency of symbols (20 amino acids or 4 nucleotides) in the set of sequences is determined with the function seqprofile. For ambiguous nucleotide or amino acid symbols, the frequency or count is added to the standard set of symbols.
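
To make the idea of per-position symbol frequencies concrete, here is a minimal base-MATLAB sketch of what a nucleotide profile contains. It is not the seqprofile implementation: the three short sequences are invented for illustration, and gaps and ambiguous symbols are ignored.

```matlab
% Hypothetical aligned nucleotide sequences, one per row (illustrative data).
seqs = ['ACGTA'; 'ACGTT'; 'ACCTA'];
alphabet = 'ACGT';

% P(k, j) is the fraction of sequences with symbol alphabet(k) at column j.
P = zeros(numel(alphabet), size(seqs, 2));
for k = 1:numel(alphabet)
    P(k, :) = mean(seqs == alphabet(k), 1);
end
```

Each column of P sums to 1 here because every character belongs to the four-letter alphabet. Column 3, for example, holds 1/3 for C and 2/3 for G.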
[CSeq, Score] = seqconsensus(Seqs) also returns the conservation score of the consensus sequence (Score). By default, scores are computed with the scoring matrix BLOSUM50 for amino acids or NUC44 for nucleotides. Scores are the average Euclidean distance between the scored symbol and the M-dimensional consensus value, where M is the size of the alphabet. The consensus value is the profile weighted by the scoring matrix.
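
One possible reading of this scoring description is sketched below in base MATLAB. It is an assumption-laden illustration, not the toolbox's code: the scoring matrix S and profile P are made-up values, the consensus symbol is taken to be the one whose scoring-matrix column lies closest to the consensus value, and any averaging the function applies to the distance is left out.

```matlab
% Hypothetical 4x4 scoring matrix over 'ACGT' (match = 5, mismatch = -4).
alphabet = 'ACGT';
S = -4 * ones(4) + 9 * eye(4);

% Hypothetical profile: rows are A, C, G, T; columns are alignment positions.
P = [0.75 0 0;
     0.25 0 0.25;
     0    1 0;
     0    0 0.75];

X = S * P;                    % consensus value: profile weighted by S
M = numel(alphabet);
nPos = size(P, 2);
cseq = blanks(nPos);
score = zeros(1, nPos);
for j = 1:nPos
    % Euclidean distance from each symbol's column of S to the consensus value
    d = sqrt(sum((S - repmat(X(:, j), 1, M)).^2, 1));
    [score(j), k] = min(d);   % assumed rule: the closest symbol wins
    cseq(j) = alphabet(k);
end
```

Under these assumptions a fully conserved column (position 2) gets a distance of 0, and the distance grows as the column becomes more mixed, so smaller values indicate stronger conservation in this sketch.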
CSeq = seqconsensus(Profile) returns a string with the consensus sequence (CSeq) from a sequence profile (Profile).
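
A short usage sketch for the profile form follows. It requires the Bioinformatics Toolbox; the three sequences are invented for illustration, and passing 'Alphabet', 'NT' to seqprofile is a choice made here so that the profile has 4 rows.

```matlab
% Hypothetical aligned nucleotide sequences (illustrative data).
seqs = {'ACGTA'; 'ACGTT'; 'ACCTA'};

% Build a 4-row nucleotide profile, then derive the consensus from it.
P = seqprofile(seqs, 'Alphabet', 'NT');
CSeq = seqconsensus(P);
```

Computing the profile once and passing it to seqconsensus can be convenient when the same profile is also needed elsewhere, for example for inspection or plotting.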
seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue) specifies the scoring matrix.
The following input parameters are analogous to the function seqprofile when the alphabet is restricted to 'AA' or 'NT'.
seqconsensus(..., 'Alphabet', AlphabetValue)
seqconsensus(..., 'Gaps', GapsValue)
seqconsensus(..., 'Ambiguous', AmbiguousValue)
seqconsensus(..., 'Limits', LimitsValue)
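% Read the sequences from a FASTA file (assumes pf00002.fa is in the current folder or on the MATLAB path).
seqs = fastaread('pf00002.fa');
% Return the consensus (C) and conservation scores (S) for columns 50 to 60, counting all gaps.
[C,S] = seqconsensus(seqs,'limits',[50 60],'gaps','all')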
Bioinformatics Toolbox functions fastaread, multialignread, seqdisp, seqlogo, seqprofile
© 1994-2005 The MathWorks, Inc.