CIGAR string

CIGAR stands for Concise Idiosyncratic Gapped Alignment Report. It is a compressed representation of an alignment that is used in the SAM file format.
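The compression is essentially run-length encoding of the alignment columns. The following minimal Python sketch illustrates the idea and is not USEARCH code. It assumes the alignment is given as two equal-length rows with '-' for gaps. It follows the SAM specification convention, where I marks a query letter opposite a target gap and D marks a target letter opposite a query gap. The function name alignment_to_cigar is hypothetical.

```python
def alignment_to_cigar(query_row: str, target_row: str) -> str:
    """Run-length encode a gapped pairwise alignment as a SAM-style CIGAR.

    Each column maps to one op: 'M' if both rows have a letter,
    'I' if the target row has a gap, 'D' if the query row has a gap.
    Adjacent columns with the same op are merged into <count><op> runs.
    """
    if len(query_row) != len(target_row):
        raise ValueError("alignment rows must have the same length")
    runs = []  # list of [count, op]
    for q, t in zip(query_row, target_row):
        if q == "-" and t == "-":
            raise ValueError("column with a gap in both rows")
        op = "I" if t == "-" else ("D" if q == "-" else "M")
        if runs and runs[-1][1] == op:
            runs[-1][0] += 1      # extend the current run
        else:
            runs.append([1, op])  # start a new run
    return "".join(f"{n}{op}" for n, op in runs)


print(alignment_to_cigar("ACGT--ACG", "ACCTGGA-G"))  # 4M2D1M1I1M
```

Nine alignment columns collapse to five <integer><op> pairs. Longer alignments with fewer gaps compress much further, for example 130 ungapped columns become just 130M.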

A CIGAR standard was originally defined by the Exonerate alignment program, but this is not the same as the CIGARs found in SAM files. Several incompatible types of CIGAR string are used by different programs that support SAM files, and unfortunately CIGARs are not fully described by the SAM specification. The description here covers those SAM file CIGAR standards that I'm aware of. If you know of other variants, please let me know.

A CIGAR string is made up of <integer><op> pairs, e.g. 76H130M, which means 76 hard-clipped query letters followed by 130 M (match) columns. Here, "op" is an operation specified as a single character, usually an upper-case letter (see table below). An operation usually corresponds to a type of column that appears in the alignment, e.g. a match or gap. The integer gives the number of consecutive repetitions of the operation. In some CIGAR variants, the integer may be omitted if it is 1.
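A reader for this syntax must accept an optional integer before each op. The following minimal Python sketch shows one way to do this. It is hypothetical (parse_cigar is not a USEARCH or SAM-library function) and only accepts the ops listed in the table on this page. Other programs may emit further ops.

```python
import re

# One token = optional run length followed by a single op character.
# The op alphabet is limited to the ops described on this page.
_TOKEN = re.compile(r"(\d*)([MIDSH=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (count, op) pairs.

    A missing count is read as 1, as allowed by some CIGAR variants.
    Raises ValueError on any character that does not fit the syntax.
    """
    pairs = []
    pos = 0
    while pos < len(cigar):
        m = _TOKEN.match(cigar, pos)
        if m is None:
            raise ValueError(f"bad CIGAR at position {pos}: {cigar!r}")
        count = int(m.group(1)) if m.group(1) else 1
        pairs.append((count, m.group(2)))
        pos = m.end()
    return pairs


print(parse_cigar("76H130M"))  # [(76, 'H'), (130, 'M')]
print(parse_cigar("5MI3M"))    # [(5, 'M'), (1, 'I'), (3, 'M')]
```

Because every op is a non-digit character, the optional count never makes the string ambiguous. A run of digits always belongs to the op that follows it.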
 

Op   Description
M   Match (alignment column containing two letters). The two letters may be identical or different (a mismatch). USEARCH generates CIGAR strings containing M's rather than ='s and X's (see below).
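D   Deletion (gap in the query sequence, i.e. target letters with no corresponding query letters).
I   Insertion (gap in the target sequence, i.e. query letters with no corresponding target letters).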
S   Segment of the query sequence that does not appear in the alignment. This is used with soft clipping, where the full-length query sequence is given (field 10 in the SAM record). In this case, S operations specify segments at the start and/or end of the query that do not appear in a local alignment.
H   Segment of the query sequence that does not appear in the alignment. This is used with hard clipping, where only the aligned segment of the query sequence is given (field 10 in the SAM record). In this case, H operations specify segments at the start and/or end of the query that do not appear in the SAM record.
=   Alignment column containing two identical letters. USEARCH can read CIGAR strings using this operation, but does not generate them.
X   Alignment column containing a mismatch, i.e. two different letters. USEARCH can read CIGAR strings using this operation, but does not generate them.
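The table implies which sequence each op takes letters from. A program needs this information to turn a CIGAR back into an alignment. The following minimal Python sketch makes several assumptions. It uses the SAM convention in which I consumes query letters only and D consumes target letters only. The query string is the SEQ field, so it includes soft-clipped (S) letters but not hard-clipped (H) ones. The target string starts at the first aligned target letter. The names render_alignment, CONSUMES_QUERY and CONSUMES_TARGET are hypothetical, and only the ops in the table are handled.

```python
import re

# Optional run length followed by one op; only the ops in the table above.
_TOKEN = re.compile(r"(\d*)([MIDSH=X])")

# Which sequence each op takes letters from (SAM convention).
CONSUMES_QUERY = set("MIS=X")
CONSUMES_TARGET = set("MD=X")


def parse_cigar(cigar):
    """Return (count, op) pairs; a missing count means 1."""
    pairs, pos = [], 0
    while pos < len(cigar):
        m = _TOKEN.match(cigar, pos)
        if m is None:
            raise ValueError(f"bad CIGAR at position {pos}: {cigar!r}")
        pairs.append((int(m.group(1) or 1), m.group(2)))
        pos = m.end()
    return pairs


def render_alignment(cigar, query, target):
    """Expand a CIGAR string into two gapped rows (query row, target row).

    query:  the query as stored in SEQ, i.e. including soft-clipped (S)
            letters but not hard-clipped (H) letters.
    target: the target segment starting at the first aligned target letter.
    """
    q_row, t_row = [], []
    qi = ti = 0  # index of the next unused letter in query / target
    for n, op in parse_cigar(cigar):
        if op in ("M", "=", "X"):  # letter opposite letter
            q_row.append(query[qi:qi + n])
            t_row.append(target[ti:ti + n])
        elif op == "I":            # query letters opposite target gaps
            q_row.append(query[qi:qi + n])
            t_row.append("-" * n)
        elif op == "D":            # target letters opposite query gaps
            q_row.append("-" * n)
            t_row.append(target[ti:ti + n])
        # S and H produce no alignment columns.
        if op in CONSUMES_QUERY:
            qi += n
        if op in CONSUMES_TARGET:
            ti += n
    if qi != len(query):
        raise ValueError("CIGAR does not account for every query letter")
    if ti > len(target):
        raise ValueError("CIGAR runs past the end of the target")
    return "".join(q_row), "".join(t_row)


# Soft clipping: SEQ holds all 9 query letters; the first 2 are skipped.
print(render_alignment("2S4M2D1M1I1M", "TTACGTACG", "ACCTGGAG"))
# ('ACGT--ACG', 'ACCTGGA-G')

# Hard clipping: same alignment, but SEQ omits the 2 clipped letters.
print(render_alignment("2H4M2D1M1I1M", "ACGTACG", "ACCTGGAG"))
# ('ACGT--ACG', 'ACCTGGA-G')
```

The two calls show the practical difference between S and H. Both produce the same alignment, but an S op consumes letters from the stored query sequence while an H op does not, so the query string passed in must match the clipping style.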