reading frame (ORF) is a segment of a nucleotide sequence that
begins with a start codon, ends with a stop codon and is long
enough to code for a protein. In USEARCH, the minimum number of
codons in an ORF is set by the -mincodons option (default 20).
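The definition above can be illustrated with a minimal Python sketch. It is not USEARCH's implementation: the function name find_orfs, the restriction to ATG as the only start codon, the forward-strand-only scan, and the choice to count codons from the start codon up to (but not including) the stop codon are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only ATG is treated as a start codon


def find_orfs(seq, min_codons=20):
    """Return half-open (start, end) intervals of ORFs on the forward strand.

    An ORF here runs from a start codon to the first in-frame stop codon
    (stop included in the interval) and must contain at least min_codons
    codons, not counting the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon == START_CODON:
                    orf_start = i
            elif codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3  # excludes the stop codon
                if n_codons >= min_codons:
                    orfs.append((orf_start, i + 3))
                orf_start = None
    return orfs
```

For example, a sequence made of two flanking bases, ATG, twenty GCT codons, TAA and two more bases contains one ORF of 21 codons, so it is reported with min_codons=20 but not with min_codons=22.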
When the query is a nucleotide sequence and the database contains
amino acid sequences, USEARCH performs a translated search. ORFs are
identified in the nucleotide sequence, and each ORF is treated as a
separate query with its own termination conditions. This is because
a single nucleotide sequence may span more than one gene.
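The idea of treating each ORF as its own query can be sketched as follows. This is an illustration only, not USEARCH code: the names translate and orf_queries, the label format, and the use of the standard genetic code are assumptions, and the ORF intervals are taken as given.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become X."""
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def orf_queries(label, seq, orfs):
    """Turn each (start, end) ORF interval into a separate protein query."""
    queries = []
    for n, (start, end) in enumerate(orfs, 1):
        protein = translate(seq[start:end]).rstrip("*")  # drop the stop symbol
        queries.append((f"{label}_orf{n}", protein))
    return queries
```

A read that spans two genes then yields two independent protein queries, each of which could be searched against the amino acid database on its own.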
The most common application of translated search is to find
protein-coding genes in shotgun reads. A shotgun read may span only
part of an ORF, in which case the start codon, the stop codon, or
both may be missing. USEARCH therefore supports more flexible
definitions of an ORF, controlled by the -orfstyle option.
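One possible relaxed definition is sketched below. It does not reproduce the actual semantics of -orfstyle, which are not described here. In this hypothetical version, any in-frame stretch between two stop codons, or between a stop codon and either end of the read, counts as a candidate ORF if it is long enough. The function name find_partial_orfs and the forward-strand-only scan are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_partial_orfs(seq, min_codons=20):
    """Return half-open (start, end) intervals of stop-free in-frame segments.

    Hypothetical relaxed rule: no start codon is required, and a segment
    may begin at the start of the read or end at the end of the read.
    Stop codons are not included in the reported intervals.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        seg_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - seg_start) // 3 >= min_codons:
                    orfs.append((seg_start, i))
                seg_start = i + 3
        # Trailing segment: runs to the last complete codon in this frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - seg_start) // 3 >= min_codons:
            orfs.append((seg_start, end))
    return orfs
```

Under this rule, a read fragment with neither a start nor a stop codon can still produce a candidate, which a strict start-to-stop definition would miss.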