open reading frames
 
An open reading frame (ORF) is a segment of a nucleotide sequence that begins with a start codon, ends with a stop codon and is long enough to code for a protein. In USEARCH, the minimum number of codons in an ORF is set by the -mincodons option (default 20).
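The strict definition above can be illustrated with a minimal Python sketch. This is not USEARCH's implementation. It assumes the standard genetic code, ATG as the only start codon, the forward strand only, and a codon count that includes the start codon but not the stop codon. The function name find_orfs and its min_codons parameter are hypothetical and only mirror the role of -mincodons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code (assumption)
START_CODON = "ATG"                   # only ATG is treated as a start (assumption)


def find_orfs(seq, min_codons=20):
    """Return (start, end, frame) for each strict ORF on the forward strand.

    start is the index of the first base of the start codon and end is the
    index just past the stop codon, so seq[start:end] is the whole ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon == START_CODON:
                    orf_start = i
            elif codon in STOP_CODONS:
                # Count coding codons: start codon included, stop excluded.
                n_codons = (i - orf_start) // 3
                if n_codons >= min_codons:
                    orfs.append((orf_start, i + 3, frame))
                orf_start = None
    return orfs


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
    print(find_orfs(example, min_codons=5))
```

Raising min_codons discards short ATG-to-stop segments, which arise frequently by chance in non-coding sequence.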

With a nucleotide query sequence and amino acid database, USEARCH performs a translated search. ORFs are identified in the nucleotide sequence, and each ORF is treated as a separate query with its own termination conditions. This is because a single nucleotide sequence may span more than one gene.
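To illustrate how one nucleotide sequence can give rise to several independent protein queries, the following Python sketch translates all six reading frames and emits each ORF as its own protein sequence. It is an illustration only and does not reproduce USEARCH's internals. It assumes the standard genetic code and ATG starts, and the names translate and protein_queries are hypothetical.

```python
from itertools import product

# Standard genetic code, bases in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt):
    """Translate codon by codon; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def protein_queries(seq, min_codons=20):
    """Yield (strand, frame, protein) for each ATG-to-stop ORF in six frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", reverse_complement)):
        for frame in range(3):
            # '*' marks a stop codon, so every piece except the last one
            # is terminated by a stop.
            pieces = translate(s[frame:]).split("*")
            for piece in pieces[:-1]:
                m = piece.find("M")  # first ATG after the previous stop
                if m != -1 and len(piece) - m >= min_codons:
                    yield strand, frame, piece[m:]


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
    for query in protein_queries(example, min_codons=5):
        print(query)
```

Each yielded protein would then be searched against the amino acid database on its own, so a hit for one ORF does not end the search for another ORF in the same nucleotide sequence.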

The most common application of translated search is to find protein-coding genes in shotgun reads. A shotgun read may span only part of an ORF, in which case the start codon, the stop codon, or both may be missing. USEARCH therefore supports more flexible definitions of an ORF, controlled by the -orfstyle option.
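The idea of a relaxed ORF definition can be sketched in Python as follows. The two boolean parameters are hypothetical and are not the actual values accepted by -orfstyle. They only show how a missing start codon at the beginning of a read, or a missing stop codon at the end, might be tolerated. The sketch assumes the standard genetic code, ATG starts and the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code (assumption)
START_CODON = "ATG"                   # only ATG is treated as a start (assumption)


def find_relaxed_orfs(read, min_codons=20,
                      allow_missing_start=True, allow_missing_stop=True):
    """Return (start, end, frame) for ORFs that may be truncated by the read.

    end is exclusive and does not include the stop codon, if there is one.
    """
    read = read.upper()
    orfs = []
    for frame in range(3):
        codons = [read[i:i + 3] for i in range(frame, len(read) - 2, 3)]
        seg_begin = 0  # codon index where the current stop-free segment begins
        # The trailing None is a sentinel so the last segment is processed too.
        for k, codon in enumerate(codons + [None]):
            if codon is not None and codon not in STOP_CODONS:
                continue
            has_stop = codon is not None
            segment = codons[seg_begin:k]
            if seg_begin == 0 and allow_missing_start:
                first = 0  # the ORF may have started upstream of the read
            elif START_CODON in segment:
                first = segment.index(START_CODON)
            else:
                first = None
            if first is not None and (has_stop or allow_missing_stop):
                if len(segment) - first >= min_codons:
                    start = frame + 3 * (seg_begin + first)
                    orfs.append((start, frame + 3 * k, frame))
            seg_begin = k + 1
    return orfs


if __name__ == "__main__":
    read = "GCT" * 5 + "TAA" + "ATG" + "GCT" * 3
    print(find_relaxed_orfs(read, min_codons=4))
    print(find_relaxed_orfs(read, min_codons=4,
                            allow_missing_start=False,
                            allow_missing_stop=False))
```

With both relaxations enabled, a stop-free frame that spans the whole read is also reported, so a more permissive definition recovers more partial genes at the cost of more spurious candidates.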