USEARCH quick start for BLAST users
 
Search commands
In BLAST, different programs are used depending on the sequence types (protein or nucleotide) of the query and database sequences. In USEARCH, there is only one program which supports several commands. All search commands support protein-protein, nucleotide-nucleotide and nucleotide-protein (translated) search. The sequence types of the query and database are detected automatically. Different USEARCH commands are used for different alignment styles (local or global) and for different underlying search algorithms: USEARCH for top hit(s) at higher identities, or UBLAST, which is slower but sensitive to lower identities.
 
BLAST program Query DB Comments
BLASTN nucl. nucl. The usearch_global (most often) and usearch_local (rarely) commands are commonly used as replacements for BLASTN and MEGABLAST. The ublast command can also be used if it is important to find as many hits as possible. See comments below about nucleotide searches. BLASTN always searches both plus and minus strands; in USEARCH, the -strand option allows searching on the plus strand only (faster).
BLASTP protein protein The usearch_global and ublast commands are most often used as replacements for BLASTP. Use usearch_global for high-identity top-hit(s) searches, e.g. for orthologs. Use ublast if you need good sensitivity to low-identity hits, e.g. <50%, and/or if it is important to find as many hits as possible in the database.
BLASTX nucl. protein See translated search. The ublast command is commonly used as a fast replacement for BLASTX in applications such as searching for genes in metagenomic shotgun reads. BLASTX attempts to extend alignments through frameshifts. USEARCH does not do this, though frameshifts can be inferred from adjacent hits in different frames. Let me know if this feature would be important to you.
TBLASTX nucl. nucl. Search with translated sequences. This type of translated search is rarely used in practice and is not directly supported by USEARCH. If you have a need for this, please let me know. Searches of this type can be handled by using the findorfs command to get ORFs, i.e. possible coding sequences. Use the -xlat option to get translated sequences. Then you can use a straightforward protein search with ublast (low identity) or usearch_global or usearch_local (high identity).
TBLASTN protein nucl. See comments above for TBLASTX.
MEGABLAST nucl. nucl. The usearch_global (most often) and usearch_local commands are commonly used as replacements for MEGABLAST. The ublast command can also be used if it is important to find as many hits as possible.
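
As a rough illustration of the table above, the following Python sketch builds a usearch command line for a given BLAST program. The helper name usearch_args, the default thresholds (0.97 identity, 1e-9 E-value) and the file names are assumptions made for this example only; the options used (-db, -id, -evalue, -strand, -blast6out) should be checked against the manual for your USEARCH version.

```python
# Suggested USEARCH command per BLAST program, following the table above.
# None marks programs with no direct equivalent (see the TBLASTX comments).
BLAST_TO_USEARCH = {
    "blastn": "usearch_global",
    "megablast": "usearch_global",
    "blastp": "usearch_global",
    "blastx": "ublast",
    "tblastx": None,
    "tblastn": None,
}

# Programs where both query and database are nucleotide sequences.
NUCLEOTIDE_PROGRAMS = {"blastn", "megablast"}


def usearch_args(blast_program, query, db, out,
                 identity=0.97, evalue=1e-9, sensitive=False):
    """Return an argument list for a usearch search (hypothetical helper).

    identity and evalue defaults are placeholders, not recommendations.
    sensitive=True switches to ublast to find as many hits as possible.
    """
    program = blast_program.lower()
    command = BLAST_TO_USEARCH[program]
    if command is None:
        raise ValueError(
            f"{program} has no direct USEARCH equivalent; "
            "extract ORFs with findorfs first"
        )
    if sensitive:
        command = "ublast"

    args = ["usearch", f"-{command}", query, "-db", db]
    if command == "ublast":
        # Local search: an E-value threshold must be given explicitly.
        args += ["-evalue", str(evalue)]
    else:
        # Global search: an identity threshold is used instead.
        args += ["-id", str(identity)]
    if program in NUCLEOTIDE_PROGRAMS:
        # Mimic BLASTN by searching both strands; "plus" would be faster.
        args += ["-strand", "both"]
    args += ["-blast6out", out]
    return args


if __name__ == "__main__":
    print(" ".join(usearch_args("blastn", "query.fa", "db.fa", "hits.b6")))
```

The returned list can be passed to subprocess.run if the usearch binary is on your PATH.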

Local and global alignments
BLAST supports only local searches, while USEARCH supports both local and global search. In some applications, global alignments can be more effective. For example, in 16S community analysis, sequence identity is used as a simple measure of evolutionary distance, with rules of thumb such as >97% identity indicating the same species and >95% the same genus. Here, sequence identities are better measured from a global alignment because a local alignment may not extend through hypervariable regions, so the identity it reports overestimates the identity of the full sequence. See also database trimming.
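
The effect can be seen in a toy example. The Python sketch below assumes identity is defined as matching columns divided by alignment columns (other definitions exist), and uses made-up sequences in which a conserved block is followed by a variable region.

```python
def percent_identity(row_a, row_b):
    """Percent identity over all columns of a pairwise alignment.

    Assumes both rows have equal length and '-' marks a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * matches / len(row_a)


# Toy alignment: 12 conserved columns followed by an 8-column variable region.
query = "ACGTACGTACGT" + "AAAAAAAA"
target = "ACGTACGTACGT" + "AACCGGTT"

# Global: every column counts, including the variable region.
global_id = percent_identity(query, target)

# Local: an alignment that stops at the end of the conserved block.
local_id = percent_identity(query[:12], target[:12])

print(f"global identity: {global_id:.1f}%")  # 70.0%
print(f"local identity:  {local_id:.1f}%")   # 100.0%
```

Under a 97% rule of thumb, the local alignment would suggest the same species, while the global alignment shows the sequences are far more divergent.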

Database files
BLAST requires the database to be formatted using formatdb or makeblastdb. USEARCH commands allow the database file to be provided in FASTA or UDB format. The filename is specified using the -db option; the format is detected automatically. FASTA is convenient because it saves the makeudb step, but UDB files are faster to load and take less memory.
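
For readers who want to build a UDB file, here is a minimal sketch of the makeudb step as a Python helper. The command names -makeudb_usearch and -makeudb_ublast and the -output option are given as I understand them from the USEARCH documentation and may differ between versions; the helper name and file names are hypothetical.

```python
def makeudb_args(fasta_path, udb_path, for_ublast=False):
    """Return an argument list that builds a UDB database (hypothetical helper).

    Assumption: the index type depends on the search command that will use it,
    so ublast searches need a database built with -makeudb_ublast.
    """
    command = "-makeudb_ublast" if for_ublast else "-makeudb_usearch"
    return ["usearch", command, fasta_path, "-output", udb_path]


if __name__ == "__main__":
    print(" ".join(makeudb_args("db.fasta", "db.udb")))
```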

Output file compatibility
USEARCH supports the tabbed output file format of BLAST (-m8 or -outfmt 6 option) with the -blast6out option. In most respects, the format is identical. At the time of writing, the only difference I'm aware of is that USEARCH does not sort all hits for a given nucleotide sequence by E-value in the case of translated search. This is because each ORF is treated internally by USEARCH as a separate query; hits for a given ORF are sorted by E-value. This may be changed in future USEARCH releases, if I ever get around to it (doubtful, unless you can convince me that this is a real problem in practice).
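
If the per-ORF ordering matters for a downstream tool, the hits can be re-sorted after the search. The Python sketch below parses the standard 12 tab-separated columns of the BLAST tabular format and sorts the hits of each query by E-value. It assumes that all ORFs from one nucleotide sequence carry the same query label in the output, and it defensively treats "*" in the E-value or bit score column as missing; both points are assumptions to verify against your own output files.

```python
from collections import defaultdict

# The 12 standard columns of BLAST tabular output (-m8 / -outfmt 6).
FIELDS = ("query", "target", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def _number_or_none(text):
    """Convert a field to float, mapping '*' (no value) to None."""
    return None if text == "*" else float(text)


def parse_blast6(lines):
    """Parse tabular hit lines into a list of dicts keyed by FIELDS."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        hit = dict(zip(FIELDS, values))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = _number_or_none(hit["evalue"])
        hit["bitscore"] = _number_or_none(hit["bitscore"])
        hits.append(hit)
    return hits


def sort_by_evalue(hits):
    """Group hits by query label and sort each group by ascending E-value.

    Hits without an E-value are placed last.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["query"]].append(hit)

    def key(hit):
        return float("inf") if hit["evalue"] is None else hit["evalue"]

    return {query: sorted(group, key=key) for query, group in grouped.items()}
```

For example, sort_by_evalue(parse_blast6(open("hits.b6"))) returns a dictionary mapping each query label to its hits, best E-value first (hits.b6 is a placeholder file name).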
 

E-value threshold
In BLAST, the E-value threshold defaults to 10. This threshold is presumably intended to maximize sensitivity at the expense of a very high error rate. It will produce many false positive hits, and may cause slow execution due to the large number of gapped extensions that must be attempted and perhaps also due to writing large output files. USEARCH has no default E-value threshold; one must be specified by the user with the -evalue option. It is not possible to calculate E-values for global alignments, so for global alignments an identity threshold must usually be specified. Unlike BLAST, USEARCH supports a rich set of "accept options" providing additional criteria to decide if an alignment is a hit.

Search termination
USEARCH allows a search to be terminated if a sufficiently strong hit is found, saving the time needed to search the rest of the database. See also weak hits.
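
The general idea of early termination can be sketched in a few lines of Python. This is only an illustration of the concept, not USEARCH's actual implementation; the function name, the max_accepts parameter and the example identities are all hypothetical.

```python
def search_with_termination(candidates, is_hit, max_accepts=1):
    """Scan candidates in order and stop once max_accepts hits are found.

    Returns the accepted targets and the number of candidates examined.
    """
    accepted = []
    examined = 0
    for target in candidates:
        examined += 1
        if is_hit(target):
            accepted.append(target)
            if len(accepted) >= max_accepts:
                break  # strong enough hit found; skip the rest of the database
    return accepted, examined


# Made-up identities of a query to four database sequences.
identities = {"t1": 0.80, "t2": 0.99, "t3": 0.98, "t4": 0.60}

accepted, examined = search_with_termination(
    identities, lambda t: identities[t] >= 0.97
)
print(accepted, examined)  # ['t2'] 2
```

Here the search stops after the second sequence, so t3 is never examined even though it would also pass the threshold. This is the trade-off between speed and finding all hits.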

Nucleotide searches
Nucleotide sequence homology cannot be reliably detected below roughly 75% identity. Below 50%, most hits are probably noise. Most nucleotide searches are therefore medium- or high-identity by USEARCH standards, and the USEARCH algorithm is usually effective (usearch_global and usearch_local commands). The ublast command might be preferred if it is important to find all possible hits. The important limitations of USEARCH for nucleotide search are related to sequence length (see below).

Chromosomes and other long sequences
USEARCH is not designed for long database or query sequences. So while BLASTN can be used to find local similarities between a pair of chromosomes, USEARCH cannot do this directly. See long sequences.