
Local and global alignments

See also
  Alignment parameters
  Alignment heuristics

USEARCH generates local or global alignments, depending on the command. Most commands have one local and one global variant, e.g. usearch_local and usearch_global. As an exception, ublast supports only local alignments. This is because the UBLAST algorithm is designed primarily to detect distant protein relationships, which are typically local in character because distant homology is often confined to a single domain within the protein.

Global alignments
A global alignment contains all letters from both the query and target sequences. However, it is common in USEARCH applications for the target sequence to be significantly longer than the query (e.g. query is a short read, target is a full-length gene), in which case the alignment will usually have terminal gaps, as in this example:

 Query ---------QVERYSEQ-------

Where possible, database sequences should be trimmed to minimize terminal gaps.
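
As a rough way to see how much of a global alignment is taken up by terminal gaps, the sketch below counts the leading and trailing gap characters in one row of an alignment. The function name terminal_gaps is hypothetical and is not part of USEARCH; it assumes the row is given as a plain string with "-" as the gap character, as in the Query row above.

```python
def terminal_gaps(aligned_row, gap="-"):
    """Return (leading, trailing) gap counts for one row of an alignment.

    Internal gaps are ignored; only gaps at the two ends are counted.
    """
    if not aligned_row.strip(gap):
        # Row contains only gaps: count them once, as leading gaps.
        return len(aligned_row), 0
    leading = len(aligned_row) - len(aligned_row.lstrip(gap))
    trailing = len(aligned_row) - len(aligned_row.rstrip(gap))
    return leading, trailing


# The Query row from the example above.
query_row = "---------QVERYSEQ-------"
lead, trail = terminal_gaps(query_row)

# Fraction of alignment columns in which the query has a letter or an
# internal gap, i.e. columns that are not terminal gaps.
aligned_fraction = (len(query_row) - lead - trail) / len(query_row)
print(lead, trail, round(aligned_fraction, 2))
```

A low aligned fraction across many hits is one sign that the database sequences are much longer than the queries and could be trimmed.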

Local alignments
A local alignment aligns a substring of the query sequence to a substring of the target sequence. The substrings may be all of one or both sequences; if all of both are included then the local alignment is also global. A local alignment is defined by maximizing the alignment score, so that deleting a column from either end would reduce the score, and adding further columns at either end would also reduce the score. For example, consider this global protein alignment:


Here, the local alignment would be obtained by deleting the first and last columns, because the aligned pairs W/K and A/N have negative substitution scores in the BLOSUM62 matrix.

Local alignments never have terminal gaps, because a higher score could be obtained by deleting the gaps (which always have negative scores, i.e. penalties).
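
The score-maximizing property can be illustrated with a small sketch that takes the per-column scores of an existing alignment and finds the contiguous run of columns with the highest total. This is a simplification and not how USEARCH computes local alignments: a real local aligner searches over all possible alignments rather than sub-spans of one global alignment, and the sketch assumes each column has an independent score (so affine gap penalties are not modeled). The function name best_local_span is hypothetical. The end scores of -3 (W/K) and -2 (A/N) are BLOSUM62 values; the inner scores are illustrative positive numbers.

```python
def best_local_span(column_scores):
    """Return (start, end, score) of the highest-scoring contiguous run
    of columns, with end exclusive. Returns (0, 0, 0) if no run has a
    positive score."""
    best_score = 0
    best_start = best_end = 0
    run_score = 0
    run_start = 0
    for i, score in enumerate(column_scores):
        if run_score <= 0:
            # A non-positive prefix can only lower the total, so drop it
            # and start a new run at this column.
            run_score = 0
            run_start = i
        run_score += score
        if run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_score


# First column W/K (-3), last column A/N (-2); inner values illustrative.
scores = [-3, 5, 4, 6, -2]
start, end, total = best_local_span(scores)
print(start, end, total)
```

With these scores the best span excludes the first and last columns, since keeping either one would lower the total. Extending the list with terminal gap columns (negative scores) would leave the result unchanged for the same reason.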

Finding local alignments that are approximately global
The -query_cov and -target_cov accept options can be used with local alignments to require that the alignment covers most of one or both sequences. For example, -query_cov 0.9 would require at least 90% of the query to be aligned (semi-global), and -query_cov 0.9 -target_cov 0.9 would require at least 90% of both sequences to be aligned. Note that these requirements are applied after the local alignment has been created, so the effect is to reject alignments that are too short, not to extend them further.
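
The filtering behavior can be sketched as a check applied after alignment. The code below is an illustration only: the LocalHit class, its field names, and the passes_coverage function are hypothetical, and coverage is assumed here to be the aligned span of a sequence divided by its full length, which may differ in detail from how USEARCH computes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalHit:
    """Coordinates of a finished local alignment (0-based, end exclusive)."""
    query_len: int
    target_len: int
    q_start: int
    q_end: int
    t_start: int
    t_end: int


def coverage(start: int, end: int, seq_len: int) -> float:
    """Fraction of a sequence spanned by the alignment (assumed definition)."""
    return (end - start) / seq_len


def passes_coverage(hit: LocalHit,
                    query_cov: Optional[float] = None,
                    target_cov: Optional[float] = None) -> bool:
    """Accept or reject a hit; the alignment itself is never modified."""
    if query_cov is not None:
        if coverage(hit.q_start, hit.q_end, hit.query_len) < query_cov:
            return False
    if target_cov is not None:
        if coverage(hit.t_start, hit.t_end, hit.target_len) < target_cov:
            return False
    return True


# A 100-letter read aligned over 95 letters to part of a 1000-letter gene.
hit = LocalHit(query_len=100, target_len=1000,
               q_start=0, q_end=95, t_start=200, t_end=295)
print(passes_coverage(hit, query_cov=0.9))
print(passes_coverage(hit, query_cov=0.9, target_cov=0.9))
```

In this example the hit satisfies the query threshold alone but is rejected once a target threshold is added, because the filter only tests the existing alignment and does not try to lengthen it.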