Closed-reference OTU assignment algorithm

See also
  closed_ref command
  Problems with closed- and open-reference OTU assignment

Closed-reference OTU assignment (Rideout et al. 2014) assigns query sequences to OTUs by searching a predefined database of full-length sequences that have been clustered at 97% identity. In QIIME, it is implemented by a script that uses a default database obtained by clustering Greengenes. In QIIME v1.9, the database search is performed by uclust, an older software package that was the predecessor of usearch.
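The idea of closed-reference assignment can be illustrated with a minimal Python sketch. It is not the QIIME or uclust implementation: the `identity` function is a toy measure that assumes equal-length, pre-aligned sequences, and the function names, the `min_id` parameter, and the example sequences are all hypothetical. Applying 97% as the assignment cutoff is also an assumption made for illustration.

```python
from collections import Counter


def identity(a, b):
    """Toy identity: fraction of matching positions.

    Assumes equal-length, pre-aligned sequences. Real tools compute
    identity from an alignment found by a heuristic search.
    """
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_ref_assign(queries, reference, min_id=0.97):
    """Assign each query to its most similar reference OTU.

    Queries whose best identity is below min_id (an assumed cutoff)
    are left unassigned, as no new OTUs are created.
    """
    counts = Counter()
    unassigned = []
    for qname, qseq in queries.items():
        best_otu, best_id = None, -1.0
        for otu, rseq in reference.items():
            ident = identity(qseq, rseq)
            if ident > best_id:
                best_otu, best_id = otu, ident
        if best_id >= min_id:
            counts[best_otu] += 1
        else:
            unassigned.append(qname)
    return counts, unassigned


def mutate(seq, n):
    """Replace the first n letters with N so that they never match."""
    return "N" * n + seq[n:]


# Hypothetical 100-letter reference sequences standing in for a 97% database.
reference = {"OTU_A": "ACGT" * 25, "OTU_B": "TTGCA" * 20}
queries = {
    "q1": mutate(reference["OTU_A"], 2),  # 98% identical to OTU_A
    "q2": mutate(reference["OTU_A"], 5),  # 95% identical to OTU_A
    "q3": reference["OTU_B"],             # identical to OTU_B
}
counts, unassigned = closed_ref_assign(queries, reference)
print(dict(counts), unassigned)
```

In this toy run, q1 and q3 are counted toward OTU_A and OTU_B respectively, while q2 falls below the cutoff and is left unassigned.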

In usearch, a similar algorithm is implemented in the closed_ref command, which uses the USEARCH algorithm to search the database. Its parameters differ from those of the usearch_global command in order to improve sensitivity and to report cases where two or more database sequences are tied for the highest identity. Ties are broken systematically by reporting the first hit in database file order.
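The tie-breaking rule described above can be sketched in Python as follows. This is only an illustration of "first hit in database file order" and not the USEARCH search itself: the exhaustive scan, the position-by-position match count on equal-length sequences, and the names `top_hits` and `assign_with_ties` are assumptions made for the example, and no identity threshold is applied.

```python
def match_count(a, b):
    """Number of matching positions (toy measure, equal-length sequences assumed)."""
    if len(a) != len(b):
        raise ValueError("toy measure needs equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b))


def top_hits(query, database):
    """Return (best_count, names) for all database entries tied for the best score.

    `database` is a list of (name, sequence) pairs in file order, so the
    returned names are also in file order.
    """
    best, tied = -1, []
    for name, seq in database:
        score = match_count(query, seq)
        if score > best:
            best, tied = score, [name]
        elif score == best:
            tied.append(name)
    return best, tied


def assign_with_ties(query, database):
    """Pick the first tied hit in database order and also report the full tie set."""
    best, tied = top_hits(query, database)
    return tied[0], tied, best / len(query)


# Hypothetical database entries that differ only at the last position.
database = [
    ("OTU_1", "ACGTACGTAC"),
    ("OTU_2", "ACGTACGTAA"),
    ("OTU_3", "ACGTACGTAG"),
]

chosen, tied, ident = assign_with_ties("ACGTACGTAT", database)
print(chosen, tied, ident)

unique_choice, unique_tied, unique_ident = assign_with_ties("ACGTACGTAA", database)
print(unique_choice, unique_tied, unique_ident)
```

The first query matches all three entries equally well, so the tie set contains all three names and OTU_1 is reported because it comes first in the list. Reordering the database would change which OTU is reported, which is why the tie set is returned alongside the choice.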

Reference (please cite)
R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
  • QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs

  • Closed-reference OTU assignment splits strains and species even when there are no sequence errors

  • Closed-reference fails to assign different hyper-variable regions to the same OTU

  • Closed-reference discards many well-known species that are present in Greengenes