uchime_ref command

Chimera detection using the UCHIME algorithm. See UCHIME score for parameters.

A database file of nucleotide sequences must be specified using the ‑db option. The database may be in FASTA or UDB format. UDB format is faster to load. The reference database should include sequences that might appear as parents in the query set. These should be high-quality sequences that are believed to be free of chimeras. Errors in reference sequences will degrade detection accuracy and increase the number of false positives. Chimeras will not be detected if their parents (or sufficiently close relatives) are not present in the database.
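As a minimal sketch, a FASTA reference can be converted to UDB once and reused across runs. This assumes that the makeudb_usearch command with its ‑output option is the appropriate way to build a UDB file for uchime_ref in your USEARCH version; neither is described on this page, so check the UDB documentation before relying on it. The wrapper name and file names are placeholders.

```shell
# Hypothetical wrapper: build a UDB file from a FASTA reference.
# Assumption: makeudb_usearch produces a UDB type that uchime_ref accepts.
build_ref_udb() {
  # $1 = reference FASTA, $2 = UDB file to write
  usearch -makeudb_usearch "$1" -output "$2"
}

# Example call (placeholder file names):
# build_ref_udb 16s_ref.fasta 16s_ref.udb
```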

See UCHIME downloads for some suggested 16S reference databases.

The ‑strand plus option is required. If query sequences may be oriented on the opposite strand compared to the database, then the database should contain both forward and reverse-complemented sequences.
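One way to prepare a database that covers both strands is to append a reverse-complemented copy of every reference sequence. The sketch below does this with standard awk rather than with USEARCH itself; the function name revcomp_fasta, the _RC label suffix and the file names are illustrative assumptions, not part of USEARCH.

```shell
# Hypothetical helper: print the reverse complement of every record in a
# FASTA file. Multi-line records are joined, and _RC is appended to each
# label line so the new labels stay distinct from the originals.
# Characters other than A, C, G, T (for example N) are kept unchanged.
revcomp_fasta() {
  awk '
    function revcomp(s,    i, c, p, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        p = index("ACGTacgt", c)
        out = out (p ? substr("TGCAtgca", p, 1) : c)
      }
      return out
    }
    function flush_record() {
      if (label != "") { print label "_RC"; print revcomp(seq) }
    }
    /^>/ { flush_record(); label = $0; seq = ""; next }
         { seq = seq $0 }
    END  { flush_record() }
  ' "$1"
}

# Example (placeholder file names): write forward and reverse-complemented
# sequences to a single database file.
# { cat 16s_ref.fasta; revcomp_fasta 16s_ref.fasta; } > 16s_ref_both.fasta
```

The combined file can then be passed to ‑db in place of the original database.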

Multithreading is supported.

Output options are ‑uchimeout, ‑uchimealns, ‑chimeras and ‑nonchimeras. The ‑uchimeout5 option sets the uchimeout format to be compatible with previous versions.
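The four output options can be combined in a single run. The following sketch wraps such a call in a shell function; the function name and the output file naming scheme are assumptions made for illustration, while the options themselves are the ones listed above.

```shell
# Hypothetical wrapper: run uchime_ref and write all four output files.
run_uchime_ref() {
  # $1 = query FASTA, $2 = reference database (FASTA or UDB),
  # $3 = prefix for output file names (illustrative naming scheme)
  usearch -uchime_ref "$1" -db "$2" -strand plus \
    -uchimeout "${3}.uchime" -uchimealns "${3}.alns" \
    -chimeras "${3}.chimeras.fasta" -nonchimeras "${3}.nonchimeras.fasta"
}

# Example call (placeholder file names):
# run_uchime_ref reads.fasta 16s_ref.udb results
```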

The ‑self and ‑selfid options specify that a reference sequence matching the query sequence should be ignored. This is useful for estimating the false-positive rate using a database of sequences known to be free of chimeras. With ‑self, matching is done by sequence label; with ‑selfid, matching is done from the alignment, and a reference with 100% identity to the query is ignored.
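A false-positive estimate along these lines might be sketched as follows. It assumes the chimera-free database is also used as the query, so that every sequence reported as chimeric counts as a false positive. The helper functions count_records and fp_rate and the file names are hypothetical; only the usearch options shown in the comment come from this page.

```shell
# Step 1 (requires usearch; placeholder file names): query the database
# against itself, ignoring the reference with the same label as the query.
# usearch -uchime_ref ref.fasta -db ref.fasta -self -strand plus \
#     -chimeras flagged.fasta -nonchimeras clean.fasta

# Hypothetical helper: count FASTA records (lines starting with ">").
count_records() {
  awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: fraction of database sequences flagged as chimeric.
# $1 = FASTA of flagged sequences, $2 = FASTA that was used as the query
fp_rate() {
  awk -v f="$(count_records "$1")" -v t="$(count_records "$2")" \
    'BEGIN { if (t == 0) print "NA"; else printf "%.4f\n", f / t }'
}

# Step 2: report the estimated false-positive rate.
# fp_rate flagged.fasta ref.fasta
```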

Example

usearch -uchime_ref reads.fasta -db 16s_ref.udb -uchimeout results.uchime -strand plus