USEARCH manual

unoise3 command

Uses the UNOISE algorithm to perform denoising (error-correction) of amplicon reads.

Errors are corrected as follows:
- Reads with sequencing errors are identified and corrected.
- Chimeras are removed.

Input is a set of quality-filtered unique read sequences with size=nnn; abundance annotations. See OTU / denoising pipeline for details of how reads should be pre-processed and how other types of errors and artifacts can be removed.

The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.

Predicted correct biological sequences are written to the -zotus file in FASTA format. Labels are formatted as Otunnn;Uniqlabel; where nnn is 1, 2, 3... and Uniqlabel is the label from the input file (truncated at the first semi-colon, to strip any annotations).
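The label convention above can be illustrated with a minimal Python sketch. This is not part of USEARCH; the function names strip_annotations and zotu_label are hypothetical and exist only to show how a label of the form Otunnn;Uniqlabel; relates to the input label.

```python
def strip_annotations(label: str) -> str:
    """Return the label truncated at the first semi-colon (annotations removed)."""
    return label.split(";", 1)[0]


def zotu_label(n: int, input_label: str) -> str:
    """Format a label as Otunnn;Uniqlabel; following the pattern described above.

    Hypothetical helper for illustration only; USEARCH writes these labels itself.
    """
    return f"Otu{n};{strip_annotations(input_label)};"


# An input label carrying a size= annotation loses it in the output label.
print(zotu_label(1, "Uniq7;size=1234;"))
```

The same truncation applies to any annotation after the first semi-colon, not only size=.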

Predicted correct amplicon sequences are written to the -ampout file in FASTA format. These include chimeras, so this output file is not generally needed in a production pipeline. Labels are formatted as Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s; where nnn is 1, 2, 3..., Uniqlabel is the label in the input file (truncated at the first semi-colon), u is the size= annotation from the input file and s is the total size of reads derived from this amplicon.
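If the -ampout labels need to be processed downstream, they can be split into fields. The following Python sketch assumes labels follow exactly the Ampnnn;uniq=...;uniqsize=...;size=...; pattern described above; parse_amp_label is a hypothetical helper, not a USEARCH function, and the example label is made up.

```python
def parse_amp_label(label: str) -> dict:
    """Split an -ampout style label into its fields.

    Assumes the first field is the Ampnnn identifier and the remaining
    fields are key=value pairs separated by semi-colons.
    """
    fields = [f for f in label.split(";") if f]
    parsed = {"amp": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        parsed[key] = value
    # uniqsize and size are abundances, so convert them to integers.
    parsed["uniqsize"] = int(parsed["uniqsize"])
    parsed["size"] = int(parsed["size"])
    return parsed


# Made-up label for illustration.
info = parse_amp_label("Amp3;uniq=Uniq12;uniqsize=250;size=310;")
print(info["amp"], info["uniq"], info["uniqsize"], info["size"])
```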

The -chimeras option specifies a FASTA file for amplicons which are predicted to be chimeric.

An OTU table can be generated using the otutab command. See OTU / denoising pipeline.

The -minsize option specifies the minimum abundance (size= annotation). Default is 8. Input sequences with lower abundances are discarded. Most of the low-abundance sequences are usually noisy and are mapped to a ZOTU by the otutab command. For higher sensitivity, reducing minsize to 4 is reasonable, especially if samples are denoised individually rather than pooled together (pooling all samples is what I would usually recommend). With a smaller minsize, there tend to be more errors in the predicted low-abundance biological sequences.
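To preview how many input sequences a given -minsize would discard, the abundance threshold can be mimicked outside USEARCH. The Python sketch below reproduces only the discard step (keep sequences whose size= annotation is at least minsize), not the denoising itself; label_size and filter_by_minsize are hypothetical names and the records are made-up examples.

```python
import re

# Match a size=nnn annotation at the start of the label or after a semi-colon,
# so that other keys ending in "size" (e.g. uniqsize=) are not picked up.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def label_size(label: str) -> int:
    """Return the abundance from the size= annotation of a label."""
    match = SIZE_RE.search(label)
    if match is None:
        raise ValueError(f"no size= annotation in label: {label!r}")
    return int(match.group(1))


def filter_by_minsize(records, minsize=8):
    """Keep (label, sequence) pairs whose abundance is at least minsize."""
    return [(label, seq) for label, seq in records if label_size(label) >= minsize]


# Made-up unique sequences for illustration.
records = [
    ("Uniq1;size=1200;", "ACGTACGT"),
    ("Uniq2;size=8;", "ACGTACGA"),
    ("Uniq3;size=5;", "ACGTTCGT"),
    ("Uniq4;size=2;", "ACGAACGT"),
]
print(len(filter_by_minsize(records)))             # default threshold of 8
print(len(filter_by_minsize(records, minsize=4)))  # more sensitive setting
```

Comparing the counts at 8 and 4 gives a rough sense of how many additional low-abundance sequences would enter denoising.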

The -tabbedout option specifies a tabbed text filename which reports the processing done for each sequence, e.g. if it is classified as noisy or chimeric.

The -unoise_alpha option specifies the alpha parameter (see UNOISE2 paper for definition). Default is 2.0.

If you want size=nnn; annotations in the OTU or ZOTU sequence labels, see adding sizes.

Example

usearch -unoise3 uniques.fa -zotus zotus.fa -tabbedout unoise3.txt
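
When the command is run from a script, the argument list can be assembled programmatically. The Python sketch below is one possible wrapper, not an official interface: build_unoise3_command is a hypothetical helper, it uses only the options documented on this page, and it assumes the usearch binary is on the PATH under the name usearch.

```python
import shlex


def build_unoise3_command(uniques, zotus, tabbedout=None, ampout=None,
                          chimeras=None, minsize=None, unoise_alpha=None,
                          usearch="usearch"):
    """Build the argument list for a unoise3 run.

    Only options described on this page are supported; options left as
    None are omitted so that the USEARCH defaults apply.
    """
    cmd = [usearch, "-unoise3", uniques, "-zotus", zotus]
    optional = [
        ("-tabbedout", tabbedout),
        ("-ampout", ampout),
        ("-chimeras", chimeras),
        ("-minsize", minsize),
        ("-unoise_alpha", unoise_alpha),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_unoise3_command("uniques.fa", "zotus.fa",
                            tabbedout="unoise3.txt", minsize=4)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```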