OTU / denoising analysis pipeline
Should I use UPARSE or UNOISE?
The unoise3 command uses the UNOISE algorithm
to perform denoising (error-correction) of amplicon reads.
Errors are corrected as follows:
- Reads with sequencing errors are identified and corrected.
- Chimeras are removed.
Input is a set of
quality-filtered unique read sequences with size=nnn;
abundance annotations. See OTU / denoising
pipeline for details of how reads should be pre-processed and how other
types of errors and artifacts can be removed.
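As a rough illustration of the size=nnn; annotation the input must carry, here is a minimal Python sketch that extracts the abundance from a FASTA label. The function name parse_size and the choice to accept a label without a trailing semicolon are my assumptions for illustration; they are not part of usearch.

```python
import re

# Matches a size=nnn annotation that starts the label or follows a semicolon,
# so that a field such as "uniqsize=5;" is not mistaken for "size=5;".
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def parse_size(label: str) -> int:
    """Return the size= abundance found in a FASTA label (hypothetical helper)."""
    match = SIZE_RE.search(label)
    if match is None:
        raise ValueError(f"no size=nnn; annotation in label: {label!r}")
    return int(match.group(1))
```

For example, parse_size("Uniq1;size=1234;") gives 1234, while a label with no size= annotation raises ValueError.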
The algorithm is designed for Illumina reads;
it does not work as well on 454, Ion
Torrent or PacBio reads.
Predicted correct biological sequences are written to the -zotus
file in FASTA format. Labels are formatted as Otunnn;Uniqlabel; where
nnn is 1, 2, 3... and Uniqlabel is the label from the input
file (truncated at the first semi-colon, to strip any annotations).
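The label rule above can be sketched in Python as follows. This only mirrors the format as it is described in this section; the function names are hypothetical, and the prefix is left as a parameter in case the labels produced by your version of usearch differ.

```python
def uniq_label(input_label: str) -> str:
    """Truncate an input label at the first semicolon to strip annotations."""
    return input_label.split(";", 1)[0]


def zotu_label(index: int, input_label: str, prefix: str = "Otu") -> str:
    """Build a label of the form Otunnn;Uniqlabel; (index counts from 1)."""
    return f"{prefix}{index};{uniq_label(input_label)};"
```

With an input label of "Uniq7;size=50;" and index 1, this sketch yields "Otu1;Uniq7;".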
Predicted correct amplicon sequences are written to the
-ampout file in FASTA format. These include chimeras, so this output file is
not generally needed in a production pipeline. Labels are
formatted as Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s;
where nnn is 1, 2, 3..., Uniqlabel is the label in the
input file, truncated at the first semi-colon, u is the size=
annotation from the input file and s is the total size of reads
derived from this amplicon.
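If you do want to inspect the -ampout labels, the format above can be split into its fields with a few lines of Python. This is a sketch under the assumption that labels follow the stated pattern exactly; AmpLabel and parse_amp_label are names I made up for the example.

```python
from typing import NamedTuple


class AmpLabel(NamedTuple):
    index: int      # nnn in Ampnnn
    uniq: str       # label from the input file, truncated at the first semicolon
    uniqsize: int   # size= annotation of the input sequence
    size: int       # total size of reads derived from this amplicon


def parse_amp_label(label: str) -> AmpLabel:
    """Parse a label of the form Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s;"""
    fields = [field for field in label.split(";") if field]
    name, annotations = fields[0], fields[1:]
    if not name.startswith("Amp"):
        raise ValueError(f"not an amplicon label: {label!r}")
    values = dict(field.split("=", 1) for field in annotations)
    return AmpLabel(
        index=int(name[len("Amp"):]),
        uniq=values["uniq"],
        uniqsize=int(values["uniqsize"]),
        size=int(values["size"]),
    )
```

Comparing uniqsize with size shows how many reads were assigned to the amplicon beyond the copies of the original unique sequence.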
The -chimeras option specifies a FASTA file for amplicons
which are predicted to be chimeric.
An OTU table can be generated using the
otutab command. See
OTU / denoising pipeline.
The -minsize option specifies the minimum abundance
(size= annotation). Default is 8. Input sequences with lower abundances are
discarded. Most of the low-abundance sequences are usually noisy and will be
mapped to a ZOTU by the otutab command. For
higher sensitivity, reducing minsize to 4 is reasonable, especially if
samples are denoised individually rather than pooled
together (pooling all samples is what I would usually recommend). With a smaller
minsize, there tend to be more errors in the predicted low-abundance sequences.
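The effect of the abundance threshold can be sketched in Python as below. This is only an illustration of the discard rule described above (sequences whose size= is below minsize are dropped), not the usearch implementation; filter_by_minsize and the (label, sequence) record shape are assumptions.

```python
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (FASTA label, sequence); an assumed record shape

SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def filter_by_minsize(records: Iterable[Record], minsize: int = 8) -> Iterator[Record]:
    """Yield only records whose size= annotation is at least minsize."""
    for label, sequence in records:
        match = SIZE_RE.search(label)
        if match is None:
            raise ValueError(f"no size=nnn; annotation in label: {label!r}")
        if int(match.group(1)) >= minsize:
            yield label, sequence
```

With the default of 8, a unique sequence labeled size=7; is dropped; lowering minsize to 4 keeps it, at the cost of admitting more noisy low-abundance sequences.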
The -tabbedout option specifies a tabbed text filename
which reports the processing done for each sequence, for example whether it is
classified as noisy or chimeric.
The -unoise_alpha option specifies the alpha parameter
(see UNOISE2 paper for definition). Default is
If you want size=nnn; annotations in the OTU or ZOTU
sequence labels, see adding sizes.
usearch -unoise3 uniques.fa -zotus zotus.fa
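When the command is driven from a script, the options described on this page can be assembled programmatically. The following Python sketch only builds the argument list; it uses just the flags named above, and the function name and parameter names are my own. It assumes a usearch binary is on the PATH if you go on to run the result.

```python
import shlex
from typing import List, Optional


def unoise3_command(
    uniques: str,
    zotus: str,
    minsize: Optional[int] = None,
    tabbedout: Optional[str] = None,
    ampout: Optional[str] = None,
    chimeras: Optional[str] = None,
) -> List[str]:
    """Build an argument list for usearch -unoise3 (hypothetical helper)."""
    cmd = ["usearch", "-unoise3", uniques, "-zotus", zotus]
    if minsize is not None:
        cmd += ["-minsize", str(minsize)]
    if tabbedout is not None:
        cmd += ["-tabbedout", tabbedout]
    if ampout is not None:
        cmd += ["-ampout", ampout]
    if chimeras is not None:
        cmd += ["-chimeras", chimeras]
    return cmd


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run to execute it.
    print(shlex.join(unoise3_command("uniques.fa", "zotus.fa", minsize=4)))
```

Building a list rather than a single string avoids quoting problems with filenames that contain spaces.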