Uses the UNOISE algorithm to perform denoising (error-correction) of amplicon reads.
Errors are corrected as follows:
- Reads with sequencing error are identified and corrected.
- Chimeras are removed.
Input is a set of quality-filtered unique read sequences with size=nnn; abundance annotations. See OTU / denoising pipeline for details of how reads should be pre-processed and how other types of errors and artifacts can be removed.
The input file must be sorted by decreasing abundance, i.e. by decreasing value of the size=nnn annotation. This can be done using the sortbysize command.
The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.
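As a sanity check before denoising, the sort order can be verified with a few lines of Python. This is a minimal sketch, not part of usearch: the helper names read_sizes and is_sorted_by_size and the example labels are hypothetical, and it assumes each label carries a size=nnn annotation delimited by semicolons.

```python
import re

# Matches a size=nnn annotation at the start of a label or after a semicolon.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")

def read_sizes(fasta_lines):
    """Yield (label, size) for every FASTA header line (hypothetical helper)."""
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].strip()
            match = SIZE_RE.search(label)
            if match is None:
                raise ValueError(f"no size= annotation in label: {label}")
            yield label, int(match.group(1))

def is_sorted_by_size(fasta_lines):
    """Return True if sizes never increase from one record to the next."""
    sizes = [size for _, size in read_sizes(fasta_lines)]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))

# Made-up records for illustration only.
example = [
    ">Uniq1;size=120;", "ACGT",
    ">Uniq2;size=45;", "ACGA",
    ">Uniq3;size=45;", "ACGC",
]
print(is_sorted_by_size(example))
```

To check a real file, pass an open file handle instead of the list, e.g. is_sorted_by_size(open("uniques.fa")).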
Predicted correct biological sequences are written to the -zotus file in FASTA format. Labels are formatted as Zotunnn where nnn is 1, 2, 3...
Predicted correct amplicon sequences are written to the -ampout file in FASTA format. These include chimeras, so this output file is not generally needed in a production pipeline. Labels are formatted as Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s; where nnn is 1, 2, 3..., Uniqlabel is the label in the input file, truncated at the first semicolon, u is the size= annotation from the input file and s is the total size of reads derived from this amplicon.
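If the -ampout labels need to be processed downstream, the format described above can be split into its fields. The following is a minimal sketch under the assumption that labels follow the Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s; layout exactly; parse_amp_label is a hypothetical helper and the example label is made up.

```python
def parse_amp_label(label):
    """Split an -ampout label into its fields (hypothetical helper)."""
    # Drop a leading '>' if present, then split on semicolons,
    # ignoring the empty field after the trailing semicolon.
    fields = [f for f in label.lstrip(">").strip().split(";") if f]
    name = fields[0]
    annotations = dict(f.split("=", 1) for f in fields[1:])
    return {
        "amp": name,                               # Ampnnn
        "uniq": annotations["uniq"],               # label from the input file
        "uniqsize": int(annotations["uniqsize"]),  # size= of the input unique
        "size": int(annotations["size"]),          # total reads from this amplicon
    }

# Made-up label for illustration only.
print(parse_amp_label(">Amp3;uniq=Uniq17;uniqsize=52;size=61;"))
```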
The -minsize option specifies the minimum abundance (size= annotation). Default is 8. Input sequences with lower abundances are discarded. Most of the low-abundance sequences are usually noisy and will be mapped to a ZOTU by the otutab command. For higher sensitivity, reducing minsize to 4 is reasonable, especially if samples are denoised individually rather than pooling all samples together, as I would usually recommend. With smaller minsize, there tend to be more errors in the predicted low-abundance biological sequences.
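To get a feel for how much input a given -minsize setting would discard, the abundances can be tallied before running the command. This is only an illustrative sketch: the retained function and the list of sizes are hypothetical, and it models only the abundance cutoff, not the denoising itself.

```python
def retained(sizes, minsize):
    """Count unique sequences and total reads with abundance >= minsize."""
    kept = [s for s in sizes if s >= minsize]
    return len(kept), sum(kept)

# Made-up size= values for ten unique sequences.
sizes = [120, 45, 9, 8, 7, 5, 4, 3, 1, 1]

for minsize in (8, 4):
    n_uniques, n_reads = retained(sizes, minsize)
    print(f"minsize={minsize}: {n_uniques} uniques kept, {n_reads} reads")
```

In this toy example, lowering the cutoff from 8 to 4 keeps three more unique sequences but adds few reads, which is the trade-off between sensitivity and error rate described above.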
The -tabbedout option specifies a tabbed text filename which reports the processing done for each sequence, e.g. if it is classified as noisy or chimeric.
The -unoise_alpha option specifies the alpha parameter (see UNOISE2 paper for definition). Default is 2.0.
usearch -unoise3 uniques.fa -zotus zotus.fa -tabbedout unoise3.txt