The UNOISE pipeline recovers biological sequences from an amplicon sequencing
experiment by performing error correction (denoising) of Illumina reads. UNOISE
is not designed for other sequencing technologies, e.g. 454 pyrosequencing
reads. The UNOISE algorithm is implemented in the unoise command.
See Tutorials for example scripts and data.
Reads in FASTQ format
I strongly recommend starting from "raw" reads, i.e. the reads originally
provided by the sequencing machine base-calling software. You should do quality
filtering with USEARCH rather than using reads that have already been filtered
by third-party software.
Reads in FASTA format
The unoise command supports reads in FASTA format. You may need to do this if your
reads have already been quality filtered by some other method and you don't
have access to the original FASTQ reads.
I recommend combining reads from as many samples as possible. See sample pooling.
Read quality filtering
Quality filtering of the reads should be done using USEARCH because its
maximum expected error filtering method is much more effective at suppressing
reads with high error rates than other filters, e.g. those based on
average Q scores. A maximum expected error threshold of 1.0 is a good default
choice (the -fastq_maxee 1.0 option of fastq_filter, or the
-fastq_merge_maxee 1.0 option of fastq_mergepairs). You can use
fastx_learn to estimate the error rate after filtering.
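To see why an expected-error threshold is stricter than an average-Q threshold, here is a minimal sketch in Python. It assumes Phred+33 quality encoding and the usual conversion from a Q score to an error probability, P = 10^(-Q/10). The function names are hypothetical and this is not USEARCH code.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Q scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality, offset))


def average_q(quality, offset=33):
    """Arithmetic mean of the Q scores in the read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores)


def passes_maxee(quality, maxee=1.0):
    """True if the read has at most `maxee` expected errors."""
    return expected_errors(quality) <= maxee


# Hypothetical 100-base read: 90 bases at Q40 ('I') and 10 bases at Q2 ('#').
read_quality = "I" * 90 + "#" * 10

print(f"average Q       = {average_q(read_quality):.1f}")
print(f"expected errors = {expected_errors(read_quality):.2f}")
print(f"passes maxee 1.0: {passes_maxee(read_quality)}")
```

With these assumed scores the read averages Q36.2, which looks excellent, yet it carries about 6.3 expected errors and would be rejected at a threshold of 1.0. The average is dominated by the many good bases, while the expected error count is dominated by the few bad ones.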
You should trim reads
to a fixed length unless the sequences are contigs generated by a paired
read assembler, in which case it may not be necessary. You should also trim
any primer-binding sequences at the ends of the reads. See
global trimming for discussion.
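The idea behind trimming can be sketched as follows. This is only an illustration under stated assumptions: the primer is removed by stripping a known number of leading bases, reads shorter than the target length are discarded, and `global_trim` is a hypothetical helper, not a USEARCH function.

```python
def global_trim(seq, primer_len, trunc_len):
    """Strip a primer of known length, then truncate to a fixed length.

    Returns None when the read is too short after primer removal
    (an assumption made here so that every surviving read has the same length).
    """
    trimmed = seq[primer_len:]
    if len(trimmed) < trunc_len:
        return None
    return trimmed[:trunc_len]


# Hypothetical reads with a 3-base primer and a target length of 8 bases.
reads = ["AAAACGTACGTAC", "AAAACGTACGTACGG", "AAAACGT"]
kept = [t for t in (global_trim(r, 3, 8) for r in reads) if t is not None]
print(kept)
```

After trimming, the first two reads become the same string, so they can be counted as one unique sequence in the next step.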
Get the set of
unique sequences with abundances using the
fastx_uniques command with the -sizeout option. This will be the input
file for the unoise command.
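As a rough illustration of what dereplication with abundances produces, the sketch below counts identical sequences and writes FASTA labels carrying a size annotation. The `;size=N;` label layout and the `Uniq` prefix are assumptions made for the example, and the function names are hypothetical. In practice this step is done by fastx_uniques.

```python
from collections import Counter


def dereplicate(sequences):
    """Return (sequence, abundance) pairs, most abundant first."""
    counts = Counter(sequences)
    # most_common() sorts by decreasing count; ties keep first-seen order.
    return counts.most_common()


def to_fasta(uniques, prefix="Uniq"):
    """Format unique sequences as FASTA with an assumed ';size=N;' annotation."""
    lines = []
    for i, (seq, size) in enumerate(uniques, start=1):
        lines.append(f">{prefix}{i};size={size};")
        lines.append(seq)
    return "\n".join(lines)


# Hypothetical trimmed, filtered reads.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
uniques = dereplicate(reads)
print(to_fasta(uniques))
```

The abundance attached to each unique sequence is what a denoiser needs in order to tell a frequent true sequence from a rare error-containing variant of it.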
Creating an OTU table
Denoised sequences are valid OTUs (the clustering identity is 100%, if you
like) and can be used to generate an OTU table using the usearch_global command
in just the same way as 97% OTUs. Reads must have
sample identifiers for this to work. The simplest way to add them is usually
to use the -relabel @ option of fastq_filter or fastq_mergepairs.
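The sketch below illustrates one way sample identifiers can be carried in read labels. It assumes, for illustration only, that the sample name is the FASTQ file name truncated at the first underscore or period, and that reads are labeled sample.1, sample.2 and so on. The helper names are hypothetical, and the exact naming rules should be checked against the USEARCH documentation.

```python
import re


def sample_from_filename(filename):
    """Assumed rule: base file name truncated at the first '_' or '.'."""
    base = filename.rsplit("/", 1)[-1]
    return re.split(r"[_.]", base, maxsplit=1)[0]


def relabel(sample, n_reads):
    """Generate labels of the assumed form 'sample.1', 'sample.2', ..."""
    return [f"{sample}.{i}" for i in range(1, n_reads + 1)]


def sample_from_label(label):
    """Recover the sample identifier from a read label."""
    return label.split(".", 1)[0]


sample = sample_from_filename("data/Mock1_S1_L001_R1_001.fastq")
labels = relabel(sample, 3)
print(labels)
print(sample_from_label(labels[1]))
```

Whatever the exact convention, the point is that each read label must let the sample be recovered later, when reads are assigned to OTUs.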
Example for typical Illumina reads with one
pair of FASTQ files (R1 and R2) per sample:
usearch -fastq_mergepairs *_R1*.fastq -relabel @ -fastqout reads.fq
usearch -fastq_filter reads.fq -fastq_maxee 1.0 -fastaout filtered.fa
usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout
usearch -unoise uniques.fa -tabbedout out.txt -fastaout denoised.fa
usearch -usearch_global reads.fq -db denoised.fa -strand plus -id 0.97 -otutabout otu_table.txt
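Conceptually, the final step counts how many reads from each sample were assigned to each denoised sequence. The following is a minimal sketch of that tabulation, assuming read labels of the form sample.N and a tab-separated layout with an "#OTU ID" header column. Both are assumptions for the example, and `build_otu_table` is a hypothetical helper rather than what usearch_global does internally.

```python
from collections import defaultdict


def build_otu_table(hits):
    """Tabulate (read_label, otu_id) pairs into a tab-separated count table."""
    counts = defaultdict(lambda: defaultdict(int))
    samples = set()
    for label, otu in hits:
        sample = label.split(".", 1)[0]  # assumed 'sample.N' label form
        samples.add(sample)
        counts[otu][sample] += 1
    columns = sorted(samples)
    rows = ["\t".join(["#OTU ID"] + columns)]
    for otu in sorted(counts):
        # Samples with no reads for this OTU are reported as 0.
        rows.append("\t".join([otu] + [str(counts[otu][s]) for s in columns]))
    return "\n".join(rows)


# Hypothetical read-to-OTU assignments.
hits = [("A.1", "Otu1"), ("A.2", "Otu1"), ("B.1", "Otu1"), ("B.2", "Otu2")]
table = build_otu_table(hits)
print(table)
```

Reads that match no denoised sequence simply never appear in `hits`, so they do not contribute to any count.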