

FASTA reads (no qual)

See also
 
UPARSE home page
  UPARSE pipeline home page

This page gives an example UPARSE pipeline for reads in FASTA format. You may need to do this if you have legacy reads that have already undergone some processing and the original reads with quality scores (FASTQ or SEQ+QUAL) are no longer available. It is better to start from the unprocessed original reads if you can find them (because USEARCH quality filtering is better than previous methods).

You need to add sample names to the read labels. For this example, I'll assume that you've made a FASTA file called reads.fa with non-biological sequences removed (e.g. barcodes) and with sample names added. I can't tell you exactly how to do this because there are too many variations in read structure out there. Probably you'll need to write a script or two.
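As a starting point for such a script, here is a minimal Python sketch that appends a sample annotation to every label in a FASTA file. The function name add_sample_to_labels and the annotation format ";sample=NAME;" are assumptions made for illustration. Check which sample-naming convention your version of USEARCH expects, and adapt the script to your own read structure (for example, one input file per sample).

```python
import io

def add_sample_to_labels(fasta_in, fasta_out, sample):
    """Copy FASTA records, appending ';sample=<sample>;' to each label line.

    The annotation format is an assumption for this sketch; adjust it to
    whatever convention your USEARCH version recognizes.
    """
    for line in fasta_in:
        if line.startswith(">"):
            # Drop the '>' and any trailing ';' so we don't emit ';;'.
            label = line[1:].strip().rstrip(";")
            fasta_out.write(f">{label};sample={sample};\n")
        else:
            # Sequence lines are copied unchanged.
            fasta_out.write(line)

# Small in-memory demonstration; with real data, pass open file handles.
demo_in = io.StringIO(">read1\nACGT\n>read2\nGGCC\n")
demo_out = io.StringIO()
add_sample_to_labels(demo_in, demo_out, "SampleA")
print(demo_out.getvalue(), end="")
```

With real files you would call the function once per sample and concatenate the results into reads.fa.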

I've used a trim length of 150 in this example (-trunclen option of fastx_truncate), which is a pretty arbitrary choice. You should choose the trim length as a compromise: a longer value discards more reads, because reads shorter than the trim length are dropped, while a shorter value keeps more reads but fewer bases per read, which reduces phylogenetic resolution. If you already have fixed-length reads (e.g., old Illumina data) then you may not need the trimming step.
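To make the trade-off concrete, the following Python sketch tabulates how many reads and bases would survive at a few candidate trim lengths. It is only an illustration, assuming that reads shorter than the trim length are discarded and longer reads are cut to exactly that length. The helper names read_lengths and trim_tradeoff are hypothetical and are not part of USEARCH.

```python
def read_lengths(fasta_lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first label
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths

def trim_tradeoff(lengths, candidates):
    """For each candidate, return (trim length, reads kept, fraction kept, bases kept)."""
    total = len(lengths)
    rows = []
    for t in candidates:
        kept = sum(1 for n in lengths if n >= t)  # shorter reads are discarded
        frac = kept / total if total else 0.0
        rows.append((t, kept, frac, kept * t))    # each kept read contributes t bases
    return rows

# Tiny demonstration; with real data, pass an open file such as open("reads.fa").
demo = [">a", "ACGTACGTAC", ">b", "ACGTAC", ">c", "ACGTACGTACGT"]
for t, kept, frac, bases in trim_tradeoff(read_lengths(demo), [6, 10, 12]):
    print(f"trunclen={t}: {kept} reads kept ({frac:.0%}), {bases} bases")
```

A reasonable choice is often the longest trim length that still keeps most of your reads.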

Commands
usearch -fastx_truncate reads.fa -trunclen 150 -fastaout trimmed.fa

usearch -derep_fulllength trimmed.fa -sizeout -fastaout uniques.fa

usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu

usearch -usearch_global trimmed.fa -db otus.fa -strand plus -id 0.97 \
  -otutabout otutab.txt -biomout otutab.json
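
Once the pipeline has finished, a quick sanity check is to total the reads assigned to each sample. The Python sketch below assumes that otutab.txt is tab-separated, with a header row whose first column holds the OTU identifier and one column per sample after it. That layout is an assumption here, so check it against your own file before relying on it. The function name reads_per_sample is hypothetical.

```python
import csv
import io

def reads_per_sample(otutab_text):
    """Sum the counts in each sample column of a tab-separated OTU table.

    Assumes the first row is a header and the first column is the OTU name.
    """
    rows = list(csv.reader(io.StringIO(otutab_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]
    totals = {name: 0 for name in samples}
    for row in body:
        if not row:
            continue  # skip blank lines
        for name, count in zip(samples, row[1:]):
            totals[name] += int(count)
    return totals

# Made-up table for demonstration; with real data, read the text from otutab.txt.
demo = "#OTU ID\tSampleA\tSampleB\nOtu1\t10\t3\nOtu2\t0\t7\n"
print(reads_per_sample(demo))
```

If a sample has very few reads, that usually points to a problem with the sample names in the read labels.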