sortbysize command

See also
Abundance sort
UCLUST sort order
Global trimming

Sort sequences by decreasing size annotation, which usually gives the size of a cluster. The size is specified by a field size=N; in the sequence label, where N is an integer. For example, the label Uniq1;size=42; has size 42. The -minsize and -maxsize options can be used to specify a minimum and maximum size, respectively. Sequences whose size falls outside this range are not written to the output.
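The sketch below shows the sorting and filtering logic described above. It is an illustration, not usearch's implementation. The helper names `get_size` and `sort_by_size` are hypothetical. It makes three assumptions: both size limits are inclusive, a label with no size= field counts as size 1, and sequences with equal size keep their input order.

```python
import re
from typing import List, Optional, Tuple

# A record is (label, sequence), e.g. ("Uniq1;size=42;", "ACGT").
Record = Tuple[str, str]

# Matches a size=N; field at the start of the label or after a ';'.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+);?")

def get_size(label: str) -> int:
    """Return N from size=N; (assumption: 1 if the field is absent)."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1

def sort_by_size(records: List[Record],
                 minsize: Optional[int] = None,
                 maxsize: Optional[int] = None) -> List[Record]:
    """Keep records with minsize <= size <= maxsize, largest size first."""
    kept = [r for r in records
            if (minsize is None or get_size(r[0]) >= minsize)
            and (maxsize is None or get_size(r[0]) <= maxsize)]
    # sorted() is stable even with reverse=True, so ties keep input order
    # (assumed tie-breaking; usearch may order ties differently).
    return sorted(kept, key=lambda r: get_size(r[0]), reverse=True)

# Usage: drop sequences with size < 2, as -minsize 2 would.
reads = [("Uniq1;size=3;", "ACGT"),
         ("Uniq2;size=10;", "ACGA"),
         ("Uniq3;size=1;", "ACGG")]
print([label for label, _ in sort_by_size(reads, minsize=2)])
```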

Output is written to a FASTA file using the -fastaout option and/or a FASTQ file using the -fastqout option.

For most applications, it is recommended that sequences be globally trimmed before clustering and abundance sorting.

The -relabel prefix option can be used to generate sequential labels for the sorted sequences. The output label is prefixN, where N = 1, 2, 3 etc. For example, -relabel Otu gives the labels Otu1, Otu2, Otu3 and so on. If -sizeout is used, a size annotation is appended to the sequential label.
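The sketch below shows how sequential relabeling could be applied to records that are already sorted. The function name `relabel` is hypothetical. The exact form of the appended annotation (`;size=N;`) is an assumption about how -sizeout formats the label. As in the sorting sketch, a label with no size= field is assumed to have size 1.

```python
import re
from typing import List, Tuple

Record = Tuple[str, str]  # (label, sequence)
SIZE_RE = re.compile(r"(?:^|;)size=(\d+);?")

def relabel(records: List[Record], prefix: str,
            sizeout: bool = False) -> List[Record]:
    """Rename already-sorted records to prefix1, prefix2, ...

    If sizeout is True, carry the original size over to the new label
    (annotation format ';size=N;' is assumed).
    """
    out = []
    for i, (label, seq) in enumerate(records, start=1):
        new_label = f"{prefix}{i}"
        if sizeout:
            m = SIZE_RE.search(label)
            size = int(m.group(1)) if m else 1  # assumption: missing => 1
            new_label += f";size={size};"
        out.append((new_label, seq))
    return out

# Usage: input already in decreasing size order.
sorted_reads = [("Uniq2;size=10;", "ACGA"), ("Uniq1;size=3;", "ACGT")]
print(relabel(sorted_reads, "Otu", sizeout=True))
```

The size is copied from the old label before it is discarded. Without that step the cluster size would be lost, because the new label is built only from the prefix and the position.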

The -topn N option specifies that no more than N sequences should be output.
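The sketch below illustrates the effect of -topn as a truncation of the size-sorted output. It uses `heapq.nlargest`, which the Python documentation defines as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. This avoids fully sorting a large input when only a few sequences are wanted. The name `top_n_by_size` is hypothetical. This is not how usearch implements the option, and the order in which -topn interacts with -minsize and -maxsize is not covered here.

```python
import heapq
import re
from typing import List, Tuple

Record = Tuple[str, str]  # (label, sequence)
SIZE_RE = re.compile(r"(?:^|;)size=(\d+);?")

def size_of(label: str) -> int:
    """Return N from size=N; (assumption: 1 if the field is absent)."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1

def top_n_by_size(records: List[Record], n: int) -> List[Record]:
    """Return no more than n records, largest size first."""
    return heapq.nlargest(n, records, key=lambda r: size_of(r[0]))

# Usage: keep only the two largest clusters.
recs = [("a;size=2;", "A"), ("b;size=9;", "C"), ("c;size=5;", "G")]
print(top_n_by_size(recs, 2))
```

If N is larger than the number of input sequences, every sequence is returned.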

Example

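usearch -sortbysize seqs.fasta -fastaout seqs_sorted.fasta -minsize 4

This writes the sequences in seqs.fasta with size 4 or more to seqs_sorted.fasta, in order of decreasing size.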