
propagating cluster sizes


In some applications, sequences are clustered in two or more passes by different USEARCH commands and/or by other programs. Sometimes, the size of a cluster is required in terms of the number of sequences that were provided to the first stage of the pipeline. For example, 16S reads might be dereplicated and then clustered into OTUs by cluster_otus.

To handle multi-step clustering, USEARCH provides a mechanism to propagate cluster size annotations. If the -sizein option is specified, input sequences are required to have a size annotation. If the -sizeout option is specified, size annotations are added to the output labels. If both -sizein and -sizeout are given, then the output size for a cluster is computed from the input sizes of its members, rather than from the number of member sequences alone.
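The arithmetic behind this can be illustrated with a small Python sketch. It is not USEARCH code. It assumes that size annotations appear in sequence labels as a semicolon-delimited size=N field (for example, uniq1;size=120;), that the output size with -sizein is the sum of the member sizes, and that without -sizein each member counts as one sequence. The function names get_size and cluster_size are hypothetical.

```python
import re

# Assumed convention: a "size=N" field delimited by semicolons in the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def get_size(label):
    """Return the size stored in a label; fail if the annotation is missing,
    mirroring the requirement that -sizein input must be annotated."""
    match = SIZE_RE.search(label)
    if match is None:
        raise ValueError(f"no size annotation in label: {label!r}")
    return int(match.group(1))


def cluster_size(member_labels, sizein):
    """Size reported for one cluster (assumed behavior, see lead-in).

    With sizein, sum the annotated sizes of the members.
    Without it, count each member sequence once.
    """
    if sizein:
        return sum(get_size(label) for label in member_labels)
    return len(member_labels)


# Three dereplicated sequences that end up in the same cluster.
members = ["uniq1;size=120;", "uniq2;size=30;", "uniq3;size=1;"]

print(cluster_size(members, sizein=True))   # 151 original reads
print(cluster_size(members, sizein=False))  # 3 member sequences
```

With the input sizes taken into account, the cluster represents 151 of the sequences given to the first stage, not 3.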

Typical use is:

1. First clustering or dereplication step in the pipeline uses -sizeout.

2. Subsequent clustering steps use both -sizein and -sizeout.

If another program is used before the first USEARCH step, then it is up to you to write scripts to produce size annotations for USEARCH.
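A script of this kind might look like the following Python sketch. It assumes that the other program reports, for each output sequence, how many original reads it represents, and that USEARCH expects the size as a ;size=N; field appended to the FASTA label. The function add_size_annotations and the counts mapping are hypothetical; adapt the parsing to whatever the upstream program actually writes.

```python
def add_size_annotations(fasta_lines, counts):
    """Yield FASTA lines with ';size=N;' appended to each label.

    counts maps a sequence label (the header line without '>') to the
    number of original reads it represents (hypothetical input).
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            label = line[1:]
            n = counts[label]  # KeyError if the upstream count is missing
            # Strip any trailing ';' so the field is not doubled up.
            yield f">{label.rstrip(';')};size={n};"
        else:
            # Sequence lines pass through unchanged.
            yield line


# Toy input standing in for the output of another program.
fasta = [">seqA", "ACGTACGT", ">seqB", "TTGACCA"]
counts = {"seqA": 42, "seqB": 7}

annotated = list(add_size_annotations(fasta, counts))
print("\n".join(annotated))
```

The annotated file can then be passed to the first USEARCH step with -sizein so that later sizes count the original reads.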
