Long sequences

The ublast command fully supports long sequences, e.g. chromosomes or contigs, as both query and database sequences. It may be useful to choose different index options for long sequences, especially if the goal is to find high-identity nucleotide matches. If you are not sure which options would work best in your case, you are welcome to email me to discuss.

Commands based on the USEARCH algorithm, including usearch_global, usearch_local, cluster_fast and cluster_smallmem, do not have built-in limits on sequence length. However, the word-counting heuristics and global alignments (if applicable) used by these commands tend to be ineffective or computationally expensive when the sequence length exceeds that of a typical gene, say somewhere around 50,000 letters. If you have a situation where you feel that one of these commands would be useful with longer sequences, please email me to discuss.
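
One way to act on the rough 50,000-letter guideline is to separate long sequences from gene-sized ones before choosing a command. The sketch below is a minimal illustration in Python, not part of USEARCH. The names read_fasta, split_by_length and LONG_THRESHOLD are hypothetical, and the threshold is only the approximate figure mentioned above, not a limit enforced by any command.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: the approximate figure quoted in the text, not a hard limit.
LONG_THRESHOLD = 50_000

Record = Tuple[str, str]  # (label, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (label, sequence) pairs from FASTA-formatted lines."""
    label = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, "".join(chunks)
            label = line[1:]
            chunks = []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def split_by_length(
    records: Iterable[Record], threshold: int = LONG_THRESHOLD
) -> Tuple[List[Record], List[Record]]:
    """Partition records into (short, long), where long means len > threshold."""
    short: List[Record] = []
    long_: List[Record] = []
    for label, seq in records:
        if len(seq) > threshold:
            long_.append((label, seq))
        else:
            short.append((label, seq))
    return short, long_


if __name__ == "__main__":
    demo = [">gene1", "ACGT" * 10, ">contig1", "A" * 60_000]
    short, long_ = split_by_length(read_fasta(demo))
    print([label for label, _ in short])   # ['gene1']
    print([label for label, _ in long_])   # ['contig1']
```

In practice the lines would come from an open FASTA file rather than a list. Sequences in the long group are candidates for ublast, while the short group stays within the range where the USEARCH-based commands are expected to perform well.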