The ublast command fully supports long sequences,
e.g. chromosomes or contigs, as both query and database sequences. It may be
useful to choose different index options for
long sequences, especially if the goal is to find high-identity nucleotide
matches. If you're not sure what options would work best in your case then you
are welcome to email me to discuss.

Commands based on the USEARCH
algorithm, including usearch_global and
cluster_smallmem, do not have built-in
limits on sequence length. However, the word-counting heuristics and global
alignments (if applicable) made by these commands tend to be ineffective or
computationally expensive when the sequence length exceeds that of a typical
gene, say somewhere around 50,000 letters. If you have a situation where you
feel that one of these commands would be useful with longer sequences, please
email me to discuss.
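
Before running one of these commands, it can help to check whether the input contains sequences beyond that rough length. The following is a minimal Python sketch, not part of USEARCH. It assumes plain FASTA input, and the names LONG_SEQ_THRESHOLD, read_fasta_lengths and split_by_length are hypothetical helpers introduced only for illustration. The 50,000 cutoff is the approximate figure mentioned above, not a hard limit enforced by any command.

```python
import io

# Rough guideline from the text above; an assumption, not a limit built into USEARCH.
LONG_SEQ_THRESHOLD = 50_000


def read_fasta_lengths(handle):
    """Yield (label, length) for each record in a FASTA text stream."""
    label = None
    length = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, length
            label = line[1:]
            length = 0
        elif label is not None:
            length += len(line)
    if label is not None:
        yield label, length


def split_by_length(handle, threshold=LONG_SEQ_THRESHOLD):
    """Return (short, long) lists of (label, length), split at the threshold."""
    short, long_ = [], []
    for label, length in read_fasta_lengths(handle):
        if length > threshold:
            long_.append((label, length))
        else:
            short.append((label, length))
    return short, long_


if __name__ == "__main__":
    # Tiny in-memory example; in practice, pass an open FASTA file instead.
    example = io.StringIO(">gene1\nACGT\nACGT\n>contig1\n" + "A" * 60000 + "\n")
    short, long_ = split_by_length(example)
    print("within guideline:", short)
    print("longer than guideline:", long_)
```

Sequences reported as longer than the guideline are the ones where word counting and global alignment are likely to be slow or ineffective, so they may be better handled with ublast.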