orient command

USEARCH v11

orient command

Orient nucleotide sequences in a FASTA or FASTQ file to the same strand as a database.

The -fastaout option is a FASTA filename for the oriented sequences.

The -fastqout option is a FASTQ filename for the oriented sequences (requires FASTQ input).

The -notmatched option is an output file for sequences with undetermined orientation. It will be FASTA or FASTQ, depending on the input file format.

The -tabbedout option is a tab-separated text file giving the orientation of each input sequence. Fields are: query_label, strand, plus_count, minus_count. Strand is + or -.
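As an illustration of how the -tabbedout file might be read downstream, here is a minimal Python sketch. It assumes one tab-separated record per line, with the four fields in the order listed above and integer counts. The function names parse_orient_lines and count_strands are hypothetical and are not part of USEARCH.

```python
from collections import Counter


def parse_orient_lines(lines):
    """Parse -tabbedout records into (label, strand, plus_count, minus_count).

    Assumes four tab-separated fields per line with integer counts.
    Lines with fewer than four fields (for example blank lines) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue
        label, strand, plus_count, minus_count = fields[:4]
        rows.append((label, strand, int(plus_count), int(minus_count)))
    return rows


def count_strands(rows):
    """Count how many sequences were assigned to each strand value."""
    return Counter(strand for _, strand, _, _ in rows)
```

For a file on disk, an open file handle can be passed directly, e.g. parse_orient_lines(open("orient.txt")), where orient.txt is a placeholder name for whatever was given to -tabbedout.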

For each input sequence, the orient command attempts to determine whether it is on the same strand as the database sequences (which are assumed to all be on the same strand) or on the opposite strand. In the latter case, the sequence is reverse-complemented so that the output sequences are all on the same strand.

The command uses a simple word-counting strategy: it picks the strand that gives more word (k-mer) matches to the database. If too few words match, or the counts on the two strands are too close, the orientation is considered undetermined and the sequence is discarded. The -tabbedout file reports the results for each sequence.
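The word-counting idea can be sketched in a few lines of Python. This is an illustration only, not USEARCH's implementation: the word length K, the thresholds MIN_WORDS and MIN_RATIO, the "?" label for undetermined sequences and all function names are assumptions made for this example.

```python
# Illustrative sketch of strand orientation by word counting.
# K, MIN_WORDS and MIN_RATIO are assumed values, not USEARCH defaults.
K = 8            # assumed word (k-mer) length
MIN_WORDS = 4    # assumed minimum number of matches on the winning strand
MIN_RATIO = 2.0  # assumed factor by which the winner must exceed the loser

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def words(seq, k=K):
    """Return the set of distinct k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(db_seqs, k=K):
    """Collect every k-mer found in the database sequences."""
    index = set()
    for seq in db_seqs:
        index |= words(seq, k)
    return index


def orient(query, index, k=K):
    """Return (strand, plus_count, minus_count) for one query.

    strand is "+" or "-", or "?" (a label assumed here) when too few
    words match or the two counts are too close to call.
    """
    plus = len(words(query, k) & index)
    minus = len(words(reverse_complement(query), k) & index)
    if plus >= MIN_WORDS and plus >= MIN_RATIO * minus:
        return "+", plus, minus
    if minus >= MIN_WORDS and minus >= MIN_RATIO * plus:
        return "-", plus, minus
    return "?", plus, minus
```

A query reported as "-" would then be passed through reverse_complement before being written out, so that all output sequences share the database strand.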

You can use any 16S database you like; the choice is not critical. The safest option is a large database such as Greengenes or SILVA, which gives good coverage, but word counting works well down to fairly low identities, so a smaller database such as the RDP training set or the NCBI BLAST 16S reference should be fine.

Multithreading is supported.

Example

usearch -orient reads.fastq -db 16s.udb -fastqout reads_plus.fq