usearch manual


Sample identifiers in read labels

Making an OTU table
An OTU table is made by running the usearch_global command with an appropriate output file option, e.g. otutabout. See Mapping reads to OTUs for details.

Read sequences must have sample identifiers
When you run usearch_global to make the OTU table, the FASTA file or FASTQ file containing the reads must have sample identifiers in the labels.

Sample identifier syntax
The sample name can be specified by putting sample=xxx; into the label. If sample= is not found, the sample identifier is assumed to start at the beginning of the label and continue to the first character that is not alphanumeric or an underscore. Put another way, any character which is not a letter, number or underscore marks the end of the sample identifier. (For backwards compatibility, you can also use barcodelabel=xxx.) The following labels all have the sample identifier S01. FASTA labels start with > at the beginning of the line; FASTQ labels start with @.

>S01.123
>S01.123;size=14;
@M00967:43:000000000-A3JHG:1:1101:18327:1699;sample=S01;

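In the first and second examples, the period (.) is the first character that is not a letter, number or underscore, so .123 is not part of the sample identifier. In the third example, the identifier is taken from the sample=S01; annotation.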
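The rules above can be sketched in a few lines of Python, which may help when checking labels before building an OTU table. This is an illustration of the documented behaviour, not usearch's own code. The function name sample_from_label is hypothetical. Two details are assumptions: the annotation must sit at the start of the label or directly after a semicolon, and the first sample= or barcodelabel= annotation found wins.

```python
import re

# Assumption: the annotation begins the label or follows a ';'.
# usearch itself may locate "sample=" differently.
_ANNOTATION = re.compile(r"(?:^|;)(?:sample|barcodelabel)=([^;]*)")

# Leading run of letters, digits and underscores.
_PREFIX = re.compile(r"[A-Za-z0-9_]*")


def sample_from_label(label: str) -> str:
    """Return the sample identifier implied by a FASTA/FASTQ label."""
    # Drop the FASTA '>' or FASTQ '@' marker if it is present.
    if label[:1] in (">", "@"):
        label = label[1:]

    # An explicit annotation takes priority over the label prefix.
    match = _ANNOTATION.search(label)
    if match:
        return match.group(1)

    # Otherwise stop at the first character that is not a letter,
    # digit or underscore.
    return _PREFIX.match(label).group(0)


if __name__ == "__main__":
    for label in (
        ">S01.123",
        ">S01.123;size=14;",
        "@M00967:43:000000000-A3JHG:1:1101:18327:1699;sample=S01;",
    ):
        print(sample_from_label(label))
```

Run on the three example labels above, the sketch reports S01 for each.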

How to get sample names into your labels
The simplest method is to use the -relabel option of fastq_mergepairs, fastq_filter, derep_fulllength or derep_prefix. If you process one file at a time, you can do something like this:

usearch -derep_fulllength reads.fastq -relabel SampleName. -fastaout derep.fa

Note the period following SampleName -- you must have a character which is not a letter, number or underscore to separate the sample name from the read number.

If -relabel @ is specified, the sample name is constructed from the FASTQ filename by truncating at the first underscore or period. With typical Illumina FASTQ filenames, this is the sample name.
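To preview which sample name a given filename would produce, the truncation rule can be approximated as below. This is a sketch under two assumptions: the directory part of the path is ignored, and the example filenames are invented for illustration. The function name sample_from_filename is hypothetical.

```python
import os
import re


def sample_from_filename(path: str) -> str:
    """Truncate a FASTQ filename at its first underscore or period."""
    # Assumption: only the filename counts, not the directory.
    name = os.path.basename(path)
    # Split once on the first '_' or '.', keeping the leading part.
    return re.split(r"[_.]", name, maxsplit=1)[0]


if __name__ == "__main__":
    # Hypothetical filenames in the usual Illumina naming style.
    for path in ("S01_L001_R1_001.fastq", "runs/Mock_S188_L001_R1_001.fastq"):
        print(path, "->", sample_from_filename(path))
```

A sample name that itself contains an underscore would be cut short by this rule, in which case an explicit -relabel SampleName. is the safer choice.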

Alternatively, you could write your own script to do this task.
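A minimal sketch of such a script is shown here. It assumes strict four-line FASTQ records with no blank lines, and it writes labels of the form @SampleName.N, where the period separates the sample name from the read number. The function name relabel_fastq and the file names in the usage lines are hypothetical.

```python
import re
from typing import Iterable, Iterator


def relabel_fastq(lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield FASTQ lines with each header replaced by @<sample>.<n>."""
    # The sample name may contain only letters, digits and underscores,
    # so that the period after it ends the sample identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]+", sample):
        raise ValueError(f"invalid sample name: {sample!r}")

    read_number = 0
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Assumption: every record is exactly four lines long,
        # so the header is every fourth line starting at index 0.
        if index % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"expected a FASTQ header at line {index + 1}")
            read_number += 1
            yield f"@{sample}.{read_number}"
        else:
            # Sequence, '+' and quality lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Hypothetical input and output file names.
    with open("reads.fastq") as src, open("relabeled.fastq", "w") as dst:
        for out_line in relabel_fastq(src, "SampleName"):
            dst.write(out_line + "\n")
```

The original read labels are discarded, so keep the input file if you need to trace reads back to the sequencer output.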