usearch manual
cluster_edges command

The cluster_edges command reports the connected components of a graph.

The graph is specified by a tabbed text file with one line per edge. Each line has two fields, which are the names (labels) of the two nodes joined by the edge.

A singleton node, i.e. a node without any edges, should be specified by a line that contains the node name twice (a self-edge); otherwise the node will not appear in the output. The graph is undirected, so the order of the two labels on a line is not significant.
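
To make the input format and the notion of a connected component concrete, here is a minimal Python sketch that groups node labels from edge lines using a union-find structure. It is an illustration only and not necessarily how usearch computes its result; the function name connected_components, the sample labels, and the order in which clusters are returned are all assumptions.

```python
def connected_components(edge_lines):
    """Group node labels into connected components.

    Each line holds two tab-separated labels. A self-edge (the same label
    twice) introduces a singleton node. Hypothetical helper, not part of
    usearch.
    """
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in edge_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        a, b = line.split("\t")
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Undirected graph: merging A-B is the same as merging B-A.
            parent[root_b] = root_a

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), []).append(node)
    return list(clusters.values())


# Made-up input: A, B and C are linked; D is a singleton given as a self-edge.
edges = ["A\tB\n", "C\tB\n", "D\tD\n"]
components = connected_components(edges)
```

Without the D line, node D would never be seen, which mirrors the rule that singletons need a self-edge to appear in the output.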

Output is written to files specified by the -ccout and/or -ccnodesout options.

The ccout file is a tabbed text file with one line per connected component (cluster). The number of fields varies, and lines will be very long for large clusters. The first field is an integer cluster number 0, 1 ... (N-1), where N is the number of clusters; the second field is an integer giving the number of nodes in the cluster; the remaining fields are the node names.
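
The ccout layout can be read back with a few lines of Python. The sketch below assumes exactly the field order described above; the function name parse_ccout and the sample lines are made up for illustration and are not real usearch output.

```python
def parse_ccout(lines):
    """Return {cluster_number: [node names]} from ccout-style lines.

    Hypothetical helper; checks that the size field matches the number of
    node names on the line.
    """
    clusters = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        cluster_id = int(fields[0])
        size = int(fields[1])
        nodes = fields[2:]
        if size != len(nodes):
            raise ValueError(
                f"cluster {cluster_id}: expected {size} nodes, found {len(nodes)}"
            )
        clusters[cluster_id] = nodes
    return clusters


# Made-up sample in the described layout: number, size, then node names.
sample_ccout = ["0\t3\tA\tB\tC\n", "1\t1\tD\n"]
clusters = parse_ccout(sample_ccout)
```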

The ccnodesout file is a tabbed text file with one line per node. Each line has two fields: the integer cluster number 0, 1 ... (N-1) where N is the number of clusters, and the node name.
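
Because ccnodesout has a fixed two-field layout, it is convenient for looking up the cluster of a given node. The following sketch assumes the field order described above; parse_ccnodesout, group_by_cluster and the sample lines are hypothetical names and data used only for illustration.

```python
from collections import defaultdict


def parse_ccnodesout(lines):
    """Return {node name: cluster number} from ccnodesout-style lines."""
    node_to_cluster = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # First field is the cluster number, second is the node name.
        cluster_id, node = line.split("\t", 1)
        node_to_cluster[node] = int(cluster_id)
    return node_to_cluster


def group_by_cluster(node_to_cluster):
    """Invert the mapping to {cluster number: sorted node names}."""
    groups = defaultdict(list)
    for node, cluster_id in node_to_cluster.items():
        groups[cluster_id].append(node)
    return {cid: sorted(nodes) for cid, nodes in groups.items()}


# Made-up sample in the described layout: cluster number, then node name.
sample_nodes = ["0\tA\n", "0\tB\n", "1\tD\n", "0\tC\n"]
node_to_cluster = parse_ccnodesout(sample_nodes)
groups = group_by_cluster(node_to_cluster)
```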

Example

usearch -cluster_edges edges.txt -ccnodesout nodes.txt -ccout ccs.txt
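
The same command can be driven from a script. The sketch below only assembles the argument list from the options shown in this page; the helper names are hypothetical, and it assumes an executable called usearch is on the PATH.

```python
import subprocess


def build_cluster_edges_cmd(edges_path, ccout=None, ccnodesout=None):
    """Assemble the cluster_edges command line as a list of arguments.

    Hypothetical helper. Uses only the options documented above and
    requires at least one output file.
    """
    if ccout is None and ccnodesout is None:
        raise ValueError("specify -ccout and/or -ccnodesout")
    cmd = ["usearch", "-cluster_edges", edges_path]
    if ccnodesout is not None:
        cmd += ["-ccnodesout", ccnodesout]
    if ccout is not None:
        cmd += ["-ccout", ccout]
    return cmd


def run_cluster_edges(edges_path, ccout=None, ccnodesout=None):
    """Run the command; assumes 'usearch' is installed and on the PATH."""
    cmd = build_cluster_edges_cmd(edges_path, ccout, ccnodesout)
    subprocess.run(cmd, check=True)


cmd = build_cluster_edges_cmd("edges.txt", ccout="ccs.txt", ccnodesout="nodes.txt")
```

Calling run_cluster_edges("edges.txt", ccout="ccs.txt", ccnodesout="nodes.txt") would then execute the example command shown above.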