cluster_edges command
The cluster_edges command reports the connected components of a graph.
The graph is specified by a tabbed text file with one line for each edge. Each line has two fields, which are the names (labels) of the two nodes joined by the edge.
A singleton node, i.e. a node without any edges, should be specified by a line that contains the node name twice (a self-edge); otherwise the node will not appear in the output. The graph is undirected, so the order in which the two labels are given on a line is not significant.
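To illustrate what a connected component means for this input format, here is a minimal Python sketch that groups node labels using union-find. It is not the algorithm used by usearch, which is not documented here; the function name connected_components and the sample labels A, B, C, D are assumptions made for the example. The order in which clusters are returned is also an arbitrary choice of this sketch and may not match the cluster numbering in the command's output.

```python
from collections import defaultdict

def connected_components(edge_lines):
    """Group node labels into connected components (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen labels as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for line in edge_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        a, b = line.split("\t")  # two tab-separated node labels per edge
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = defaultdict(list)
    for node in parent:
        clusters[find(node)].append(node)
    # Sorted only to make the result deterministic for this example.
    return sorted(sorted(nodes) for nodes in clusters.values())

# Assumed sample input: A-B and C-B form one cluster, D is a singleton
# declared with a self-edge.
edges = ["A\tB", "C\tB", "D\tD"]
components = connected_components(edges)
print(components)
```

Reversing the labels on any line (for example B, A instead of A, B) gives the same components, since the graph is undirected.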
Output is written to files specified by the -ccout and/or -ccnodesout options.
The ccout file is a tabbed text file with one line per connected component (cluster). There are a variable number of fields, and lines will be very long for large clusters. The first field is an integer cluster number 0, 1 ... (N-1) where N is the number of clusters, the second field is an integer giving the number of nodes in the cluster, and the remaining fields are node names.
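Because ccout lines have a variable number of fields, they are easiest to read by splitting on tabs and slicing. The following Python sketch parses one such line under the layout described above; the function name parse_ccout_line and the sample line are assumptions for illustration, not output captured from usearch.

```python
def parse_ccout_line(line):
    """Parse one ccout line into (cluster number, size, node names)."""
    fields = line.rstrip("\n").split("\t")
    cluster = int(fields[0])   # first field: cluster number
    size = int(fields[1])      # second field: number of nodes in the cluster
    nodes = fields[2:]         # remaining fields: node names
    if len(nodes) != size:
        raise ValueError(
            f"cluster {cluster}: expected {size} nodes, found {len(nodes)}"
        )
    return cluster, size, nodes

# Hypothetical line for a cluster with three nodes.
record = parse_ccout_line("0\t3\tA\tB\tC\n")
print(record)
```

The size check is a design choice of the sketch: it catches truncated or corrupted lines, which matters when lines for large clusters are very long.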
The ccnodesout file is a tabbed text file with one line per node. Each line has two fields: the integer cluster number 0, 1 ... (N-1) where N is the number of clusters, and the node name.
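The ccnodesout format has a fixed two-field layout, which makes it convenient for building a lookup from cluster number to node names. A minimal Python sketch, assuming the layout described above; the function name group_nodes_by_cluster and the sample lines are hypothetical.

```python
from collections import defaultdict

def group_nodes_by_cluster(lines):
    """Map each cluster number to the list of node names assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cluster, node = line.split("\t")  # cluster number, then node name
        clusters[int(cluster)].append(node)
    return dict(clusters)

# Hypothetical ccnodesout content: two nodes in cluster 0, one in cluster 1.
sample = ["0\tA\n", "0\tB\n", "1\tD\n"]
grouped = group_nodes_by_cluster(sample)
print(grouped)
```

For a real file, the same function can be given an open file handle, e.g. group_nodes_by_cluster(open("nodes.txt")), where nodes.txt is the file named with -ccnodesout.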
Example
usearch -cluster_edges edges.txt -ccnodesout nodes.txt -ccout ccs.txt