The cluster_edges command reports the
connected components of a graph.
The graph is specified by a tabbed text file with one line
for each edge. The line has two fields, which are the names (labels) of the
nodes of the edge.
A singleton node, i.e. a node without any edges, should be
specified by a line that contains the node name twice (self-edge), otherwise the
node will not appear in the output. The graph is
undirected, so the order in which the labels are given in an edge does not matter.
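As an illustration of the input format and of what a connected component is, here is a minimal Python sketch that reads tab-separated edge lines and groups the node labels with a union-find structure. It is not the code used by usearch; the function name connected_components and the sample labels are hypothetical, and the sorted order of the result is a choice made here for readability only.

```python
from collections import defaultdict


def connected_components(edge_text):
    """Group node labels into connected components.

    edge_text holds one edge per line: two tab-separated node labels.
    A line with the same label twice (self-edge) declares a singleton.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in edge_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        a, b = fields[0], fields[1]
        for node in (a, b):
            parent.setdefault(node, node)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two components; a self-edge never reaches this line.
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    # Sorted only so the result is deterministic (an assumption of this sketch).
    return sorted(sorted(members) for members in groups.values())


# Hypothetical input: A-B and C-B form one component, D is a singleton.
components = connected_components("A\tB\nC\tB\nD\tD\n")
```

Swapping the two labels on any line gives the same grouping, which is what an undirected graph implies.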
Output is written to files specified by the -ccout and/or -ccnodesout options.
The ccout file is a tabbed text file with one line per
connected component (cluster). There are a variable number of fields, and lines
will be very long for large clusters. The first field is an integer cluster
number 0, 1 ... (N-1), where N is the number of clusters; the second field is an
integer giving the number of nodes in the cluster; and the remaining fields are
the names of the nodes in the cluster.
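A short Python sketch of reading a ccout-style file may help when post-processing results. It assumes that the fields after the node count are the names of the member nodes; the function name parse_ccout and the sample lines are hypothetical and are not real usearch output.

```python
def parse_ccout(text):
    """Return {cluster_number: [node_name, ...]} from ccout-style lines."""
    clusters = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        cluster, size = int(fields[0]), int(fields[1])
        # Assumption: every field after the size is a member node name.
        nodes = fields[2:]
        if len(nodes) != size:
            raise ValueError(
                f"cluster {cluster}: expected {size} nodes, found {len(nodes)}"
            )
        clusters[cluster] = nodes
    return clusters


# Hypothetical sample with two clusters.
sample_ccout = "0\t3\tA\tB\tC\n1\t1\tD\n"
clusters = parse_ccout(sample_ccout)
```

Checking the size field against the number of names is a cheap way to catch a line that was cut off or split incorrectly.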
The ccnodesout file is a tabbed text file with one line
per node. Each line has two fields: the integer cluster number 0, 1 ... (N-1)
where N is the number of clusters, and the node name.
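The one-line-per-node layout is convenient for lookups by node name. The following Python sketch reads ccnodesout-style lines into a dictionary and regroups them by cluster; the function names and the sample lines are hypothetical and serve only to illustrate the two-field format described above.

```python
from collections import defaultdict


def parse_ccnodesout(text):
    """Return {node_name: cluster_number} from ccnodesout-style lines."""
    node_to_cluster = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Two tab-separated fields: cluster number, then node name.
        cluster, node = line.split("\t", 1)
        node_to_cluster[node] = int(cluster)
    return node_to_cluster


def group_by_cluster(node_to_cluster):
    """Invert the mapping to {cluster_number: [node_name, ...]}."""
    members = defaultdict(list)
    for node, cluster in node_to_cluster.items():
        members[cluster].append(node)
    return dict(members)


# Hypothetical sample: A and B share cluster 0, D is alone in cluster 1.
sample_nodes = "0\tA\n0\tB\n1\tD\n"
node_to_cluster = parse_ccnodesout(sample_nodes)
by_cluster = group_by_cluster(node_to_cluster)
```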
usearch -cluster_edges edges.txt -ccnodesout nodes.txt -ccout