cluster_edges command

The cluster_edges command reports the connected components of a graph.

The graph is specified by a tabbed text file with one line for each edge. Each line has two fields, which are the names (labels) of the two nodes joined by the edge.

A singleton node, i.e. a node without any edges, should be specified by a line that contains the node name twice (a self-edge); otherwise the node will not appear in the output. The graph is undirected, so the order of the two labels on a line is not significant.
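To make the input semantics concrete, here is a minimal Python sketch that groups node labels into connected components from lines in the edge format described above. It is an illustration only, not usearch's implementation. The function name connected_components and the union-find approach are assumptions made for this example, and the order in which it lists components need not match the cluster numbers usearch assigns.

```python
from collections import defaultdict

def connected_components(edge_lines):
    """Return sorted lists of node labels, one list per connected component.

    Hypothetical helper for illustration; not part of usearch.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for line in edge_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        a, b = line.split("\t")
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())

# Three input lines: two edges sharing node B, plus a self-edge for singleton D.
edges = ["A\tB\n", "C\tB\n", "D\tD\n"]
components = connected_components(edges)
```

With this input, A, B and C fall into one component and D forms a component on its own because of its self-edge. Swapping the two labels on any line gives the same grouping.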

Output is written to files specified by the -ccout and/or -ccnodesout options.

The ccout file is a tabbed text file with one line per connected component (cluster). There are a variable number of fields, and lines will be very long for large clusters. The first field is an integer cluster number 0, 1 ... (N-1) where N is the number of clusters, the second field is an integer giving the number of nodes in the cluster, and the remaining fields are node names.
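As a sketch of how a ccout line could be read in a script, the following Python function splits one line into its cluster number and node names, and checks that the stated node count matches the number of names. The function name parse_ccout_line and the sample line are hypothetical, chosen for this example rather than taken from real usearch output.

```python
def parse_ccout_line(line):
    """Split one ccout line into (cluster_number, list_of_node_names).

    Hypothetical helper for illustration; not part of usearch.
    """
    fields = line.rstrip("\n").split("\t")
    cluster = int(fields[0])   # first field: cluster number
    size = int(fields[1])      # second field: number of nodes in the cluster
    nodes = fields[2:]         # remaining fields: node names
    if len(nodes) != size:
        raise ValueError(
            f"cluster {cluster}: expected {size} nodes, found {len(nodes)}"
        )
    return cluster, nodes

# Made-up sample line describing cluster 0 with three nodes.
cluster, nodes = parse_ccout_line("0\t3\tA\tB\tC\n")
```

Because each line holds a whole cluster, reading the file line by line keeps memory use proportional to the largest cluster rather than to the whole file.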

The ccnodesout file is a tabbed text file with one line per node. Each line has two fields: the integer cluster number 0, 1 ... (N-1) where N is the number of clusters, and the node name.
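The ccnodesout layout maps naturally onto a dictionary from node name to cluster number. The Python sketch below shows one way to load it. The function name parse_ccnodesout and the sample lines are assumptions made for this example, not real usearch output.

```python
def parse_ccnodesout(lines):
    """Build a dict mapping node name -> cluster number from ccnodesout lines.

    Hypothetical helper for illustration; not part of usearch.
    """
    node_to_cluster = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cluster, node = line.split("\t")  # two fields: cluster number, node name
        node_to_cluster[node] = int(cluster)
    return node_to_cluster

# Made-up sample lines: three nodes in cluster 0 and one in cluster 1.
sample = ["0\tA\n", "0\tB\n", "0\tC\n", "1\tD\n"]
node_to_cluster = parse_ccnodesout(sample)
```

To process a real file, an open file handle can be passed in place of the sample list, for example parse_ccnodesout(open("nodes.txt")).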

Example

usearch -cluster_edges edges.txt -ccnodesout nodes.txt -ccout ccs.txt
