Abstract. Suppose we are given a large graph in which, by an external process, a handful of nodes are marked. What can we say about these nodes? Are they close together in the graph? Or, if they are segregated, how many groups do they form? We approach this problem by trying to find sets of simple connection pathways between sets of marked nodes.
We formalize the problem in terms of the Minimum Description Length principle: a pathway is simple when we need only a few bits to tell which edges to follow such that we visit all nodes in a group. The best partitioning is then the one that requires the fewest bits to describe the paths that visit all the marked nodes.
We prove that solving this problem is NP-hard, and introduce dot2dot, an efficient algorithm for partitioning marked nodes by finding simple pathways between nodes. Experimentation shows that dot2dot correctly groups nodes for which good connection paths can be constructed, while separating distant nodes.
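To make the notion of a "simple" pathway concrete, the following is a minimal sketch of one possible way to count the bits needed to say which edges to follow. It assumes that at each step we pay log2(degree) bits to name the next neighbor; this cost model, the function name `path_cost_bits`, and the toy graph are illustrative assumptions, not the encoding used by dot2dot.

```python
import math

def path_cost_bits(adj, path):
    """Bits to describe `path` when each step names one neighbor of the
    current node, costing log2(degree) bits (an assumed, simplified model)."""
    bits = 0.0
    for u, v in zip(path, path[1:]):
        if v not in adj[u]:
            raise ValueError(f"{u!r}-{v!r} is not an edge")
        bits += math.log2(len(adj[u]))
    return bits

# Hypothetical toy graph: a short chain a-b-c, plus a hub h that also
# connects a and c but has six additional neighbors x1..x6.
extras = [f"x{i}" for i in range(1, 7)]
adj = {
    "a": ["b", "h"],
    "b": ["a", "c"],
    "c": ["b", "h"],
    "h": ["a", "c"] + extras,
}
adj.update({x: ["h"] for x in extras})

via_chain = path_cost_bits(adj, ["a", "b", "c"])  # two steps out of degree-2 nodes
via_hub = path_cost_bits(adj, ["a", "h", "c"])    # second step leaves the degree-8 hub
print(via_chain, via_hub)
```

Under this assumed model, both routes from a to c have two hops, yet the route through the high-degree hub costs more bits, because picking one of many neighbors takes more information than picking one of few.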
Mining Connection Pathways for Marked Nodes in Large Graphs. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 37–45, SIAM, 2013. (oral presentation, 14.4% acceptance rate; overall 25%)

Islands and Bridges: Making Sense of Marked Nodes in Large Graphs. Technical Report CMU-CS-12-124R, Carnegie Mellon University, 2013.

TourViz: Interactive Visualization of Connection Pathways in Large Graphs. Demo at, and included in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 1516–1519, ACM, 2012.