Look at the two tables below. Since the first columns are identical in both, the Jaccard distance satisfies j(A,B) = 1 in both tables. Still, the probe distance differs between them: in the first table the probabilities of seeing C when A or B is present are one and zero respectively, making the probe distance one, whereas in the second table both probabilities equal 1/2, so the probe distance is zero.
Therefore the probe distance depends on more than just the Jaccard distance. This is hardly surprising: the Jaccard distance couldn't care less about the rest of the table, whereas the probe distance is sensitive to similarities in functional dependencies across the whole table.
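The point above can be checked numerically. The two tables below are invented for illustration (the exercise's original tables are not reproduced here), and the probe distance is assumed to be |P(C|A) - P(C|B)|, as the prose suggests; in both tables A and B never co-occur, so their Jaccard distance is 1, yet the probe distances differ:

```python
# Hypothetical binary tables; rows are (A, B, C).
table1 = [
    (1, 0, 1), (1, 0, 1),   # C always present with A  -> P(C|A) = 1
    (0, 1, 0), (0, 1, 0),   # C never present with B   -> P(C|B) = 0
]
table2 = [
    (1, 0, 1), (1, 0, 0),   # P(C|A) = 1/2
    (0, 1, 1), (0, 1, 0),   # P(C|B) = 1/2
]

def jaccard_distance(rows, i, j):
    """1 - |intersection| / |union| of columns i and j, read as sets of rows."""
    inter = sum(1 for r in rows if r[i] and r[j])
    union = sum(1 for r in rows if r[i] or r[j])
    return 1 - inter / union

def probe_distance(rows):
    """|P(C|A) - P(C|B)| for binary columns A, B, C (columns 0, 1, 2)."""
    p_c_given_a = sum(r[2] for r in rows if r[0]) / sum(r[0] for r in rows)
    p_c_given_b = sum(r[2] for r in rows if r[1]) / sum(r[1] for r in rows)
    return abs(p_c_given_a - p_c_given_b)

print(jaccard_distance(table1, 0, 1), probe_distance(table1))  # 1.0 1.0
print(jaccard_distance(table2, 0, 1), probe_distance(table2))  # 1.0 0.0
```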
Suppose you visualize the whole context as a bipartite graph of words W and sentences S, where an edge (W_i, S_j) means that the i-th word occurs in the j-th sentence. The adjacency matrix of this graph is exactly what is described in the exercise.
Comparing neighbourhoods is a very common similarity measure for bipartite graphs, for example in clustering. Here the neighbourhood of a word is the set of sentences it occurs in, and the neighbourhood of a sentence is the set of words it contains. We can then define a similarity measure on the nodes of the graph by taking the Jaccard coefficient of the nodes' neighbourhoods.
The first suggestion is thus answered by taking the columns corresponding to the words in question and computing the Jaccard coefficient of these columns. Similarly, the second suggestion is answered using the corresponding rows. In fact, any similarity measure for sets or binary vectors could be used here instead of the Jaccard measure; the Jaccard measure's advantages are its simple interpretation, generality, intuitiveness and normalization.
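As a sketch of both suggestions, on an invented word-sentence occurrence matrix (rows are sentences, columns are words, M[s][w] = 1 if word w occurs in sentence s):

```python
# Hypothetical occurrence matrix, made up for illustration.
M = [
    [1, 1, 0, 1],  # sentence 0
    [1, 0, 1, 1],  # sentence 1
    [0, 1, 1, 0],  # sentence 2
]

def jaccard(a, b):
    """Jaccard coefficient of two binary vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def word_similarity(i, j):
    """Compare two words via the sets of sentences they occur in (columns)."""
    return jaccard([row[i] for row in M], [row[j] for row in M])

def sentence_similarity(i, j):
    """Compare two sentences via the sets of words they contain (rows)."""
    return jaccard(M[i], M[j])

print(word_similarity(0, 3))      # words 0 and 3 occur in the same sentences -> 1.0
print(sentence_similarity(0, 1))  # sentences 0 and 1 share 2 of 4 words -> 0.5
```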
In correlation clustering we have distances X(u,v) that should be minimized inside clusters and maximized across clusters, just as one would expect. Similar ideas appear in ordinary clustering, where the choice of distance determines the method: the minimal distance leads to single-linkage clustering, the distance between averages leads to k-means, and cut size, path length or conductance lead to spectral clustering.
In clustering aggregation we measure the distance between clusterings by the number of disagreements. Note that the flexible component of correlation clustering is the distance, which can be chosen freely; this distance should therefore be adapted to capture disagreements.
Naturally the disagreements themselves should be minimized everywhere, meaning that inside clusters the variation of labels across the reference clusterings should be minimized and across clusters maximized. So let us choose as the distance the variation of the labels over the reference clusterings, i.e. the number of clusterings that give different labels to the point pair, normalized by taking the mean. This distance clearly lies in the unit interval, as required.
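The proposed distance can be sketched as follows, on a small set of invented reference clusterings written as label lists indexed by point:

```python
# Hypothetical reference clusterings, for illustration only.
references = [
    [0, 0, 1, 1],   # clustering 1
    [0, 0, 0, 1],   # clustering 2
    [0, 1, 1, 1],   # clustering 3
]

def distance(u, v):
    """Fraction of reference clusterings giving u and v different labels."""
    return sum(c[u] != c[v] for c in references) / len(references)

print(distance(0, 1))  # separated by one clustering of three -> 1/3
print(distance(0, 3))  # separated by every clustering -> 1.0
```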
If u and v belong to the same cluster, then their distance is exactly the fraction of reference clusterings the new clustering disagrees with on this pair. The same interpretation holds in the complementary case, because the new clustering disagrees with every reference clustering that does not label the points differently. Summing over all pairs, we see that the two objective functions coincide, which proves the claim.
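The coincidence of the two objectives can be verified numerically on a toy instance; the reference clusterings and the candidate clustering below are invented for illustration:

```python
from itertools import combinations

# Hypothetical reference clusterings, as label lists indexed by point.
references = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]

def distance(u, v):
    """Fraction of reference clusterings giving u and v different labels."""
    return sum(c[u] != c[v] for c in references) / len(references)

def correlation_cost(new):
    """Correlation-clustering objective: d(u,v) summed over same-cluster
    pairs plus 1 - d(u,v) summed over split pairs."""
    cost = 0.0
    for u, v in combinations(range(len(new)), 2):
        cost += distance(u, v) if new[u] == new[v] else 1 - distance(u, v)
    return cost

def mean_disagreements(new):
    """Average, over the references, of the number of point pairs on which
    the new clustering and the reference disagree."""
    total = sum((new[u] == new[v]) != (c[u] == c[v])
                for c in references
                for u, v in combinations(range(len(new)), 2))
    return total / len(references)

candidate = [0, 0, 1, 1]
print(correlation_cost(candidate), mean_disagreements(candidate))  # equal values
```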
You are at: CIS → T-61.5060 Exercises 2007, Solutions 7
Page maintained by firstname.lastname@example.org, last updated Tuesday, 13-Nov-2007 09:32:08 EET