==========================================================================
hostid                  Identifier of the host in the hostgraph
hostname                Name of the host, including the port number if it
                        differs from the default (80). Note that some hosts
                        have more than one port open.
eq_hp_mp                Is the home page the page with the maximum PageRank
                        in the host? 0=no 1=yes
assortativity_hp        Assortativity coefficient of the home page (degree /
                        average degree of neighbors). Degree in this case is
                        undirected (in_degree+out_degree)
assortativity_mp        Assortativity coefficient of the page with the
                        maximum PageRank
avgin_of_out_hp         Average in-degree of out-neighbors of home page (hp)
avgin_of_out_mp         Average in-degree of out-neighbors of page with
                        maximum PageRank (mp)
avgout_of_in_hp         Average out-degree of in-neighbors of hp
avgout_of_in_mp         Average out-degree of in-neighbors of mp
indegree_hp             In-degree of hp
indegree_mp             In-degree of mp
neighbors_2_hp          Neighbors at distance 2 of hp
neighbors_2_mp          Neighbors at distance 2 of mp
neighbors_3_hp          Neighbors at distance 3 of hp
neighbors_3_mp          Neighbors at distance 3 of mp
neighbors_4_hp          Neighbors at distance 4 of hp
neighbors_4_mp          Neighbors at distance 4 of mp
outdegree_hp            Out-degree of hp
outdegree_mp            Out-degree of mp
pagerank_hp             PageRank of hp (calculated in the doc graph with no
                        self-loops, using a damping factor of 0.85, with 50
                        iterations)
pagerank_mp             PageRank of mp
prsigma_hp              Standard deviation of the PageRank of in-neighbors
                        of hp
prsigma_mp              Standard deviation of the PageRank of in-neighbors
                        of mp
reciprocity_hp          Fraction of out-links that are also in-links of hp.
                        For instance, if the hp has 5 out-links, and 3 of
                        those pages link back to the home page, the
                        reciprocity is 3/5. A page with no out-links has a
                        reciprocity of 0.
reciprocity_mp          Fraction of out-links that are also in-links of mp
siteneighbors_1_hp      Number of different hosts pointing to hp, obtained
                        by an approximate algorithm (an exact count was
                        possible, but the approximate algorithm was used)
siteneighbors_1_mp      Number of different hosts pointing to mp
siteneighbors_2_hp      Number of different hosts (approx.) supporting hp
                        at distance 2
siteneighbors_2_mp      Number of different hosts (approx.) supporting mp
                        at distance 2
siteneighbors_3_hp      Number of different hosts (approx.) supporting hp
                        at distance 3
siteneighbors_3_mp      Number of different hosts (approx.) supporting mp
                        at distance 3
siteneighbors_4_hp      Number of different hosts (approx.) supporting hp
                        at distance 4
siteneighbors_4_mp      Number of different hosts (approx.) supporting mp
                        at distance 4
truncatedpagerank_1_hp  TruncatedPageRank using truncation distance 1, hp
truncatedpagerank_1_mp  TruncatedPageRank using truncation distance 1, mp
truncatedpagerank_2_hp  TruncatedPageRank using truncation distance 2, hp
truncatedpagerank_2_mp  TruncatedPageRank using truncation distance 2, mp
truncatedpagerank_3_hp  TruncatedPageRank using truncation distance 3, hp
truncatedpagerank_3_mp  TruncatedPageRank using truncation distance 3, mp
truncatedpagerank_4_hp  TruncatedPageRank using truncation distance 4, hp
truncatedpagerank_4_mp  TruncatedPageRank using truncation distance 4, mp
trustrank_hp            TrustRank of hp (obtained using 3,800 hosts from ODP
                        as trusted set). The list is at
                        http://www.yr-bcn.es/webspam/datasets/uk2006-features/uk-2006-05.odp_docid_sitename_3800.txt.gz
                        NOTE: this feature can be improved by using more ODP
                        hosts in the seed set.
trustrank_mp            TrustRank of mp
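
The degree-based definitions above (reciprocity, assortativity, avgin_of_out,
avgout_of_in) can be illustrated with a minimal Python sketch on a toy graph.
This is not the code used to produce the dataset. The function names, the
edge-list input, the choice of "neighbors" as the union of in- and
out-neighbors, and the value 0 for a page without neighbors are assumptions
made only for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return (out_links, in_links) as dicts mapping a page to a set of pages."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links


def reciprocity(page, out_links, in_links):
    """Fraction of out-links of `page` that are also in-links; 0 if no out-links."""
    outs = out_links.get(page, set())
    if not outs:
        return 0.0
    return len(outs & in_links.get(page, set())) / len(outs)


def undirected_degree(page, out_links, in_links):
    """in_degree + out_degree, as in the assortativity description."""
    return len(out_links.get(page, ())) + len(in_links.get(page, ()))


def assortativity(page, out_links, in_links):
    """Degree of `page` divided by the average degree of its neighbors.

    Assumption: neighbors are the union of in- and out-neighbors, and a page
    with no neighbors gets 0.
    """
    neighbors = out_links.get(page, set()) | in_links.get(page, set())
    if not neighbors:
        return 0.0
    avg = sum(undirected_degree(n, out_links, in_links) for n in neighbors) / len(neighbors)
    return undirected_degree(page, out_links, in_links) / avg


def avg_in_of_out(page, out_links, in_links):
    """Average in-degree of the out-neighbors of `page` (0 if none; assumption)."""
    outs = out_links.get(page, set())
    if not outs:
        return 0.0
    return sum(len(in_links.get(n, ())) for n in outs) / len(outs)


def avg_out_of_in(page, out_links, in_links):
    """Average out-degree of the in-neighbors of `page` (0 if none; assumption)."""
    ins = in_links.get(page, set())
    if not ins:
        return 0.0
    return sum(len(out_links.get(n, ())) for n in ins) / len(ins)


# Toy graph: "hp" links to five pages, and three of them link back.
EDGES = [
    ("hp", "a"), ("hp", "b"), ("hp", "c"), ("hp", "d"), ("hp", "e"),
    ("a", "hp"), ("b", "hp"), ("c", "hp"),
]
out_links, in_links = build_adjacency(EDGES)

print("reciprocity_hp    ", reciprocity("hp", out_links, in_links))
print("assortativity_hp  ", assortativity("hp", out_links, in_links))
print("avgin_of_out_hp   ", avg_in_of_out("hp", out_links, in_links))
print("avgout_of_in_hp   ", avg_out_of_in("hp", out_links, in_links))
```

In this toy graph hp has 5 out-links and 3 of those pages link back, so the
reciprocity is 3/5, matching the example given for reciprocity_hp.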