Content-based features could be computed for only 8,944 hosts.

=======================================================================================
hostid    Identifier of the host in the hostgraph
hostname  Name of the host, including the port number if different from the
          default (80). Note that some hosts have more than one port open.

HST_1     Number of words in the page (home page = hp)
HST_2     Number of words in the title (hp)
HST_3     Average word length (hp)
HST_4     Fraction of anchor text (hp)
HST_5     Fraction of visible text (hp)
HST_6     Compression rate (hp)
HST_7     Top 100 corpus precision (hp)
HST_8     Top 200 corpus precision (hp)
HST_9     Top 500 corpus precision (hp)
HST_10    Top 1000 corpus precision (hp)
HST_11    Top 100 corpus recall (hp)
HST_12    Top 200 corpus recall (hp)
HST_13    Top 500 corpus recall (hp)
HST_14    Top 1000 corpus recall (hp)
HST_15    Top 100 queries precision (hp)
HST_16    Top 200 queries precision (hp)
HST_17    Top 500 queries precision (hp)
HST_18    Top 1000 queries precision (hp)
HST_19    Top 100 queries recall (hp)
HST_20    Top 200 queries recall (hp)
HST_21    Top 500 queries recall (hp)
HST_22    Top 1000 queries recall (hp)
HST_23    Entropy (hp)
HST_24    Independent LH (hp)

HMG_25    Number of words in the page (page with max PageRank in the host = mp)
HMG_26    Number of words in the title (mp)
HMG_27    Average word length (mp)
HMG_28    Fraction of anchor text (mp)
HMG_29    Fraction of visible text (mp)
HMG_30    Compression rate (mp)
HMG_31    Top 100 corpus precision (mp)
HMG_32    Top 200 corpus precision (mp)
HMG_33    Top 500 corpus precision (mp)
HMG_34    Top 1000 corpus precision (mp)
HMG_35    Top 100 corpus recall (mp)
HMG_36    Top 200 corpus recall (mp)
HMG_37    Top 500 corpus recall (mp)
HMG_38    Top 1000 corpus recall (mp)
HMG_39    Top 100 queries precision (mp)
HMG_40    Top 200 queries precision (mp)
HMG_41    Top 500 queries precision (mp)
HMG_42    Top 1000 queries precision (mp)
HMG_43    Top 100 queries recall (mp)
HMG_44    Top 200 queries recall (mp)
HMG_45    Top 500 queries recall (mp)
HMG_46    Top 1000 queries recall (mp)
HMG_47    Entropy (mp)
HMG_48    Independent LH (mp)

AVG_49    Number of words in the page (average value for all pages in the host)
AVG_50    Number of words in the title (average value for all pages in the host)
AVG_51    Average word length (average value for all pages in the host)
AVG_52    Fraction of anchor text (average value for all pages in the host)
AVG_53    Fraction of visible text (average value for all pages in the host)
AVG_54    Compression rate (average value for all pages in the host)
AVG_55    Top 100 corpus precision (average value for all pages in the host)
AVG_56    Top 200 corpus precision (average value for all pages in the host)
AVG_57    Top 500 corpus precision (average value for all pages in the host)
AVG_58    Top 1000 corpus precision (average value for all pages in the host)
AVG_59    Top 100 corpus recall (average value for all pages in the host)
AVG_60    Top 200 corpus recall (average value for all pages in the host)
AVG_61    Top 500 corpus recall (average value for all pages in the host)
AVG_62    Top 1000 corpus recall (average value for all pages in the host)
AVG_63    Top 100 queries precision (average value for all pages in the host)
AVG_64    Top 200 queries precision (average value for all pages in the host)
AVG_65    Top 500 queries precision (average value for all pages in the host)
AVG_66    Top 1000 queries precision (average value for all pages in the host)
AVG_67    Top 100 queries recall (average value for all pages in the host)
AVG_68    Top 200 queries recall (average value for all pages in the host)
AVG_69    Top 500 queries recall (average value for all pages in the host)
AVG_70    Top 1000 queries recall (average value for all pages in the host)
AVG_71    Entropy (average value for all pages in the host)
AVG_72    Independent LH (average value for all pages in the host)

STD_73    Number of words in the page (standard deviation for all pages in the host)
STD_74    Number of words in the title (standard deviation for all pages in the host)
STD_75    Average word length (standard deviation for all pages in the host)
STD_76    Fraction of anchor text (standard deviation for all pages in the host)
STD_77    Fraction of visible text (standard deviation for all pages in the host)
STD_78    Compression rate (standard deviation for all pages in the host)
STD_79    Top 100 corpus precision (standard deviation for all pages in the host)
STD_80    Top 200 corpus precision (standard deviation for all pages in the host)
STD_81    Top 500 corpus precision (standard deviation for all pages in the host)
STD_82    Top 1000 corpus precision (standard deviation for all pages in the host)
STD_83    Top 100 corpus recall (standard deviation for all pages in the host)
STD_84    Top 200 corpus recall (standard deviation for all pages in the host)
STD_85    Top 500 corpus recall (standard deviation for all pages in the host)
STD_86    Top 1000 corpus recall (standard deviation for all pages in the host)
STD_87    Top 100 queries precision (standard deviation for all pages in the host)
STD_88    Top 200 queries precision (standard deviation for all pages in the host)
STD_89    Top 500 queries precision (standard deviation for all pages in the host)
STD_90    Top 1000 queries precision (standard deviation for all pages in the host)
STD_91    Top 100 queries recall (standard deviation for all pages in the host)
STD_92    Top 200 queries recall (standard deviation for all pages in the host)
STD_93    Top 500 queries recall (standard deviation for all pages in the host)
STD_94    Top 1000 queries recall (standard deviation for all pages in the host)
STD_95    Entropy (standard deviation for all pages in the host)
STD_96    Independent LH (standard deviation for all pages in the host)
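
The column prefixes (HST, HMG, AVG, STD) make it easy to select one feature group at a time. The sketch below is only an illustration: the on-disk layout of the feature file is not stated here, so it assumes a comma-separated file with a header row whose names match the identifiers listed above. The helper names `group_columns` and `read_features` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def group_columns(header):
    """Group feature column names by prefix (HST, HMG, AVG, STD).

    Columns without a PREFIX_number shape (hostid, hostname) are skipped.
    """
    groups = defaultdict(list)
    for name in header:
        prefix, sep, index = name.partition("_")
        if sep and index.isdigit():
            groups[prefix].append(name)
    return dict(groups)


def read_features(handle):
    """Return (groups, rows); rows maps hostid -> {feature name: float}.

    Assumption: comma-separated input with a header row.
    """
    reader = csv.DictReader(handle)
    groups = group_columns(reader.fieldnames)
    feature_names = [n for names in groups.values() for n in names]
    rows = {}
    for record in reader:
        rows[int(record["hostid"])] = {n: float(record[n]) for n in feature_names}
    return groups, rows


# Made-up sample with only a handful of the 96 feature columns.
sample = io.StringIO(
    "hostid,hostname,HST_1,HST_2,HMG_25,AVG_49,STD_73\n"
    "0,example.co.uk,120,4,300,210.5,95.2\n"
    "1,example.org.uk:8080,35,2,35,35.0,0.0\n"
)
groups, rows = read_features(sample)
print(groups["HST"])      # ['HST_1', 'HST_2']
print(rows[1]["AVG_49"])  # 35.0
```

If the real file uses a different delimiter, passing `delimiter=...` to `csv.DictReader` should be the only change needed.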