Nutch / NUTCH-1051

Export WebGraph node scores for solr.ExternalFileField

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      The current webgraph.NodeDumper writes a flat <url>\t<float>\n file, which is almost exactly what solr.ExternalFileField (EFF) needs. This issue adds an option to dump the node scores in the proper EFF format. With EFF, scores can be updated without reindexing millions of documents. One caveat: Solr won't accept an equals sign in the key, but there is a small patch for this in SOLR-2545.
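
      As a rough illustration of the conversion described above, the sketch below turns <url>\t<float> lines into the key=value lines that ExternalFileField reads. It is not the attached patch (which changes NodeDumper itself); the function name to_external_file_lines and the sample URLs are hypothetical.

```python
def to_external_file_lines(node_dump_lines):
    """Convert '<url>\\t<float>' lines into '<url>=<float>' lines.

    Hypothetical helper for illustration only; not part of Nutch or Solr.
    """
    converted = []
    for line in node_dump_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so the score is always the final column.
        url, _, score = line.rpartition("\t")
        if not url:
            continue  # no tab separator found; not a score line
        float(score)  # validate the score; raises ValueError if malformed
        converted.append(f"{url}={score}")
    return converted


sample = [
    "http://example.org/\t0.5\n",
    "http://example.org/?a=b\t1.25\n",  # key contains '=' (see SOLR-2545)
]
for out_line in to_external_file_lines(sample):
    print(out_line)
```

      The second sample line shows the caveat: a URL with a query string produces a key that itself contains an equals sign.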

      Attachments

        1. NUTCH-1051-1.4-1.patch
          2 kB
          Markus Jelsma

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 0