  1. Nutch
  2. NUTCH-655

Injecting Crawl metadata

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1
    • Component/s: injector
    • Labels: None
    • Patch Info: Patch Available

    Description

      The attached patch allows metadata to be injected into the crawlDB. The input file must contain tab-separated fields, with the URL in the first column. Each following field is a metadata entry whose name and value are separated by '='. An input line might look like this (\t stands for a tab character):
      http://www.myurl.com \t categ=value1 \t categ2=value2

      This can be useful for storing external knowledge about a URL so that a custom plugin can index it.
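
The input format described above can be illustrated with a small standalone parser. This is only a sketch, not the code from the attached patch: the class name `InjectLineParser`, the holder type `SeedEntry`, and the choice to skip fields without a '=' are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: parses one line of the seed file.
class InjectLineParser {

    /** Holds one parsed line: the URL plus its metadata pairs. */
    public static final class SeedEntry {
        public final String url;
        public final Map<String, String> metadata;

        SeedEntry(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /**
     * Splits the line on tabs. The first field is the URL; every later
     * field is expected to look like name=value. Fields with no '=' or
     * with an empty name are skipped (an assumption of this sketch).
     */
    public static SeedEntry parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        // LinkedHashMap keeps the metadata in the order it appears on the line.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' found, or nothing before it
            }
            String name = field.substring(0, eq).trim();
            String value = field.substring(eq + 1).trim();
            metadata.put(name, value);
        }
        return new SeedEntry(url, metadata);
    }
}
```

Calling `InjectLineParser.parse(...)` on the example line, with real tab characters in place of `\t`, yields the URL and a map holding the entries `categ` and `categ2`. Only the first '=' in a field is treated as the separator, so values may themselves contain '='.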

      Attachments

        1. Injector.patch
          2 kB
          Julien Nioche
        2. NUTCH-655.v2
          3 kB
          Julien Nioche

            People

              Assignee: Julien Nioche (jnioche)
              Reporter: Julien Nioche (jnioche)
              Votes: 0
              Watchers: 2
