  1. Nutch
  2. NUTCH-2184

Enable IndexingJob to function with no crawldb

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17
    • Component/s: indexer
    • Labels: None
    • Patch Available
    • Patch

    Description

      When working with distributed teams, we have found that we can lose data structures that are currently considered critical, e.g. the crawldb, linkdb and/or segments.
      In my current scenario I need to index segment data that has no accompanying crawldb or linkdb.
      The missing linkdb is not a problem because linkdb is optional; the crawldb, however, is currently mandatory in IndexerMapReduce.
      This ticket should enhance the IndexerMapReduce code to support the use case where you have ONLY segments and want to force an index for every record present.
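
The requested behaviour can be illustrated with a minimal sketch. This is not the attached patch and does not use Nutch or Hadoop classes: `IndexInputPlanner` and `inputPaths` are hypothetical names, and the sub-directory names (`crawl_fetch`, `crawl_parse`, `parse_data`, `parse_text`, `current`) are assumed from Nutch's usual on-disk layout. It only shows the idea of treating the crawldb as an optional input in the same way the linkdb already is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Nutch): decides which directories an
// indexing job would read when the crawldb and linkdb are both optional.
final class IndexInputPlanner {

    // Assumed segment sub-directories that carry fetch and parse data.
    private static final String[] SEGMENT_PARTS = {
        "crawl_fetch", "crawl_parse", "parse_data", "parse_text"
    };

    private IndexInputPlanner() {
    }

    // crawlDb and linkDb may be null, meaning "not available".
    // At least one segment is required, since segments hold the records to index.
    static List<String> inputPaths(String crawlDb, String linkDb, List<String> segments) {
        if (segments == null || segments.isEmpty()) {
            throw new IllegalArgumentException("at least one segment is required");
        }
        List<String> inputs = new ArrayList<>();
        for (String segment : segments) {
            for (String part : SEGMENT_PARTS) {
                inputs.add(segment + "/" + part);
            }
        }
        // Only add the crawldb when one was supplied (the behaviour this issue asks for).
        if (crawlDb != null) {
            inputs.add(crawlDb + "/current");
        }
        // The linkdb is treated the same way, matching its existing optional status.
        if (linkDb != null) {
            inputs.add(linkDb + "/current");
        }
        return inputs;
    }
}
```

Calling `IndexInputPlanner.inputPaths(null, null, List.of("crawl/segments/20160101000000"))` would yield only the four segment sub-directories, which corresponds to the segments-only case described above. The segment path in that call is an illustrative value, not one taken from this issue.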

      Attachments

        1. NUTCH-2184v2.patch
          17 kB
          Lewis John McGibbney
        2. NUTCH-2184.patch
          17 kB
          Lewis John McGibbney

            People

              Assignee: Lewis John McGibbney (lewismc)
              Reporter: Lewis John McGibbney (lewismc)
              Votes: 0
              Watchers: 9
