Nutch / NUTCH-1905

Nutch index tool should be resilient to segments that don't have crawl_* data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: indexer
    • Labels: None

    Description

      When running the ./bin/nutch index command with -dir <path/to/segment/dir>, I noticed that if a segment directory doesn't include crawl_* or parse_* data, the indexer fails (correctly). However, the indexer should be more resilient in those cases: we can add a simple check for whether those dirs are present in the segment and proceed if they are; otherwise, skip that segment, print a message, and move on to the other segments.
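
      The check proposed above could look roughly like the following. This is a minimal standalone sketch, not the actual Nutch patch: the class name SegmentFilter, its method names, and the list of required subdirectories (crawl_fetch, crawl_parse, parse_data, parse_text) are assumptions made for illustration, and it uses java.nio.file on a local filesystem rather than the Hadoop FileSystem API that Nutch itself works with.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: filters a segments directory down to the segments
// that contain every subdirectory the indexer is assumed to need.
class SegmentFilter {
    // Assumption: these are the crawl_* and parse_* inputs the indexer reads.
    static final List<String> REQUIRED =
        Arrays.asList("crawl_fetch", "crawl_parse", "parse_data", "parse_text");

    // True only if every required subdirectory exists inside the segment.
    static boolean isIndexable(Path segment) {
        for (String name : REQUIRED) {
            if (!Files.isDirectory(segment.resolve(name))) {
                return false;
            }
        }
        return true;
    }

    // Returns the indexable segments under segmentsDir, in name order,
    // printing a message to stderr for each segment that is skipped.
    static List<Path> indexableSegments(Path segmentsDir) {
        List<Path> kept = new ArrayList<>();
        File[] children = segmentsDir.toFile().listFiles(File::isDirectory);
        if (children == null) {
            return kept; // not a directory, or not readable
        }
        Arrays.sort(children);
        for (File child : children) {
            Path segment = child.toPath();
            if (isIndexable(segment)) {
                kept.add(segment);
            } else {
                System.err.println("Skipping segment " + segment
                    + ": missing crawl_* or parse_* data");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path segment : indexableSegments(dir)) {
            System.out.println(segment);
        }
    }
}
```

      Skipping with a message, rather than failing, keeps one incomplete segment from aborting the indexing of the remaining segments; whether all four subdirectories should be mandatory is a design choice this sketch does not settle.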

            People

              Assignee: Chris A. Mattmann (chrismattmann)
              Reporter: Chris A. Mattmann (chrismattmann)
              Votes: 0
              Watchers: 2
