  1. Nutch
  2. NUTCH-1616

SegmentMerger missing proper crawl_fetch datum

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.7
    • Fix Version/s: 1.8
    • Component/s: None
    • Labels: None

    Description

      Indexing the merged segment yields 26036 documents, versus 26038 when unmerged. Two records in the merged segment no longer have a crawl_fetch CrawlDatum with the fetch_success status. Instead, their only crawl_fetch CrawlDatum has the status linked.

      The original segment has two crawl_fetch CrawlDatums for each of these records: one with the linked status and one with the fetch_success status.

      Without a crawl_fetch CrawlDatum carrying the fetch_success or not_modified status, a record is not going to be indexed.
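
      The failure mode described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Status` enum and the `SegmentMergeSketch` class are hypothetical stand-ins, not Nutch classes, and the "keep the last datum" rule is an assumed example of a merge that ignores status, not the confirmed cause in SegmentMerger.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT Nutch's classes.
public class SegmentMergeSketch {

    enum Status { LINKED, FETCH_SUCCESS, NOT_MODIFIED }

    // A record is indexable only if one of its crawl_fetch statuses
    // is fetch_success or not_modified.
    static boolean isIndexable(List<Status> crawlFetch) {
        for (Status s : crawlFetch) {
            if (s == Status.FETCH_SUCCESS || s == Status.NOT_MODIFIED) {
                return true;
            }
        }
        return false;
    }

    // Assumed faulty merge: keeps whichever datum comes last, ignoring status.
    // Expects a non-empty list.
    static List<Status> mergeKeepLast(List<Status> crawlFetch) {
        return Arrays.asList(crawlFetch.get(crawlFetch.size() - 1));
    }

    // Assumed safer merge: keeps a fetch status when one is present,
    // falling back to the first datum otherwise. Expects a non-empty list.
    static List<Status> mergePreferFetch(List<Status> crawlFetch) {
        Status kept = crawlFetch.get(0);
        for (Status s : crawlFetch) {
            if (s == Status.FETCH_SUCCESS || s == Status.NOT_MODIFIED) {
                kept = s;
            }
        }
        return Arrays.asList(kept);
    }

    public static void main(String[] args) {
        // The original segment holds both datums for the record.
        List<Status> original = Arrays.asList(Status.FETCH_SUCCESS, Status.LINKED);
        System.out.println(isIndexable(original));                   // true
        System.out.println(isIndexable(mergeKeepLast(original)));    // false
        System.out.println(isIndexable(mergePreferFetch(original))); // true
    }
}
```

      With only the linked datum left after merging, the record fails the indexability check, which matches the two documents missing from the merged index.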

      Attachments

        1. NUTCH-1616-1.8.patch
          2 kB
          Markus Jelsma

            People

              Assignee: markus17 Markus Jelsma
              Reporter: markus17 Markus Jelsma
              Votes: 0
              Watchers: 3
