Nutch / NUTCH-1617

IndexerMapReduce to consider latest fetchDatum


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7
    • Fix Version/s: 1.21
    • Component/s: None
    • Labels: None

    Description

      IndexerMapReduce can skip not_modified records and delete redirects and gone records, but it bases that decision on the first incoming fetchDatum only. Instead, it should consider only the most recent fetchDatum, as determined by CrawlDatum.fetchTime.

      This affects only the indexing of multiple segments.
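
      The intended behaviour could be sketched roughly as follows. This is a minimal, self-contained illustration and not the Nutch code: FetchRecord, latest, action and the status strings are hypothetical stand-ins for CrawlDatum and its fetch statuses, and the skip/delete/index mapping is simplified from the description above.

```java
import java.util.Arrays;
import java.util.List;

final class LatestFetchDatumSketch {

    // Hypothetical, simplified stand-in for a fetch CrawlDatum (not the Nutch class).
    static final class FetchRecord {
        final String status;   // illustrative labels: "success", "notmodified", "gone", "redir_perm", "redir_temp"
        final long fetchTime;  // epoch milliseconds

        FetchRecord(String status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // Returns the record with the greatest fetchTime, or null if the list is empty.
    // On equal fetch times the record seen first is kept.
    static FetchRecord latest(List<FetchRecord> records) {
        FetchRecord newest = null;
        for (FetchRecord r : records) {
            if (newest == null || r.fetchTime > newest.fetchTime) {
                newest = r;
            }
        }
        return newest;
    }

    // Simplified decision for a single record: delete, skip, or index.
    static String action(FetchRecord r) {
        if (r == null) {
            return "skip";
        }
        switch (r.status) {
            case "gone":
            case "redir_perm":
            case "redir_temp":
                return "delete";
            case "notmodified":
                return "skip";
            default:
                return "index";
        }
    }

    public static void main(String[] args) {
        // Same URL fetched in two segments; the older record happens to arrive first.
        List<FetchRecord> values = Arrays.asList(
            new FetchRecord("success", 1000L),
            new FetchRecord("gone", 2000L));

        // Deciding on the first incoming record versus the most recent one.
        System.out.println("first incoming: " + action(values.get(0)));
        System.out.println("latest:         " + action(latest(values)));
    }
}
```

      With a single segment there is at most one fetch record per URL, so both choices agree; they can only differ when several segments contribute records for the same URL.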

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 1
