Nutch / NUTCH-2388

bin/crawl indexing only webpages containing batchID instead of all in 2.x

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.4
    • Component/s: bin
    • Labels:
      None

      Description

      During each iteration, after generating, fetching, parsing, and updating the current batch in the DB, the indexer is supposed to index only the current batch as well. Currently, however, it indexes all pages:

      __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL -all -crawlId "$CRAWL_ID"
      

      It should probably be as follows:

      __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL $batchId -crawlId "$CRAWL_ID"
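
      The difference between the two invocations can be sketched as below. This is only an illustration under stated assumptions: index_args is a hypothetical helper that is not part of bin/crawl, the batch ID, crawl ID, and Solr URL are placeholder values, and nothing is actually sent to Nutch or Solr. It just prints the argument list the index step would receive.

```shell
# Hypothetical helper (not part of bin/crawl): prints the arguments that
# would be passed to the index step. The first parameter is the selector:
# either a batch ID (index only that batch) or -all (index everything).
index_args() {
  selector="$1"
  crawl_id="$2"
  solr_url="$3"
  printf '%s\n' "index -D solr.server.url=$solr_url $selector -crawlId $crawl_id"
}

# Placeholder values, assumed for illustration only.
batchId="1490000000-12345"
CRAWL_ID="demo"
SOLRURL="http://localhost:8983/solr"

# Reported behaviour: every page is indexed on each iteration.
index_args "-all" "$CRAWL_ID" "$SOLRURL"

# Proposed behaviour: only the batch just processed is indexed.
index_args "$batchId" "$CRAWL_ID" "$SOLRURL"
```

      With the batch ID as selector, each iteration would hand the indexer only the pages generated, fetched, parsed, and updated in that iteration, which matches what the other steps of the loop operate on.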
      

            People

            • Assignee: Kaidul Islam (kaidul)
            • Reporter: Kaidul Islam (kaidul)
            • Votes: 0
            • Watchers: 3

                Time Tracking

                • Estimated: 24h
                • Remaining: 24h
                • Logged: Not Specified