Nutch / NUTCH-2388

bin/crawl indexing only webpages containing batchID instead of all in 2.x


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.4
    • Component/s: bin
    • Labels: None

    Description

      During each iteration of bin/crawl, after generating, fetching, parsing and updating the current batch into the DB, the indexer is supposed to index only that batch. Currently, however, it indexes all webpages because the index step is called with -all:

      __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL -all -crawlId "$CRAWL_ID"
      

      I think it should pass the current batch ID instead:

      __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL $batchId -crawlId "$CRAWL_ID"
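      The per-iteration behaviour described above can be illustrated with a dry run. This is a minimal sketch, not the actual bin/crawl script: __bin_nutch is stubbed to print its arguments instead of running Nutch, and the crawl ID, Solr URL, number of rounds and batch-ID format are hypothetical placeholders. Only the index call mirrors the proposed fix; the generate, fetch, parse and updatedb steps are elided.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the indexing step inside the Nutch 2.x bin/crawl loop,
# with the proposed fix applied (index "$batchId" instead of -all).
# ASSUMPTION: __bin_nutch is a stub that prints the command it would run.
__bin_nutch() {
  echo "bin/nutch $*"
}

CRAWL_ID="mycrawl"                          # hypothetical crawl ID
SOLRURL="http://localhost:8983/solr/nutch"  # hypothetical Solr URL
commonOptions=""                            # the real script passes -D options here
NUM_ROUNDS=2                                # hypothetical number of iterations

round=1
while [ "$round" -le "$NUM_ROUNDS" ]; do
  # ASSUMPTION: hypothetical batch-ID format; each iteration gets a fresh one.
  batchId="$(date +%s)-$RANDOM"

  # ... generate, fetch, parse and updatedb for "$batchId" (elided) ...

  # Index only the batch processed in this iteration, not every webpage.
  __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL "$batchId" -crawlId "$CRAWL_ID"

  round=$((round + 1))
done
```

      Running it prints one bin/nutch index line per iteration, each carrying that iteration's batch ID; with -all in place of "$batchId", every round would instead index every webpage in the crawl.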
      


People

    • Assignee: Kaidul Islam
    • Reporter: Kaidul Islam
    • Votes: 0
    • Watchers: 3


Time Tracking

    • Original Estimate: 24h
    • Remaining Estimate: 24h
    • Time Spent: Not Specified