Nutch / NUTCH-1895

run() method in Crawler.java doesn't put Nutch.ARG_BATCH in argMap

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: crawldb, indexer
    • Labels: None
    • Environment: Win7, Solr 4.10.1
    • Flags: Patch Available

    Description

      I am using Nutch 2.2.1 with Solr 4.10.1.
      OS: Win7.
      Env: MyEclipse 10.
      Java: jdk1.7.0_71
      I run the crawler with the arguments:
      urls -depth 3 -topN 10 -solr http://localhost:8080/solr/collection2
      to index the crawled data into Solr,
      and with the Gora SQL store settings:
      gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
      gora.sqlstore.jdbc.url=jdbc:mysql://192.168.0.69:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
      gora.sqlstore.jdbc.user=root
      gora.sqlstore.jdbc.password=123456
      to store the crawl data in MySQL.

      However, I got a NullPointerException on batchId. Tracing it, I found the following.

      In SolrIndexerJob.java, batchId is read from args:

      @Override
      public Map<String,Object> run(Map<String,Object> args) throws Exception {
        String solrUrl = (String)args.get(Nutch.ARG_SOLR);
        String batchId = (String)args.get(Nutch.ARG_BATCH);
        NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
        getConf().set(SolrConstants.SERVER_URL, solrUrl);
        currentJob = createIndexJob(getConf(), "solr-index", batchId);
        currentJob.waitForCompletion(true);
        ToolUtil.recordJobStatus(null, currentJob, results);
        return results;
      }
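Since `Map.get` returns null for an absent key and casting null to `String` succeeds, the lookup above does not fail by itself; the error only appears where `batchId` is first dereferenced. One way to make the cause visible earlier is a fail-fast lookup. The following is a minimal sketch, not Nutch code: the class `RequiredArgSketch`, the helper `requireString`, and the key strings `"solr"` and `"batch"` are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration; nothing here is part of the Nutch code base.
class RequiredArgSketch {

  // Hypothetical helper: return the argument as a String, or fail with a
  // message that names the missing key instead of returning null.
  static String requireString(Map<String, Object> args, String key) {
    Object value = args.get(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing required argument: " + key);
    }
    return value.toString();
  }

  public static void main(String[] unused) {
    Map<String, Object> args = new HashMap<>();
    // "solr" and "batch" are assumed key names, not the real Nutch.ARG_* values.
    args.put("solr", "http://localhost:8080/solr/collection2");

    System.out.println(requireString(args, "solr"));
    try {
      requireString(args, "batch");
    } catch (IllegalArgumentException e) {
      // The message points directly at the key the caller forgot to set.
      System.out.println(e.getMessage());
    }
  }
}
```

With a check like this, a caller that omits the batch argument would see the missing key named in the exception rather than a NullPointerException further down the call chain.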

      But Crawler.java never puts Nutch.ARG_BATCH into argMap:

      @Override
      public int run(String[] args) throws Exception {
        if (args.length == 0) {
          System.out.println("Usage: Crawler (<seedDir> | -continue) [-solr <solrURL>] [-threads n] [-depth i] [-topN N] [-numTasks N]");
          return -1;
        }

        ...

        Map<String,Object> argMap = ToolUtil.toArgMap(
            Nutch.ARG_THREADS, threads,
            Nutch.ARG_DEPTH, depth,
            Nutch.ARG_TOPN, topN,
            Nutch.ARG_SOLR, solrUrl,
            Nutch.ARG_SEEDDIR, seedDir,
            Nutch.ARG_NUMTASKS, numTasks);
        run(argMap);
        return 0;
      }
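The mismatch between the two methods can be reproduced without Nutch. The sketch below is an illustration under stated assumptions: `ArgMapSketch` and its `toArgMap` are hypothetical stand-ins for `ToolUtil.toArgMap`, the key strings are assumed rather than copied from `Nutch.ARG_*`, and `"example-batch-id"` is a placeholder. Which batch id the crawler should actually pass is left open here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone illustration; nothing here is part of the Nutch code base.
class ArgMapSketch {
  // Assumed stand-ins for the Nutch.ARG_* keys; the real values may differ.
  static final String ARG_SOLR = "solr";
  static final String ARG_BATCH = "batch";

  // Hypothetical imitation of ToolUtil.toArgMap: alternating key/value
  // arguments become map entries.
  static Map<String, Object> toArgMap(Object... pairs) {
    if (pairs.length % 2 != 0) {
      throw new IllegalArgumentException("expected key/value pairs");
    }
    Map<String, Object> map = new LinkedHashMap<>();
    for (int i = 0; i < pairs.length; i += 2) {
      map.put(String.valueOf(pairs[i]), pairs[i + 1]);
    }
    return map;
  }

  public static void main(String[] unused) {
    String solrUrl = "http://localhost:8080/solr/collection2";

    // Built the way the report describes: no batch entry.
    Map<String, Object> withoutBatch = toArgMap(ARG_SOLR, solrUrl);
    String missing = (String) withoutBatch.get(ARG_BATCH); // null, no exception yet
    System.out.println("batchId without ARG_BATCH: " + missing);

    // Same map with a batch entry added; the value is only a placeholder.
    Map<String, Object> withBatch =
        toArgMap(ARG_SOLR, solrUrl, ARG_BATCH, "example-batch-id");
    System.out.println("batchId with ARG_BATCH: " + withBatch.get(ARG_BATCH));
  }
}
```

The first lookup yields null because the key was never inserted, which matches the reported symptom; the second shows that the consumer sees a value once the producer supplies one.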

          People

            Assignee: Unassigned
            Reporter: FeiTian
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified