NUTCH-1884: NullPointerException in parsechecker and indexchecker with symlinks in file URL

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10
    • Component/s: indexer, parser
    • Labels: None
    • Environment: Mac OS X 10.9.2
      Apache Maven 2.2.1
      Java version: 1.7.0_51

    • Patch Available

    Description

      I downloaded the Nutch source code from GitHub (https://github.com/apache/nutch), applied the patches from NUTCH-1879 and NUTCH-1880, and reinstalled Nutch. The good news is that all file URLs now contain only one slash after the scheme (file:/...). Unfortunately, a java.lang.NullPointerException is still reported by both the parsechecker and indexchecker commands.

      The output of both commands is shown below:

      (1) $ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
      fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
      parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
      contentType: text/html
      signature: 17bdb44990391c96bb8d48d1802ff11c
      Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
      ---------
      Url
      ---------------

      file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
      ---------
      ParseData
      ---------

      Version: 5
      Status: success(1,0)
      Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
      Outlinks: 2
      outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
      outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
      Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html
      Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252

      (2) $ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
      fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
      parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
      contentType: text/html
      Exception in thread "main" java.lang.NullPointerException
      at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
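
In the parsechecker output above, the requested URL contains the directory `cas-curator`, while the URL printed under "Url" and in the outlinks contains `cas-curator-0.6`. That is consistent with `cas-curator` being a symbolic link, as the issue title suggests. The following is a minimal sketch, not Nutch code and not a reproduction of the NullPointerException: it only shows how resolving symlinks can turn a requested file URL into a different one. The class name `SymlinkUrlCheck` and the method `resolveFileUrl` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch.
final class SymlinkUrlCheck {

    /**
     * Converts a file: URL to a path, resolves every symbolic link in it,
     * and returns the resolved location as a URI.
     * Throws IOException if the path does not exist.
     */
    static URI resolveFileUrl(String fileUrl) throws IOException {
        Path requested = Paths.get(URI.create(fileUrl));
        Path real = requested.toRealPath(); // follows symlinks by default
        return real.toUri();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SymlinkUrlCheck.java <file-url>");
            return;
        }
        System.out.println("requested: " + args[0]);
        System.out.println("resolved:  " + resolveFileUrl(args[0]));
    }
}
```

If the two printed URLs differ, any code that stores a result under the resolved URL and later looks it up by the requested one (or the reverse) would find nothing. Whether that is the actual cause of the exceptions above is an assumption; the attached patch is the authoritative fix.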

      Attachments

        1. NUTCH-1884-trunk-v1.patch (4 kB, Sebastian Nagel)

          People

            Assignee: Unassigned
            Reporter: Angela Wang (angela_wang)
            Votes: 0
            Watchers: 4
