  1. Nutch
  2. NUTCH-897

Subcollection requires blacklist element

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.2, 1.3, nutchgora
    • Fix Version/s: 1.3, nutchgora
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

      This is a very minor issue in Subcollection.java: it throws a NullPointerException if the (empty) blacklist element is omitted. I think it should either tolerate an omitted blacklist element or fail with a clear error message stating that the blacklist element is required. The following exception is logged when the blacklist element is omitted from a subcollection block:

      2010-09-06 13:32:30,438 INFO collection.CollectionManager - Instantiating CollectionManager
      2010-09-06 13:32:30,438 INFO collection.CollectionManager - initializing CollectionManager
      2010-09-06 13:32:30,451 INFO collection.CollectionManager - file has1 elements
      2010-09-06 13:32:30,456 WARN collection.CollectionManager - Error occured:java.lang.NullPointerException
      2010-09-06 13:32:30,469 WARN collection.CollectionManager - java.lang.NullPointerException
      2010-09-06 13:32:30,470 WARN collection.CollectionManager - at org.apache.nutch.collection.Subcollection.initialize(Subcollection.java:173)
      2010-09-06 13:32:30,470 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.parse(CollectionManager.java:98)
      2010-09-06 13:32:30,470 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.init(CollectionManager.java:75)
      2010-09-06 13:32:30,470 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.<init>(CollectionManager.java:56)
      2010-09-06 13:32:30,471 WARN collection.CollectionManager - at org.apache.nutch.collection.CollectionManager.getCollectionManager(CollectionManager.java:115)
      2010-09-06 13:32:30,471 WARN collection.CollectionManager - at org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter.addSubCollectionField(SubcollectionIndexingFilter.java:65)
      2010-09-06 13:32:30,471 WARN collection.CollectionManager - at org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter.filter(SubcollectionIndexingFilter.java:71)
      2010-09-06 13:32:30,471 WARN collection.CollectionManager - at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
      2010-09-06 13:32:30,471 WARN collection.CollectionManager - at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:134)
      2010-09-06 13:32:30,472 WARN collection.CollectionManager - at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
      2010-09-06 13:32:30,472 WARN collection.CollectionManager - at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
      2010-09-06 13:32:30,472 WARN collection.CollectionManager - at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
      2010-09-06 13:32:30,472 WARN collection.CollectionManager - at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
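
The first option described above (tolerating an omitted blacklist element) could look roughly like the following. This is a minimal sketch and not the attached patch or the actual Subcollection.java code: the class name `OptionalBlacklistSketch`, the helpers `textOrEmpty` and `parseRoot`, and the sample XML with its `name` and `whitelist` elements are assumptions made for illustration. It uses only the standard JDK DOM API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not taken from Nutch or from NUTCH-897.patch.
class OptionalBlacklistSketch {

    /**
     * Returns the trimmed text of the first descendant element named tag,
     * or an empty string when no such element exists.
     */
    static String textOrEmpty(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            // Element omitted: treat it as empty instead of dereferencing null.
            return "";
        }
        String text = nodes.item(0).getTextContent();
        return text == null ? "" : text.trim();
    }

    /** Parses an XML string and returns its root element (assumed helper). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample block: a subcollection without a blacklist element.
        Element sub = parseRoot(
            "<subcollection><name>demo</name>"
            + "<whitelist>example.org</whitelist></subcollection>");
        System.out.println("whitelist=[" + textOrEmpty(sub, "whitelist") + "]");
        System.out.println("blacklist=[" + textOrEmpty(sub, "blacklist") + "]");
    }
}
```

The alternative from the description, rejecting the configuration, would replace the early `return ""` with an exception whose message names the missing element.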

      Attachments

        1. NUTCH-897-1.patch
          1 kB
          Markus Jelsma
        2. NUTCH-897.patch
          1 kB
          Markus Jelsma

          People

            Assignee: markus17 Markus Jelsma
            Reporter: markus17 Markus Jelsma