NUTCH-2418

NPE in org.apache.hadoop.io.Text from FetcherThread

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.13
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels: None

    Description

      2017-09-05 15:28:54,539 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 38 fetch of https://www.provinciegroningen.nl/fileadmin/user_upload/Documenten/Downloads/vanturfvntoervfol.pdf failed with: java.lang.NullPointerException
      	at org.apache.hadoop.io.Text.encode(Text.java:450)
      	at org.apache.hadoop.io.Text.encode(Text.java:431)
      	at org.apache.hadoop.io.Text.writeString(Text.java:480)
      	at org.apache.nutch.parse.ParseData.write(ParseData.java:168)
      	at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:69)
      	at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:142)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1157)
      	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
      	at org.apache.nutch.fetcher.FetcherThread.output(FetcherThread.java:773)
      	at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:360)
      

      Never seen it before, no idea what's going on. Opening issue to track it.

      Found more occurrences: many fetches from this website throw the same NPE:

      2017-09-25 13:55:08,103 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 37 fetch of http://www.jabra.com.mx/c/fr/speak510-offert failed with: java.lang.NullPointerException
      	at org.apache.hadoop.io.Text.encode(Text.java:450)
      	at org.apache.hadoop.io.Text.encode(Text.java:431)
      	at org.apache.hadoop.io.Text.writeString(Text.java:480)
      	at org.apache.nutch.parse.ParseData.write(ParseData.java:168)
      	at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:69)
      	at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:142)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1157)
      	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
      	at org.apache.nutch.fetcher.FetcherThread.output(FetcherThread.java:773)
      	at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:360)
      

          People

            Assignee: Unassigned
            Reporter: Markus Jelsma (markus17)