Nutch / NUTCH-1675

NutchField to support Long


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

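      NutchField.readFields() has no support for Long values. Usually this is not a problem, because in reducers a NutchField is only written to the output. But when NutchField is used in mappers, its values are serialized during the shuffle and must be read back on the reduce side, and the reducer then fails to read a Long: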

      java.lang.RuntimeException: problem advancing post rec#0
              at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1217)
              at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:250)
              at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:246)
              at org.apache.nutch.fetcher.Fetcher$FetcherReducer.reduce(Fetcher.java:1440)
              at org.apache.nutch.fetcher.Fetcher$FetcherReducer.reduce(Fetcher.java:1401)
              at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
      Caused by: java.io.EOFException
              at java.io.DataInputStream.readFully(DataInputStream.java:197)
              at org.apache.hadoop.io.Text.readString(Text.java:402)
              at org.apache.nutch.indexer.NutchField.readFields(NutchField.java:89)
              at org.apache.nutch.indexer.NutchDocument.readFields(NutchDocument.java:112)
              at org.apache.nutch.indexer.NutchIndexAction.readFields(NutchIndexAction.java:81)
              at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
              at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
              at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
              at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1276)
              at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
              ... 7 more
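      To illustrate the failure mode, here is a minimal sketch of the type-tagged serialization pattern a multi-valued field like NutchField can use. It is not the code from NUTCH-1675-trunk.patch: `TypedField` and `roundTrip` are hypothetical names, and plain `java.io` streams (`writeUTF`/`readUTF`) stand in for Hadoop's `DataOutput`/`DataInput` and `Text.writeString`/`Text.readString`. The point it shows is that `readFields()` needs a branch for every type `write()` can emit, including `java.lang.Long`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a multi-valued field (NOT the real
 * org.apache.nutch.indexer.NutchField). Each value is written as a type tag
 * (its class name) followed by the value itself.
 */
class TypedField {
  final List<Object> values = new ArrayList<>();

  void write(DataOutputStream out) throws IOException {
    out.writeInt(values.size());
    for (Object v : values) {
      out.writeUTF(v.getClass().getName()); // type tag
      if (v instanceof String) {
        out.writeUTF((String) v);
      } else if (v instanceof Integer) {
        out.writeInt((Integer) v);
      } else if (v instanceof Long) {
        out.writeLong((Long) v);
      } else if (v instanceof Float) {
        out.writeFloat((Float) v);
      } else if (v instanceof Boolean) {
        out.writeBoolean((Boolean) v);
      } else if (v instanceof Date) {
        out.writeLong(((Date) v).getTime());
      } else {
        throw new IOException("Unsupported type: " + v.getClass().getName());
      }
    }
  }

  void readFields(DataInputStream in) throws IOException {
    values.clear();
    int count = in.readInt();
    for (int i = 0; i < count; i++) {
      String type = in.readUTF();
      switch (type) {
        case "java.lang.String":  values.add(in.readUTF()); break;
        case "java.lang.Integer": values.add(in.readInt()); break;
        // The case this issue is about. A reader that did not consume these
        // 8 bytes would misread every following value, which is one plausible
        // route to an EOFException like the one in the trace above.
        case "java.lang.Long":    values.add(in.readLong()); break;
        case "java.lang.Float":   values.add(in.readFloat()); break;
        case "java.lang.Boolean": values.add(in.readBoolean()); break;
        case "java.util.Date":    values.add(new Date(in.readLong())); break;
        default: throw new IOException("Unknown type tag: " + type);
      }
    }
  }

  /** Serializes a field and reads it back, as a map-to-reduce shuffle would. */
  static TypedField roundTrip(TypedField f) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    f.write(new DataOutputStream(bytes));
    TypedField copy = new TypedField();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }

  public static void main(String[] args) throws IOException {
    TypedField f = new TypedField();
    f.values.add("http://example.com/");
    f.values.add(1234567890123L);
    System.out.println(roundTrip(f).values);
  }
}
```

      Running `main` prints `[http://example.com/, 1234567890123]`: the Long survives the round trip as a `java.lang.Long` instead of desynchronizing the stream. Failing loudly on an unknown tag (the `default` branch) is a design choice in this sketch; it turns a silent misread into an immediate, descriptive error.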
      

      Attachments

        1. NUTCH-1675-trunk.patch (0.6 kB, Markus Jelsma)


          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Markus Jelsma (markus17)
            Votes: 0
            Watchers: 3
