Pig / PIG-179

On Hadoop 0.16, some jobs using a combiner fail with an NPE

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.0.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None
    • Environment: Hadoop 0.16

      Description

      Some jobs (apparently only the larger ones) now fail with an NPE in the combiner code on this line:

      PigSplit split = PigInputFormat.PigRecordReader.getPigRecordReader().getPigFileSplit();
      

      A comment in the PigRecordReader class indicates that, as implemented, it depends on the mapper and splitter (and in this case the combiner as well) running in the same thread. In Hadoop 0.16 this apparently no longer holds in some cases.
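
      The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Pig's actual code: the class, the Reader type, and the CURRENT field are hypothetical stand-ins for PigRecordReader and its thread-local holder. It shows that a value stored in a ThreadLocal by one thread reads back as null from another thread, which is what would turn a chained call such as getPigRecordReader().getPigFileSplit() into an NPE.

```java
// Hypothetical names throughout; this only illustrates the ThreadLocal behavior.
public class ThreadLocalReaderDemo {
    // Stand-in for the record reader; it only carries a split name.
    static final class Reader {
        private final String split;
        Reader(String split) { this.split = split; }
        String getSplit() { return split; }
    }

    // Each thread has its own slot; a slot that was never set reads as null.
    static final ThreadLocal<Reader> CURRENT = new ThreadLocal<>();

    // Reads the slot from a freshly started thread and returns what that thread saw.
    static Reader readFromOtherThread() {
        final Reader[] seen = new Reader[1];
        Thread t = new Thread(() -> seen[0] = CURRENT.get());
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        CURRENT.set(new Reader("split-0"));            // "mapper" thread registers the reader
        System.out.println(CURRENT.get().getSplit());  // same thread: the reader is found
        Reader other = readFromOtherThread();          // "combiner" running on another thread
        System.out.println(other == null);             // true; other.getSplit() would throw an NPE
    }
}
```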

        Activity

        Alan Gates added a comment -

        Patch regenerated, tested, and checked in at revision 647166.

        Alan Gates added a comment -

        This patch conflicts with the fix for PIG-55. I'll regenerate it and attach the new patch.

        Alan Gates added a comment -

        Fix checked in.

        Olga Natkovich added a comment -

        Looks good. +1

        Alan Gates added a comment -

        A patch that removes the ThreadLocal modifier for the PigRecordReader. According to Ben Reed (who wrote this), he originally made it thread local because he was concerned that Hadoop might change to run multiple maps in the same JVM. As that does not now seem likely, converting this ThreadLocal to a static will be safe and will not cause an NPE in cases where the RecordReader, Mapper, and Combiner aren't all running in the same thread.
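
        The effect of that change can be sketched as follows. This is an illustration under assumptions, not the checked-in patch: the class, the Reader type, and the current field are hypothetical, and the volatile keyword is an addition made here so that the cross-thread read is well defined. A plain static field is shared by every thread in the JVM, so a reader registered by one thread is found by another.

```java
// Hypothetical names throughout; not the actual PIG-179 patch.
public class StaticReaderDemo {
    // Stand-in for the record reader; it only carries a split name.
    static final class Reader {
        private final String split;
        Reader(String split) { this.split = split; }
        String getSplit() { return split; }
    }

    // One slot for the whole JVM. volatile (an assumption of this sketch)
    // ensures a write from one thread is visible to the others.
    static volatile Reader current;

    // Looks up the reader from a freshly started thread and returns its split name.
    static String splitSeenByOtherThread() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = current.getSplit());
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        current = new Reader("split-0");               // "mapper" thread registers the reader
        System.out.println(splitSeenByOtherThread());  // "combiner" thread finds it: split-0
    }
}
```

        The trade-off is the one named in the comment: a single static slot is only safe while each JVM runs one map task at a time, since two concurrent tasks would overwrite each other's reader.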


          People

          • Assignee: Alan Gates
          • Reporter: Alan Gates
          • Votes: 0
          • Watchers: 0
