CRUNCH-539: Use of TupleWritable.setConf fails in mapper/reducer


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      In more recent versions of Hadoop 2 (at least), the implicit call to TupleWritable.setConf that happens whenever TupleWritables are used fails with a ClassNotFoundException for, ironically, the TupleWritable class itself.

      This appears to be caused by the way that ObjectInputStream resolves classes in its resolveClass method, combined with how the context classloader is set within a Hadoop mapper or reducer.

      This is similar to PIG-2532.
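
      The usual remedy for this resolveClass/context-classloader mismatch is to deserialize through a stream that consults the thread context classloader (which Hadoop sets up for task code) before falling back to ObjectInputStream's default caller-based lookup. A minimal sketch of that pattern follows; the class and helper names are illustrative and do not necessarily match the attached patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Illustrative sketch (not the actual Crunch patch): an ObjectInputStream
// that tries the thread context classloader first when resolving a
// serialized class, then falls back to the default resolution.
class ContextClassLoaderObjectInputStream extends ObjectInputStream {

    ContextClassLoaderObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        if (ctx != null) {
            try {
                // Look the class up via the context classloader without
                // initializing it, mirroring the default resolveClass contract.
                return Class.forName(desc.getName(), false, ctx);
            } catch (ClassNotFoundException e) {
                // Not visible to the context classloader; fall back below.
            }
        }
        return super.resolveClass(desc);
    }

    // Demonstration helper: serialize an object and read it back through
    // the context-classloader-aware stream.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois = new ContextClassLoaderObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

      In a Hadoop task, the context classloader is the one that actually has the job's jars (including TupleWritable) on its classpath, which is why routing resolution through it avoids the ClassNotFoundException.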

      This can be reproduced with the local job tracker (at least) on Hadoop 2.7.0, but it can't be reproduced in Crunch integration tests, due to how classloading is set up there. The issue appears to be present only in Crunch 0.12.

      The following code within a simple pipeline will cause this issue to occur:

      PTable<String, Integer> yearTemperatures = ... /* Writable-based PTable */
      PTable<String, Integer> maxTemps = yearTemperatures
                      .groupByKey()
                      .combineValues(Aggregators.MAX_INTS())
                      .top(1);   //LINE THAT CAUSES THE ERROR
      

        Attachments

        1. CRUNCH-539.patch (5 kB), Gabriel Reid


            People

            • Assignee: gabriel.reid (Gabriel Reid)
            • Reporter: gabriel.reid (Gabriel Reid)
            • Votes: 1
            • Watchers: 3
