Pig / PIG-2266

bug with input file joining optimization in Pig

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: impl
    • Labels:
      None

      Description

      In src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java, the function hasTooManyInputFiles instantiates a LoadFunc, then calls setLocation before calling setUDFContextSignature. This is inconsistent with the documentation for LoadFunc (see http://pig.apache.org/docs/r0.9.0/api/org/apache/pig/LoadFunc.html#setUDFContextSignature(java.lang.String)). (We've written UDFs that assume that setUDFContextSignature is called first.)

      I think you can fix this by adding

      loader.setUDFContextSignature(ld.getSignature());

      before

      loader.setLocation(location, job);
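
      To illustrate why the call order matters, here is a minimal, self-contained sketch. It does not use Pig's or Hadoop's classes: SketchLoader, SignatureKeyedLoader, both configure methods, the "sketch.location." key prefix, and the Map standing in for Hadoop's Job are all hypothetical names invented for this example. It only assumes that a loader may need its signature inside setLocation, as the UDFs mentioned above do.

```java
import java.util.HashMap;
import java.util.Map;

public class SignatureOrderSketch {

    /** Hypothetical stand-in for the two LoadFunc calls discussed in this issue. */
    interface SketchLoader {
        void setUDFContextSignature(String signature);
        // A plain Map stands in for Hadoop's Job here (assumption for this sketch).
        void setLocation(String location, Map<String, String> jobConf);
    }

    /** A loader that needs its signature while handling setLocation. */
    static class SignatureKeyedLoader implements SketchLoader {
        private String signature;

        @Override
        public void setUDFContextSignature(String signature) {
            this.signature = signature;
        }

        @Override
        public void setLocation(String location, Map<String, String> jobConf) {
            if (signature == null) {
                throw new IllegalStateException(
                        "setLocation called before setUDFContextSignature");
            }
            // Store the location under a key that is unique to this loader instance.
            jobConf.put("sketch.location." + signature, location);
        }
    }

    /** Call order reported in this issue: location first, signature second. */
    static Map<String, String> configureOldOrder(
            SketchLoader loader, String signature, String location) {
        Map<String, String> conf = new HashMap<>();
        loader.setLocation(location, conf);
        loader.setUDFContextSignature(signature);
        return conf;
    }

    /** Call order proposed in this issue: signature first, location second. */
    static Map<String, String> configureFixedOrder(
            SketchLoader loader, String signature, String location) {
        Map<String, String> conf = new HashMap<>();
        loader.setUDFContextSignature(signature);
        loader.setLocation(location, conf);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf =
                configureFixedOrder(new SignatureKeyedLoader(), "sig-1", "/data/in");
        System.out.println("fixed order stored: " + conf);

        try {
            configureOldOrder(new SignatureKeyedLoader(), "sig-1", "/data/in");
        } catch (IllegalStateException e) {
            System.out.println("old order failed: " + e.getMessage());
        }
    }
}
```

      In this sketch the old order fails as soon as the loader needs its signature, while the proposed order succeeds. A real loader might not throw; it could instead read or write state under the wrong key without any error.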

      1. PIG-2266.patch
        0.9 kB
        Cheolsoo Park

          Activity

          Joseph Adler created issue -
          Daniel Dai added a comment -

          That seems reasonable. Can you wrap up a patch?

          Joseph Adler added a comment -

          Index: MRCompiler.java
          ===================================================================
          --- MRCompiler.java (revision 1165764)
          +++ MRCompiler.java (working copy)
          @@ -1353,7 +1353,8 @@
                   .instantiateFuncFromSpec(ld.getLFile()
                           .getFuncSpec());
               Job job = new Job(conf);
          -    loader.setLocation(location, job);
          +    loader.setUDFContextSignature(ld.getSignature());
          +    loader.setLocation(location, job);
               InputFormat inf = loader.getInputFormat();
               List<InputSplit> splits = inf.getSplits(HadoopShims.cloneJobContext(job));
               List<List<InputSplit>> results = MapRedUtil
          Russell Jurney made changes -
          Field: Original Value -> New Value
          Affects Version/s: (none) -> 0.10 [ 12316246 ]
          Cheolsoo Park added a comment -

          Attaching Joe's change as a patch. This is needed for PIG-3015.

          Cheolsoo Park made changes -
          Attachment: (none) -> PIG-2266.patch [ 12566690 ]
          Cheolsoo Park made changes -
          Assignee: (none) -> Joseph Adler [ jadler ]
          Cheolsoo Park made changes -
          Link: (none) -> This issue blocks PIG-3015 [ PIG-3015 ]
          Cheolsoo Park made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Santhosh Srinivasan added a comment -

          +1 to the patch.

          Cheolsoo Park added a comment -

          Thank you Santhosh for the review. I will commit it after running tests.

          Joseph Adler added a comment -

          Thanks for adding this fix!

          Cheolsoo Park added a comment -

          Committed to trunk. Thanks Joe!

          Cheolsoo Park made changes -
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Fix Version/s: (none) -> 0.12 [ 12323380 ]
          Resolution: (none) -> Fixed [ 1 ]
          Daniel Dai made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]

            People

            • Assignee: Joseph Adler
            • Reporter: Joseph Adler
            • Votes: 1
            • Watchers: 5
