HCATALOG-276

After merging in HCATALOG-237 related changes Pig scripts with more than one store fail

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.4
    • Component/s: pig
    • Labels:
      None

      Description

      e2e tests Pig_Checkin_4 and Pig_Checkin_5 are failing.

      1. HCATALOG-276.patch
        4 kB
        Mithun Radhakrishnan
      2. HCATALOG-276_reviewed.patch
        5 kB
        Mithun Radhakrishnan
      3. HCATALOG-276_-_Additionally_resetting_mapred_output_dir_for__Task()_methods_.patch
        3 kB
        Mithun Radhakrishnan
      4. HCAT-276.patch
        0.8 kB
        Alan Gates

          Activity

          Alan Gates added a comment -

          The stack trace in the map task that fails is:

          org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the store function.
          	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.setUp(POStore.java:114)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:235)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
          	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
          	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:396)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
          	at org.apache.hadoop.mapred.Child.main(Child.java:249)
          Caused by: java.io.IOException: The temporary job-output directory hdfs://hrt9n08.cc1.ygridcore.net/user/hive/hdp1warehouse/pig_checkin_4_1/_TEMP/_temporary doesn't exist!
          	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
          	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
          	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
          	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getRecordWriter(HiveIgnoreKeyTextOutputFormat.java:125)
          	at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:78)
          	at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:211)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.createStoreFunc(MapReducePOStoreImpl.java:85)
          	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.setUp(POStore.java:106)
          
          Alan Gates added a comment -

          A patch to fix this, courtesy of Daniel. When this is applied Pig_Checkin_4 and Pig_Checkin_5 pass.

          Alan Gates added a comment -

          Patch checked in.

          Ashutosh Chauhan added a comment -

          I am not sure we want to fix it this way. If this is fixing a test case, then it's more likely masking some other problem. setupJob() is called once by the MR framework at the start of the job and is run as an independent task. So this fix breaks semantics in two ways:

          • Calling setupJob() multiple times, once in each task.
          • Running it within a getRecordWriter() call, which is not the place where it should run.

          Francis Liu added a comment -

          Reopening the ticket to investigate a better fix. setupJob() should've been called. Does this happen only on jobs that use dynamic partitioning?

          Mithun Radhakrishnan added a comment -

          On examining the job (launched from Pig), one sees that FileOutputCommitter is being called twice (as expected), once for each of the Storers. The problem is that both times, "mapred.output.dir" is set to the same temp-directory (corresponding to one of the stores.)

          This looks like a Pig bug to me. PigOutputCommitter should be setting the right mapred.output.dir before invoking setupJob() on the underlying committer. (Will raise bug.)

          I'll put up an HCat patch with the workaround, in a few minutes.
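
          The behaviour described in this comment can be sketched in isolation. What follows is a minimal simulation, not Pig or Hadoop code: the class SharedOutputDirDemo, its methods, and the /warehouse/... paths are hypothetical, and only the key name mapred.output.dir and the _temporary suffix are taken from this thread. It assumes that both stores share one configuration object and that each committer's setupJob() creates its temporary directory under whatever that key currently holds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a job with two stores; not Pig or Hadoop code.
class SharedOutputDirDemo {
    static final String OUTPUT_DIR_KEY = "mapred.output.dir";

    // Mimics a committer's setupJob(): records <output dir>/_temporary as created.
    static void setupJob(Map<String, String> conf, List<String> createdDirs) {
        createdDirs.add(conf.get(OUTPUT_DIR_KEY) + "/_temporary");
    }

    // Problem flow: the key is never updated between the setupJob() calls,
    // so every store sets up the first store's temporary directory.
    static List<String> setupWithoutReset(List<String> storeDirs) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OUTPUT_DIR_KEY, storeDirs.get(0));
        List<String> created = new ArrayList<>();
        for (int i = 0; i < storeDirs.size(); i++) {
            setupJob(conf, created);
        }
        return created;
    }

    // Expected flow: point the key at each store's own directory first.
    static List<String> setupWithReset(List<String> storeDirs) {
        Map<String, String> conf = new HashMap<>();
        List<String> created = new ArrayList<>();
        for (String dir : storeDirs) {
            conf.put(OUTPUT_DIR_KEY, dir);
            setupJob(conf, created);
        }
        return created;
    }

    public static void main(String[] args) {
        List<String> stores = Arrays.asList("/warehouse/t1/_TEMP", "/warehouse/t2/_TEMP");
        // Same directory twice: the second store's _temporary is never created.
        System.out.println(setupWithoutReset(stores));
        // One _temporary directory per store.
        System.out.println(setupWithReset(stores));
    }
}
```

          In the first flow the second store's _temporary directory is never set up, which is consistent with the "temporary job-output directory ... doesn't exist" error in the stack trace, although the sketch does not prove that this is the exact code path involved.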

          Mithun Radhakrishnan added a comment -

          Rolled back Daniel's workaround in HCatOutputFormat::getRecordWriter(). Added a workaround in FileOutputCommitterContainer that sets mapred.output.dir from the OutputJobInfo at hand.

          Alan Gates added a comment -

          I've run this with the e2e tests and confirmed they pass. Before I check it in I'd like Ashutosh and Francis to take a look to make sure the fix looks good.

          Ashutosh Chauhan added a comment -

          OutputJobInfo's location is set in FosterStorageHandler but may not get set in other storage-handlers. In those cases, setting a null value in Configuration will result in NPE. So, I think there is a need for a null-check of location before setting it.
          Otherwise, looks good.
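
          A minimal sketch of the suggested guard follows. It uses java.util.Properties as a stand-in for the job configuration, on the assumption that Hadoop's Configuration of that era stored values in a Properties object; Properties itself rejects null values with a NullPointerException. The class NullSafeOutputDir and the method setOutputDirIfPresent are hypothetical names for illustration, not part of the HCatalog patch.

```java
import java.util.Properties;

// Hypothetical helper; Properties stands in for the job configuration.
class NullSafeOutputDir {
    static final String OUTPUT_DIR_KEY = "mapred.output.dir";

    // Sets the output dir only when a location is known; returns true if it was set.
    static boolean setOutputDirIfPresent(Properties conf, String location) {
        if (location == null) {
            return false; // leave whatever value is already configured
        }
        conf.setProperty(OUTPUT_DIR_KEY, location);
        return true;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        try {
            conf.setProperty(OUTPUT_DIR_KEY, null); // unguarded: null values are rejected
        } catch (NullPointerException e) {
            System.out.println("unguarded set failed with NullPointerException");
        }
        System.out.println(setOutputDirIfPresent(conf, null));            // false
        System.out.println(setOutputDirIfPresent(conf, "/warehouse/t1")); // true
    }
}
```

          With the guard in place, a storage handler that leaves the location unset simply keeps the existing mapred.output.dir value instead of failing.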

          Mithun Radhakrishnan added a comment -

          Including Ashutosh's suggestion, to avoid NPEs.

          Alan Gates added a comment -

          Checked in _reviewed patch into trunk and 0.4 branch. Thanks Mithun.

          Francis Liu added a comment -

          Sorry, need to reopen.

          • setupJob() does not have a null check.
          • commitTask() has the hack but abortTask() doesn't. It's also missing in needsTaskCommit() and setupTask().

          Francis Liu added a comment -

          This hack is nasty. There might be other issues. Should we have another jira to keep track of this?

          Mithun Radhakrishnan added a comment -

          Please pardon my oversight. I have an additional patch that sets output-dir in all task-methods as well.

          I've retested from scratch, although it's not a significant change. Unit-tests and the pig-integration test pass.

          (Please note that this patch applies on top of what Alan's kindly checked in already.)
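
          The idea of resetting the output directory in every task-level method can be sketched as a wrapper around a committer. This is an illustration under stated assumptions, not the contents of the attached patch: SimpleCommitter is a hypothetical, simplified interface rather than Hadoop's OutputCommitter, a plain Map stands in for the job configuration, and the /warehouse/... paths are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo driver (hypothetical): shows every delegated call seeing this store's directory.
class ResettingCommitterDemo {
    static List<String> run() {
        Map<String, String> conf = new HashMap<>();
        conf.put(ResettingCommitter.OUTPUT_DIR_KEY, "/warehouse/other_store"); // stale value
        RecordingCommitter base = new RecordingCommitter();
        SimpleCommitter committer = new ResettingCommitter(base, "/warehouse/this_store");
        committer.setupTask(conf);
        if (committer.needsTaskCommit(conf)) {
            committer.commitTask(conf);
        }
        // Another store overwrites the shared key again before an abort.
        conf.put(ResettingCommitter.OUTPUT_DIR_KEY, "/warehouse/other_store");
        committer.abortTask(conf);
        return base.seen;
    }

    public static void main(String[] args) {
        for (String call : run()) {
            System.out.println(call);
        }
    }
}

// Hypothetical, simplified committer interface; not the Hadoop OutputCommitter API.
interface SimpleCommitter {
    void setupTask(Map<String, String> conf);
    boolean needsTaskCommit(Map<String, String> conf);
    void commitTask(Map<String, String> conf);
    void abortTask(Map<String, String> conf);
}

// Records which output dir each call observed.
class RecordingCommitter implements SimpleCommitter {
    final List<String> seen = new ArrayList<>();

    private void record(String method, Map<String, String> conf) {
        seen.add(method + ":" + conf.get(ResettingCommitter.OUTPUT_DIR_KEY));
    }

    public void setupTask(Map<String, String> conf) { record("setupTask", conf); }
    public boolean needsTaskCommit(Map<String, String> conf) { record("needsTaskCommit", conf); return true; }
    public void commitTask(Map<String, String> conf) { record("commitTask", conf); }
    public void abortTask(Map<String, String> conf) { record("abortTask", conf); }
}

// Re-points the output dir at its own location before every delegated call.
class ResettingCommitter implements SimpleCommitter {
    static final String OUTPUT_DIR_KEY = "mapred.output.dir";
    private final SimpleCommitter base;
    private final String location; // may be null if no location was set

    ResettingCommitter(SimpleCommitter base, String location) {
        this.base = base;
        this.location = location;
    }

    // Null check as requested in review: skip the reset when no location is known.
    private void resetOutputDir(Map<String, String> conf) {
        if (location != null) {
            conf.put(OUTPUT_DIR_KEY, location);
        }
    }

    public void setupTask(Map<String, String> conf) { resetOutputDir(conf); base.setupTask(conf); }
    public boolean needsTaskCommit(Map<String, String> conf) { resetOutputDir(conf); return base.needsTaskCommit(conf); }
    public void commitTask(Map<String, String> conf) { resetOutputDir(conf); base.commitTask(conf); }
    public void abortTask(Map<String, String> conf) { resetOutputDir(conf); base.abortTask(conf); }
}
```

          Routing the reset through one private helper keeps the four methods consistent, which is the gap the review pointed out when only commitTask() carried the workaround.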

          Francis Liu added a comment -

          +1

          Alan Gates added a comment -

          I keep closing this jack-in-the-box and it keeps popping back open. Latest patch (HCATALOG-276-this-is-the-longest-patch-name-ever.patch) checked in. Let's hope this is the final one.

          Mithun Radhakrishnan added a comment -

          Much obliged, Alan. (Sorry about the patch name. :])

          Alan Gates added a comment -

          Issue closed with 0.4 release.


            People

            • Assignee:
              Mithun Radhakrishnan
              Reporter:
              Alan Gates
            • Votes:
              0
              Watchers:
              1
