HCatalog / HCATALOG-276

After merging in HCATALOG-237-related changes, Pig scripts with more than one store fail

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.4
    • Component/s: pig
    • Labels:
      None

      Description

      e2e tests Pig_Checkin_4 and Pig_Checkin_5 are failing.

      Attachments

      1. HCAT-276.patch
        0.8 kB
        Alan Gates
      2. HCATALOG-276.patch
        4 kB
        Mithun Radhakrishnan
      3. HCATALOG-276_reviewed.patch
        5 kB
        Mithun Radhakrishnan
      4. HCATALOG-276_-_Additionally_resetting_mapred_output_dir_for__Task()_methods_.patch
        3 kB
        Mithun Radhakrishnan

          Activity

          Alan Gates created issue -
          Alan Gates added a comment -

          The stack trace in the map task that fails is:

          org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the store function.
          	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.setUp(POStore.java:114)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:235)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
          	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
          	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:396)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
          	at org.apache.hadoop.mapred.Child.main(Child.java:249)
          Caused by: java.io.IOException: The temporary job-output directory hdfs://hrt9n08.cc1.ygridcore.net/user/hive/hdp1warehouse/pig_checkin_4_1/_TEMP/_temporary doesn't exist!
          	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
          	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
          	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
          	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getRecordWriter(HiveIgnoreKeyTextOutputFormat.java:125)
          	at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:78)
          	at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:211)
          	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.createStoreFunc(MapReducePOStoreImpl.java:85)
          	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.setUp(POStore.java:106)
          
          Alan Gates made changes -
          Field Original Value New Value
          Link This issue is broken by HCATALOG-237 [ HCATALOG-237 ]
          Alan Gates added a comment -

          A patch to fix this, courtesy of Daniel. With this patch applied, Pig_Checkin_4 and Pig_Checkin_5 pass.

          Alan Gates made changes -
          Attachment HCAT-276.patch [ 12516074 ]
          Alan Gates made changes -
          Assignee Daniel Dai [ daijy ]
          Alan Gates added a comment -

          Patch checked in.

          Alan Gates made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Ashutosh Chauhan added a comment -

          I am not sure we want to fix it this way. If this is fixing a test case, then it's more likely masking some other problem. setupJob() is called once by the MR framework at the start of the job and is run as an independent task. So this fix breaks the semantics in two ways:

          • Calling setupJob() multiple times, once in each task.
          • Running it within a getRecordWriter() call, which is not where it should run.
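The objection can be made concrete with a toy model of the call sequence described above. `RecordingCommitter` and `LifecycleDemo` are hypothetical classes, not Hadoop's OutputCommitter API. They only count calls. Under the framework's contract, setupJob() runs once per job. If each task also invokes it from getRecordWriter(), which in a task runs after setupTask(), job-level setup repeats once per task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical call recorder, not Hadoop's OutputCommitter: it only logs
// which lifecycle methods ran.
class RecordingCommitter {
    final List<String> calls = new ArrayList<>();

    void setupJob()           { calls.add("setupJob"); }
    void setupTask(int task)  { calls.add("setupTask:" + task); }
    void commitTask(int task) { calls.add("commitTask:" + task); }
    void commitJob()          { calls.add("commitJob"); }

    // Number of recorded calls with exactly this name.
    long count(String name) {
        return calls.stream().filter(c -> c.equals(name)).count();
    }
}

class LifecycleDemo {
    // Intended contract: job-level setup once, then per-task work,
    // then job-level commit once.
    static RecordingCommitter frameworkDriven(int numTasks) {
        RecordingCommitter c = new RecordingCommitter();
        c.setupJob();
        for (int t = 0; t < numTasks; t++) {
            c.setupTask(t);
            c.commitTask(t);
        }
        c.commitJob();
        return c;
    }

    // The workaround as described in the review: each task also runs
    // setupJob() from inside getRecordWriter(), on top of the framework's call.
    static RecordingCommitter setupInGetRecordWriter(int numTasks) {
        RecordingCommitter c = new RecordingCommitter();
        c.setupJob();
        for (int t = 0; t < numTasks; t++) {
            c.setupTask(t);
            c.setupJob(); // extra job-level setup, once per task
            c.commitTask(t);
        }
        c.commitJob();
        return c;
    }
}
```

With four tasks, `frameworkDriven(4)` records setupJob once. `setupInGetRecordWriter(4)` records it five times: once from the framework plus once per task.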
          Francis Liu added a comment -

          Reopening the ticket to investigate a better fix. setupJob() should have been called. Does this happen only on jobs that use dynamic partitioning?

          Francis Liu made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Assignee Daniel Dai [ daijy ] Francis Liu [ toffer ]
          Mithun Radhakrishnan made changes -
          Assignee Francis Liu [ toffer ] Mithun Radhakrishnan [ mithun ]
          Mithun Radhakrishnan added a comment -

          On examining the job (launched from Pig), one sees that FileOutputCommitter is being called twice (as expected), once for each of the Storers. The problem is that both times, "mapred.output.dir" is set to the same temp-directory (corresponding to one of the stores.)

          This looks like a Pig bug to me. PigOutputCommitter should be setting the right mapred.output.dir before invoking setupJob() on the underlying committer. (Will raise bug.)

          I'll put up an HCat patch with the workaround in a few minutes.
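The failure mode in this diagnosis can be reproduced in plain Java under a few stated assumptions:

- A `Set<String>` stands in for HDFS.
- A `Map` stands in for the job configuration.
- `SharedConfBug` is a hypothetical class, not Pig or Hadoop code.

In the model, setupJob() creates `<mapred.output.dir>/_temporary` from whatever configuration it is handed. Each store's task later looks for the `_temporary` directory under its own output location. That lookup is modeled on the "doesn't exist!" message in the stack trace.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, not Pig or Hadoop code: two stores in one job whose
// committers are both handed the same configuration.
class SharedConfBug {
    static final String OUTPUT_DIR = "mapred.output.dir";

    // Stand-in for HDFS: the set of directories that currently exist.
    final Set<String> fs = new HashSet<>();

    // Job side: create <output dir>/_temporary, using whatever output dir
    // the supplied configuration holds.
    void setupJob(Map<String, String> conf) {
        fs.add(conf.get(OUTPUT_DIR) + "/_temporary");
    }

    // Task side (simplified): a store's writer needs the _temporary
    // directory under its *own* output location to exist.
    String getWorkPath(String storeOutputDir) throws IOException {
        String jobTmp = storeOutputDir + "/_temporary";
        if (!fs.contains(jobTmp)) {
            throw new IOException("The temporary job-output directory "
                    + jobTmp + " doesn't exist!");
        }
        return jobTmp;
    }

    // setupJob() runs once per store (twice here), but the shared conf
    // always names the first store's location, so only that store's
    // _temporary directory is ever created.
    static SharedConfBug runWithSharedConf(String firstStoreDir) {
        SharedConfBug job = new SharedConfBug();
        Map<String, String> shared = new HashMap<>();
        shared.put(OUTPUT_DIR, firstStoreDir);
        job.setupJob(shared); // committer for store 1
        job.setupJob(shared); // committer for store 2 still sees store 1's dir
        return job;
    }
}
```

After `runWithSharedConf("/wh/t1/_TEMP")`, `getWorkPath("/wh/t1/_TEMP")` succeeds. `getWorkPath("/wh/t2/_TEMP")` throws the IOException. The paths are made up for illustration.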

          Mithun Radhakrishnan added a comment -

          Rolled back Daniel's workaround in HCatOutputFormat::getRecordWriter(). Added a workaround in FileOutputCommitterContainer that sets mapred.output.dir from the OutputJobInfo at hand.
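The general shape of this workaround can be sketched under assumptions:

- `StoreJobInfo` is a hypothetical one-field stand-in for HCatalog's OutputJobInfo.
- The job configuration is a plain `Map`.
- `ResettingCommitter` is not the real FileOutputCommitterContainer.

The sketch shows only the core idea. Each store's committer overwrites mapred.output.dir with its own location before doing job setup, so a stale value left by the caller no longer matters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for HCatalog's OutputJobInfo: just a location.
class StoreJobInfo {
    private final String location;
    StoreJobInfo(String location) { this.location = location; }
    String getLocation() { return location; }
}

// Toy committer wrapper, not the real FileOutputCommitterContainer.
class ResettingCommitter {
    static final String OUTPUT_DIR = "mapred.output.dir";
    private final StoreJobInfo jobInfo;
    private final Set<String> fs; // stand-in for HDFS

    ResettingCommitter(StoreJobInfo jobInfo, Set<String> fs) {
        this.jobInfo = jobInfo;
        this.fs = fs;
    }

    // Overwrite whatever output dir the caller left in the conf with this
    // store's own location, then do the usual job setup.
    void setupJob(Map<String, String> conf) {
        conf.put(OUTPUT_DIR, jobInfo.getLocation());
        fs.add(conf.get(OUTPUT_DIR) + "/_temporary");
    }

    // Both committers receive the same conf, pre-set to the first store's
    // location, yet each creates the _temporary dir under its own location.
    static Set<String> runTwoStores(String dir1, String dir2) {
        Set<String> fs = new HashSet<>();
        Map<String, String> shared = new HashMap<>();
        shared.put(OUTPUT_DIR, dir1);
        new ResettingCommitter(new StoreJobInfo(dir1), fs).setupJob(shared);
        new ResettingCommitter(new StoreJobInfo(dir2), fs).setupJob(shared);
        return fs;
    }
}
```

`runTwoStores("/wh/t1/_TEMP", "/wh/t2/_TEMP")` ends with one `_temporary` directory under each made-up location.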

          Mithun Radhakrishnan made changes -
          Attachment HCATALOG-276.patch [ 12517831 ]
          Mithun Radhakrishnan made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Mithun Radhakrishnan made changes -
          Link This issue is blocked by PIG-2578 [ PIG-2578 ]
          Alan Gates added a comment -

          I've run this with the e2e tests and confirmed they pass. Before I check it in I'd like Ashutosh and Francis to take a look to make sure the fix looks good.

          Ashutosh Chauhan added a comment -

          OutputJobInfo's location is set in FosterStorageHandler, but it may not be set by other storage handlers. In those cases, setting a null value in the Configuration will result in an NPE, so I think the location needs a null check before it is set.
          Otherwise, looks good.
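The suggested null check can be illustrated without Hadoop by letting java.util.Properties stand in for the job configuration. That substitution is an assumption made so the sketch runs standalone. Properties.setProperty() throws NullPointerException when given a null value, which mirrors the NPE described here. `NullSafeOutputDir` and its methods are hypothetical names.

```java
import java.util.Properties;

// Sketch only: Properties stands in for the job configuration.
class NullSafeOutputDir {
    static final String OUTPUT_DIR = "mapred.output.dir";

    // Override the output dir only when the storage handler supplied a
    // location; otherwise keep whatever value is already there.
    static void setOutputDirIfKnown(Properties conf, String location) {
        if (location != null) {
            conf.setProperty(OUTPUT_DIR, location);
        }
    }

    // Unguarded version, for contrast: returns true if setting the
    // location threw NullPointerException.
    static boolean unguardedThrows(Properties conf, String location) {
        try {
            conf.setProperty(OUTPUT_DIR, location);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

With a null location, the guarded setter leaves the existing value in place, while the unguarded call throws.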

          Mithun Radhakrishnan added a comment -

          Including Ashutosh's suggestion, to avoid NPEs.

          Mithun Radhakrishnan made changes -
          Attachment HCATALOG-276_reviewed.patch [ 12518063 ]
          Alan Gates added a comment -

          Checked the _reviewed patch into trunk and the 0.4 branch. Thanks, Mithun.

          Alan Gates made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Francis Liu added a comment -

          Sorry, need to reopen.

          • setupJob() does not have a null check.
          • commitTask() has the hack, but abortTask() doesn't. It's also missing in needsTaskCommit() and setupTask().
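Taken together, the two review points amount to this: every committer entry point re-applies the store's own output location, behind the same null check, before delegating. A sketch under assumptions:

- `AllMethodsResettingCommitter` is a hypothetical toy.
- The `Map` stands in for the job configuration.
- `delegate()` only records which directory each call saw instead of doing real commit work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model with hypothetical names, not HCatalog's code.
class AllMethodsResettingCommitter {
    static final String OUTPUT_DIR = "mapred.output.dir";
    private final String location; // may be null for some storage handlers
    final List<String> seen = new ArrayList<>();

    AllMethodsResettingCommitter(String location) { this.location = location; }

    // Null-checked reset, applied at the start of every entry point.
    private void reset(Map<String, String> conf) {
        if (location != null) {
            conf.put(OUTPUT_DIR, location);
        }
    }

    // Stand-in for the wrapped committer: records the dir each call used.
    private void delegate(String method, Map<String, String> conf) {
        seen.add(method + "=" + conf.get(OUTPUT_DIR));
    }

    void setupJob(Map<String, String> conf)   { reset(conf); delegate("setupJob", conf); }
    void setupTask(Map<String, String> conf)  { reset(conf); delegate("setupTask", conf); }
    void commitTask(Map<String, String> conf) { reset(conf); delegate("commitTask", conf); }
    void abortTask(Map<String, String> conf)  { reset(conf); delegate("abortTask", conf); }

    // Always returns true here; a real committer would check for task output.
    boolean needsTaskCommit(Map<String, String> conf) {
        reset(conf);
        delegate("needsTaskCommit", conf);
        return true;
    }
}
```

Because abortTask() and setupTask() reset the directory just like commitTask(), a task that fails cleans up under its own store's location. A committer with a null location leaves the existing value untouched.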
          Francis Liu made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Francis Liu added a comment -

          This hack is nasty. There might be other issues. Should we have another jira to keep track of this?

          Mithun Radhakrishnan added a comment -

          Please pardon my oversight. I have an additional patch that sets output-dir in all task-methods as well.

          I've retested from scratch, although it's not a significant change. Unit-tests and the pig-integration test pass.

          (Please note that this patch applies on top of what Alan's kindly checked in already.)

          Mithun Radhakrishnan made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Francis Liu added a comment -

          +1

          Alan Gates added a comment -

          I keep closing this jack-in-the-box and it keeps popping back open. Latest patch (HCATALOG-276-this-is-the-longest-patch-name-ever.patch) checked in. Let's hope this is the final one.

          Alan Gates made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mithun Radhakrishnan added a comment -

          Much obliged, Alan. (Sorry about the patch name. :])

          Alan Gates added a comment -

          Issue closed with 0.4 release.

          Alan Gates made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition                   Time In Source Status  Execution Times  Last Executer         Last Execution Date
          Open -> Resolved             3h 2m                  1                Alan Gates            26/Feb/12 03:33
          Resolved -> Reopened         5d 15h 57m             2                Francis Liu           12/Mar/12 21:13
          Reopened -> Patch Available  7d 8h 14m              2                Mithun Radhakrishnan  12/Mar/12 21:50
          Patch Available -> Resolved  3d 11h 22m             2                Alan Gates            13/Mar/12 15:08
          Resolved -> Closed           64d 10h 12m            1                Alan Gates            17/May/12 02:20

            People

            • Assignee:
              Mithun Radhakrishnan
              Reporter:
              Alan Gates
            • Votes:
              0
              Watchers:
              1
