HIVE-7282: HCatLoader fails to load ORC map with null key

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: HCatalog
    • Labels:
      None

      Description

      Loading fails with the following exception and stack trace:
      AttemptID:attempt_1403634189382_0011_m_000000_0 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
      at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
      at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
      at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
      at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
      Caused by: java.lang.NullPointerException
      at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToPigMap(PigHCatUtil.java:469)
      at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:404)
      at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:456)
      at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:374)
      at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
      ... 13 more

      1. HIVE-7282-1.patch
        2 kB
        Daniel Dai
      2. HIVE-7282-2.patch
        4 kB
        Daniel Dai
      3. HIVE-7282.3.patch
        5 kB
        Daniel Dai

          Activity

          Daniel Dai added a comment -

          Attached a patch. Will add a test case later.

          Daniel Dai added a comment -

          Added a test case.

          Eugene Koifman added a comment -

          Would it not make more sense to add the new test to TestHCatLoaderComplexSchema, so that it's run with both ORC and RCFile?

          Eugene Koifman added a comment -

          Also, can HIVE-5020 now be closed as duplicate?

          Sushanth Sowmyan added a comment -

          While this shields HCat from the difference between ORC and RCFile, HIVE-5020 is about the differences in behaviour between RCFile and ORC in how they handle nulls in maps, and should not be closed until Hive behaves consistently. I would actually prefer to solve this in a consistent manner in Hive before applying this to HCat, as explained in the comments on that JIRA. I'll try to revive the discussion there.

          Daniel Dai added a comment -

          After digging further, I think disallowing null map keys is the more appropriate approach. The reasons are:
          1. It resolves the semantic difference between ORC and RCFile.
          2. Allowing null map keys seems risky: it would break assumptions made by other code, e.g. LazyMap.

          Eugene Koifman added a comment -

          I agree that null key in a map is a bad idea. Since we still have to deal with data which already has been written with null key, could we add some table property that will let user say "if data contains a map with null key, replace null with 'my_value' on read". (Perhaps the same property can be used to change a null key to 'my_value' on write to support existing writers, but this of course won't work for all cases.) This way null key can be disallowed.
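
The read-side replacement proposed in the comment above could look roughly like the following. This is only a sketch of the idea, not code from any attached patch: the class name, the method name, and the way the placeholder would be obtained (for example from a table property) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of "replace null key with a configured value on read". */
final class NullKeyReplacementSketch {

    /**
     * Returns a copy of the map in which a null key, if present, is replaced by
     * the given placeholder. The placeholder stands in for a value that a user
     * might supply through a table property (an assumption of this sketch).
     */
    static <V> Map<String, V> replaceNullKey(Map<String, V> source, String placeholder) {
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> entry : source.entrySet()) {
            String key = (entry.getKey() == null) ? placeholder : entry.getKey();
            result.put(key, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = new HashMap<>();
        raw.put(null, 7); // HashMap permits a single null key
        Map<String, Integer> fixed = replaceNullKey(raw, "my_value");
        System.out.println(fixed); // prints {my_value=7}
    }
}
```

One caveat of this sketch: if the map already contains the placeholder as a real key, one of the two values is silently overwritten, so the placeholder would have to be chosen to avoid collisions with existing keys.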

          Daniel Dai added a comment -

          Resynced the patch with trunk. Also, the new patch skips null-key entries in the map, which is consistent with null key handling in other parts of Hive (HIVE-8115).
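
The null-key skipping described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in HIVE-7282.3.patch: the class and method names are hypothetical, and the real conversion in PigHCatUtil also transforms the map values, which this sketch leaves untouched. It relies only on the fact that Pig map keys must be strings, so each key is converted with toString(), which is where a null key would otherwise raise a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not the code from the HIVE-7282 patch. */
final class NullKeySkippingSketch {

    /**
     * Copies a map into a String-keyed map, dropping any entry whose key is
     * null instead of calling toString() on it.
     */
    static Map<String, Object> toStringKeyedMap(Map<?, ?> source) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            if (entry.getKey() == null) {
                continue; // skip null-key entries rather than fail with an NPE
            }
            result.put(entry.getKey().toString(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = new HashMap<>();
        raw.put("a", 1);
        raw.put(null, 2); // HashMap permits a single null key
        System.out.println(toStringKeyedMap(raw)); // prints {a=1}
    }
}
```

Skipping the entry loses the value stored under the null key, which is the trade-off accepted in exchange for behaviour that matches the rest of Hive.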

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12675550/HIVE-7282.3.patch

          ERROR: -1 due to 2 failed/errored test(s), 6569 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_correctness
          org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1321/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1321/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1321/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12675550

          Sushanth Sowmyan added a comment -

          +1, looks good to me.

          Daniel Dai added a comment -

          Patch committed to both trunk and 0.14 branch.

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.


            People

            • Assignee:
              Daniel Dai
            • Reporter:
              Daniel Dai
            • Votes:
              0
            • Watchers:
              5
