HIVE-6052

metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0, 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      JDO pushdown for integers in metastore didn't work correctly for some legacy data in partition columns in Hive 0.12. In 0.13, hive.metastore.integral.jdo.pushdown setting is added to enable it, and it's disabled by default. Enabling it improves metastore perf for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (e.g. have leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization is also irrelevant.
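      Before enabling hive.metastore.integral.jdo.pushdown, it may help to check whether existing partition values are normalized. The following is a minimal sketch, not part of Hive: the function names and the sample values are hypothetical, and it assumes the partition values have already been fetched as strings.

```python
def is_normalized_integral(value: str) -> bool:
    """Return True if value is the canonical decimal text of an integer."""
    try:
        # Round-trip through int: "0012" becomes "12", which differs from the input.
        return str(int(value)) == value
    except ValueError:
        return False


def non_normalized(values):
    """Return the values that are not in canonical integer form."""
    return [v for v in values if not is_normalized_integral(v)]


# Hypothetical partition values, including the "0012" case from the release note.
partition_values = ["12", "0012", "7", "-3", "+5", "007"]
print(non_normalized(partition_values))  # ['0012', '+5', '007']
```

      If such a check finds any values, the release note suggests leaving the setting at its default (disabled).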

      Description

      If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same so that it produces the same results.
      Since both SQL pushdown and integer pushdown are just perf optimizations, we can probably remove it for JDO (or make it configurable and disabled by default), and uncripple SQL.
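      The mismatch can be illustrated with a small sketch. This is not Hive's metastore code: string_filter and integer_filter are hypothetical stand-ins for a text-based comparison and a numeric comparison over partition values stored as strings.

```python
def string_filter(stored_values, wanted: int):
    """Compare as text, in the spirit of a string-based equality pushdown."""
    return [v for v in stored_values if v == str(wanted)]


def integer_filter(stored_values, wanted: int):
    """Compare numerically, which is what a user filtering on an int expects."""
    return [v for v in stored_values if int(v) == wanted]


# Hypothetical stored partition values; "0012" is non-canonical.
stored = ["12", "0012", "13"]
print(string_filter(stored, 12))   # ['12']
print(integer_filter(stored, 12))  # ['12', '0012']
```

      The text-based comparison silently misses the partition stored as 0012, which is the kind of unexpected result described above.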

      1. HIVE-6052.01.patch
        68 kB
        Sergey Shelukhin
      2. HIVE-6052.02.patch
        55 kB
        Sergey Shelukhin
      3. HIVE-6052.patch
        9 kB
        Sergey Shelukhin

          Activity

          Sergey Shelukhin created issue -
          Sergey Shelukhin added a comment -

          Simple patch that adds a config option.
          Ashutosh Chauhan, can you review? This can produce unexpected results.

          Sergey Shelukhin made changes -
          Field Original Value New Value
          Attachment HIVE-6052.patch [ 12619228 ]
          Sergey Shelukhin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sergey Shelukhin added a comment -

          rb at https://reviews.apache.org/r/16339/
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12619228/HIVE-6052.patch

          ERROR: -1 due to 8 failed/errored test(s), 4791 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default
          org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testPartitionFilter
          org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testPartitionFilter
          org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartitionFilter
          org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyClient.testPartitionFilter
          org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer.testPartitionFilter
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/681/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/681/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 8 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12619228

          Sergey Shelukhin added a comment -

          updated patch

          Sergey Shelukhin made changes -
          Attachment HIVE-6052.01.patch [ 12619453 ]
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12619453/HIVE-6052.01.patch

          ERROR: -1 due to 2 failed/errored test(s), 4792 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12619453

          Ashutosh Chauhan added a comment -

          Few comments on RB.

          Ashutosh Chauhan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sergey Shelukhin added a comment -

          Hmm, I cannot repro these test failures. The diffs are all in stats, and are not of the kind that would be expected from the changes to the query. I wonder if stat generation is somehow affected by other queries, or by the machine.
          Making out files with the stat changes removed...

          Sergey Shelukhin made changes -
          Attachment HIVE-6052.02.patch [ 12619620 ]
          Sergey Shelukhin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ashutosh Chauhan added a comment -

          +1

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12619620/HIVE-6052.02.patch

          SUCCESS: +1 4795 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/704/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/704/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12619620

          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Sergey!

          Ashutosh Chauhan made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.13.0 [ 12324986 ]
          Resolution Fixed [ 1 ]
          Xuefu Zhang added a comment -

          I think this JIRA has documentation impact. Release note should probably be updated.

          Ashutosh Chauhan added a comment -

          Agreed. Sergey Shelukhin, can you provide documentation for this change?

          Sergey Shelukhin added a comment -

          You mean documentation of the bug? From the user's perspective, the feature itself is not visible.

          Ashutosh Chauhan added a comment -

          The bug itself is, I think, sufficiently documented in the description. I think there is a need to document the new config variable. It's off by default. We should document in what cases it's safe for the user to turn on this optimization. That usually goes in hive-default.xml.template. Let's follow up with a jira for that.

          Ashutosh Chauhan added a comment -

          Also add the same info on the config variable in the release notes field of the jira, as Xuefu suggested.

          Lefty Leverenz added a comment -

          When you write it up, I'll also add hive.metastore.integral.jdo.pushdown to the wiki. (Just wanted to name the config variable here, for future searches.)

          Sergey Shelukhin made changes -
          Release Note JDO pushdown for integers in metastore didn't work correctly in Hive 0.12. In 0.13, hive.metastore.integral.jdo.pushdown setting is added to enable it, and it's disabled by default. Enabling it improves metastore perf for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (e.g. have leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization is also irrelevant.
          Sergey Shelukhin made changes -
          Release Note
          Original value: JDO pushdown for integers in metastore didn't work correctly in Hive 0.12. In 0.13, hive.metastore.integral.jdo.pushdown setting is added to enable it, and it's disabled by default. Enabling it improves metastore perf for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (e.g. have leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization is also irrelevant.
          New value: JDO pushdown for integers in metastore didn't work correctly for some legacy data in partition columns in Hive 0.12. In 0.13, hive.metastore.integral.jdo.pushdown setting is added to enable it, and it's disabled by default. Enabling it improves metastore perf for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (e.g. have leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization is also irrelevant.
          Sergey Shelukhin added a comment -

          Added release note, added HIVE-6070 with somewhat different text (due to context)

          Lefty Leverenz made changes -
          Link This issue is related to HIVE-6070 [ HIVE-6070 ]
          Lefty Leverenz added a comment -

          hive.metastore.integral.jdo.pushdown is now documented in the wiki.

          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Patch Available -> Open | 1d 9h 40m | 1 | Ashutosh Chauhan | 19/Dec/13 11:44
          Open -> Patch Available | 7h 44m | 2 | Sergey Shelukhin | 19/Dec/13 18:53
          Patch Available -> Resolved | 3h 55m | 1 | Ashutosh Chauhan | 19/Dec/13 22:49

            People

            • Assignee:
              Sergey Shelukhin
              Reporter:
              Sergey Shelukhin
            • Votes:
              0
              Watchers:
              3
