Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-8295

Add batch retrieve partition objects for metastore direct sql

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently in MetastoreDirectSql partition objects are constructed in a way that fetching partition ids first. However, if the partition ids that match the filter is larger than 1000, direct sql will fail with the following stack trace:

      2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 13.048901ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"     and "TBLS"."TBL_NAME" = ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"      and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where (("FILTER2"."PART_KEY_VAL" = ?))]
      2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back to ORM
      javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID", "SDS"."SD_ID", "SDS"."CD_ID", "SERDES"."SERDE_ID", "PARTITIONS"."CREATE_TIME", "PARTITIONS"."LAST_ACCESS_TIME", "SDS"."INPUT_FORMAT", "SDS"."IS_COMPRESSED", "SDS"."IS_STOREDASSUBDIRECTORIES", "SDS"."LOCATION", "SDS"."NUM_BUCKETS", "SDS"."OUTPUT_FORMAT", "SERDES"."NAME", "SERDES"."SLIB" from "PARTITIONS"  left outer join "SDS" on "PARTITIONS"."SD_ID" = "SDS"."SD_ID"   left outer join "SERDES" on "SDS"."SERDE_ID" = "SERDES"."SERDE_ID" where "PART_ID" in (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
      ) order by "PART_NAME" asc".
          at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
          at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
          at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
          at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
          at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
          at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
          at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
          at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
          at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
          at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
          at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
          at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
          at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
          at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
          at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
          at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
          at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
      NestedThrowablesStackTrace:
      java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a list is 1000
      

      Add retrieve partition objects in batch for direct sql will solve this Oracle specific problem. And it also bring some performance benefit and will reduce the memory footprint.

      1. HIVE-8295.1.patch
        7 kB
        Selina Zhang
      2. HIVE-8295.02.patch
        22 kB
        Sergey Shelukhin
      3. HIVE-8295.02.patch
        22 kB
        Sergey Shelukhin
      4. HIVE-8295.03.patch
        28 kB
        Sergey Shelukhin
      5. HIVE-8295.04.patch
        28 kB
        Sergey Shelukhin

        Issue Links

          Activity

          Hide
          selinazh Selina Zhang added a comment -

          In this patch, I did not add configuration parameter to control the batch size, for 1000 is hard limit and not configurable.

          Show
          selinazh Selina Zhang added a comment - In this patch, I did not add configuration parameter to control the batch size, for 1000 is hard limit and not configurable.
          Hide
          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12671876/HIVE-8295.1.patch

          ERROR: -1 due to 2 failed/errored test(s), 6370 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
          org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1048/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12671876

          Show
          hiveqa Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671876/HIVE-8295.1.patch ERROR: -1 due to 2 failed/errored test(s), 6370 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1048/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12671876
          Hide
          sershe Sergey Shelukhin added a comment -

          Nit: is it possible to make batch size static final?
          Also, given that this is not necessary for normal databases, is it possible to either make it configurable, off by default, or detect Oracle if that is easy? Like the way it does quote handling in MySQL

          Show
          sershe Sergey Shelukhin added a comment - Nit: is it possible to make batch size static final? Also, given that this is not necessary for normal databases, is it possible to either make it configurable, off by default, or detect Oracle if that is easy? Like the way it does quote handling in MySQL
          Hide
          sershe Sergey Shelukhin added a comment -

          I'll make some changes to the patch, coming today afternoon

          Show
          sershe Sergey Shelukhin added a comment - I'll make some changes to the patch, coming today afternoon
          Hide
          sershe Sergey Shelukhin added a comment -

          Add Oracle detection, remove batching from retrieval by name (since first of all list of names is already a batch, and 2nd max was always passed as null there from callers), clean up method signature, some other cleanup not related to the patch

          Show
          sershe Sergey Shelukhin added a comment - Add Oracle detection, remove batching from retrieval by name (since first of all list of names is already a batch, and 2nd max was always passed as null there from callers), clean up method signature, some other cleanup not related to the patch
          Hide
          sershe Sergey Shelukhin added a comment -

          Ashutosh Chauhan can you please review? https://reviews.apache.org/r/27847/
          I haven't tested Oracle detection on Oracle yet, need to get access to a machine with one

          Show
          sershe Sergey Shelukhin added a comment - Ashutosh Chauhan can you please review? https://reviews.apache.org/r/27847/ I haven't tested Oracle detection on Oracle yet, need to get access to a machine with one
          Hide
          ashutoshc Ashutosh Chauhan added a comment -

          Looks good overall. Few comments on RB.

          Show
          ashutoshc Ashutosh Chauhan added a comment - Looks good overall. Few comments on RB.
          Hide
          sershe Sergey Shelukhin added a comment -

          CR feedback. After discussion, add config setting and also check for MSSQL - it has perf problems in such cases.

          Show
          sershe Sergey Shelukhin added a comment - CR feedback. After discussion, add config setting and also check for MSSQL - it has perf problems in such cases.
          Hide
          dschorow David Schorow added a comment -

          Can we make sure the fix for this Jira also fixes HIVE-7999? That's a problem with SQL Server, where the query is just to long to parse.

          For a select * on a Hive Table, a query like this:

          SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 10000000 ...

          is generated and executed on metastore, however, due the query lists all the partitions in it, SQL server is unable to compile/execute such a long query and causing the whole query to fail.

          Show
          dschorow David Schorow added a comment - Can we make sure the fix for this Jira also fixes HIVE-7999 ? That's a problem with SQL Server, where the query is just to long to parse. For a select * on a Hive Table, a query like this: SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 10000000 ... is generated and executed on metastore, however, due the query lists all the partitions in it, SQL server is unable to compile/execute such a long query and causing the whole query to fail.
          Hide
          sershe Sergey Shelukhin added a comment -

          The recent iteration of the patch also adds batching in SQL server case. Whether it will fix SQL server is another question; if sending many queries (batching by 1000 by default) will help, then it will. But if individual queries together are as slow as the big query, different approach would be needed (batching on the client, which will be much slower overall but won't time out).

          Show
          sershe Sergey Shelukhin added a comment - The recent iteration of the patch also adds batching in SQL server case. Whether it will fix SQL server is another question; if sending many queries (batching by 1000 by default) will help, then it will. But if individual queries together are as slow as the big query, different approach would be needed (batching on the client, which will be much slower overall but won't time out).
          Hide
          dschorow David Schorow added a comment -

          On SQL Server, the issue is not about being slow, it is about not being able to parse and compile the query, since the query text is to long. I strongly suspect that splitting the query into batches of 1000 will fix that problem as well. I just wanted to make sure we were not limiting this change to just Oracle, but are having it be applicable to other DBs as well.

          Show
          dschorow David Schorow added a comment - On SQL Server, the issue is not about being slow, it is about not being able to parse and compile the query, since the query text is to long. I strongly suspect that splitting the query into batches of 1000 will fix that problem as well. I just wanted to make sure we were not limiting this change to just Oracle, but are having it be applicable to other DBs as well.
          Hide
          ashutoshc Ashutosh Chauhan added a comment -

          I think patch needs system testing for both Oracle & MS-SQL. However, code changes seems like they will address the problem so +1 for patch

          Show
          ashutoshc Ashutosh Chauhan added a comment - I think patch needs system testing for both Oracle & MS-SQL. However, code changes seems like they will address the problem so +1 for patch
          Hide
          ekoifman Eugene Koifman added a comment -

          One thing that may be useful (for Oracle) is to make the query like " ... a IN (1,....1000) or a IN (1001, ... 2000)..." it should still avoid the 1000 limit and run only 1 query.

          Show
          ekoifman Eugene Koifman added a comment - One thing that may be useful (for Oracle) is to make the query like " ... a IN (1,....1000) or a IN (1001, ... 2000)..." it should still avoid the 1000 limit and run only 1 query.
          Hide
          sershe Sergey Shelukhin added a comment -

          There are comments where this fix is suggested online that it is slow.

          Show
          sershe Sergey Shelukhin added a comment - There are comments where this fix is suggested online that it is slow.
          Hide
          sershe Sergey Shelukhin added a comment -

          Tested, appears to work on Oracle now

          Show
          sershe Sergey Shelukhin added a comment - Tested, appears to work on Oracle now
          Hide
          sershe Sergey Shelukhin added a comment -

          ping?

          Show
          sershe Sergey Shelukhin added a comment - ping?
          Hide
          sershe Sergey Shelukhin added a comment -

          nm, didn't see the +1

          Show
          sershe Sergey Shelukhin added a comment - nm, didn't see the +1
          Hide
          sershe Sergey Shelukhin added a comment -

          same patch, need HiveQA for this patch

          Show
          sershe Sergey Shelukhin added a comment - same patch, need HiveQA for this patch
          Hide
          sershe Sergey Shelukhin added a comment -

          Gunther Hagleitner ok for 14 branch, for 14.1?

          Show
          sershe Sergey Shelukhin added a comment - Gunther Hagleitner ok for 14 branch, for 14.1?
          Hide
          hagleitn Gunther Hagleitner added a comment -

          yes +1 for that.

          Show
          hagleitn Gunther Hagleitner added a comment - yes +1 for that.
          Hide
          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12681142/HIVE-8295.04.patch

          ERROR: -1 due to 1 failed/errored test(s), 6687 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1765/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1765/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1765/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12681142 - PreCommit-HIVE-TRUNK-Build

          Show
          hiveqa Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12681142/HIVE-8295.04.patch ERROR: -1 due to 1 failed/errored test(s), 6687 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1765/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1765/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1765/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed This message is automatically generated. ATTACHMENT ID: 12681142 - PreCommit-HIVE-TRUNK-Build
          Hide
          sershe Sergey Shelukhin added a comment -

          committed to both

          Show
          sershe Sergey Shelukhin added a comment - committed to both
          Hide
          thejas Thejas M Nair added a comment -

          Updating release version for jiras resolved in 1.0.0 .

          Show
          thejas Thejas M Nair added a comment - Updating release version for jiras resolved in 1.0.0 .
          Hide
          thejas Thejas M Nair added a comment -

          This issue has been fixed in Apache Hive 1.0.0. If there is any issue with the fix, please open a new jira to address it.

          Show
          thejas Thejas M Nair added a comment - This issue has been fixed in Apache Hive 1.0.0. If there is any issue with the fix, please open a new jira to address it.

            People

            • Assignee:
              olegd Oleg Danilov
              Reporter:
              selinazh Selina Zhang
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development