Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23576

Getting partition of type int from metastore sometimes fail on cast error

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Hive, Standalone Metastore
    • Labels:
      None
    • Environment:

      metastore db - postgres (tried on 9.3 and 11.5)

      Description

      given the following situation:

      there are 2 tables (in db "intpartitionbugtest"), each with a few rows:

      1. test_table_int_1 partitioned by y of type int
      2. test_table_string_1 partitioned by x of type string

      here is the output of the following query on the metastore db:

      select
      	"PARTITIONS"."PART_ID",
      	"TBLS"."TBL_NAME",
      	"FILTER0"."PART_KEY_VAL",
      	"PART_NAME"
      from
      	"PARTITIONS"
      inner join "TBLS" on
      	"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
      inner join "DBS" on
      	"TBLS"."DB_ID" = "DBS"."DB_ID"
      inner join "PARTITION_KEY_VALS" "FILTER0" on
      	"FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
      

       

      the problem

      when running a hive query on the table test_table_int_1 that filters on y=1
      sometimes the following exception will happen on the metastore

       

      javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"     and "TBLS"."TBL_NAME" = ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"      and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = ?  and (((case when "FILTER0"."PART_KEY_VAL" <> ? then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))".
       	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?]
       	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
       	at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) ~[datanucleus-api-jdo-4.2.4.jar:?]
       	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2003) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:593) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:481) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3853) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3843) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3577) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3861) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:3516) [hive-exec-3.1.2.jar:3.1.2]
       	at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) ~[?:?]
       	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
       	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
       	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-3.1.2.jar:3.1.2]
       	at com.sun.proxy.$Proxy28.getPartitionsByFilter(Unknown Source) [?:?]
       	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5883) [hive-exec-3.1.2.jar:3.1.2]
       	at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source) ~[?:?]
       	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
       	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
       	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-3.1.2.jar:3.1.2]
       	at com.sun.proxy.$Proxy30.get_partitions_by_filter(Unknown Source) [?:?]
       	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16234) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16218) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631) [hive-exec-3.1.2.jar:3.1.2]
       	at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_112]
       	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_112]
       	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.2.0.jar:?]
       	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631) [hive-exec-3.1.2.jar:3.1.2]
       	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.2.jar:3.1.2]
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
       	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
      Caused by: org.postgresql.util.PSQLException: ERROR: invalid input syntax for type numeric: "c"
       	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155) ~[postgresql-42.2.2.jar:42.2.2]
       	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118) ~[postgresql-42.2.2.jar:42.2.2]
       	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) ~[HikariCP-2.6.1.jar:?]
       	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?]
       	at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375) ~[datanucleus-rdbms-4.1.19.jar:?]
       	at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552) ~[datanucleus-rdbms-4.1.19.jar:?]
       	at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) ~[datanucleus-rdbms-4.1.19.jar:?]
       	at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?]
       	at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) ~[datanucleus-rdbms-4.1.19.jar:?]
       	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) ~[datanucleus-api-jdo-4.2.4.jar:?]
       	... 35 more
      

      the query that is failing is generated by: 

      org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal

      here is the query with the parameters added:

      select
      	"PARTITIONS"."PART_ID"
      from
      	"PARTITIONS"
      inner join "TBLS" on
      	"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
      	and "TBLS"."TBL_NAME" = 'test_table_int_1'
      inner join "DBS" on
      	"TBLS"."DB_ID" = "DBS"."DB_ID"
      	and "DBS"."NAME" = 'intpartitionbugtest'
      inner join "PARTITION_KEY_VALS" "FILTER0" on
      	"FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
      	and "FILTER0"."INTEGER_IDX" = 0
      where
      "DBS"."CTLG_NAME" = 'hive'
      	and
      	((
      	(
      		case
      		when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__' then cast("FILTER0"."PART_KEY_VAL" as decimal(21, 0))
      		else null end) = 1))
      

       the query is failing on the casting to decimal of the string partition values (a,b,c)

      Analysis of the problem

      this issue doesn't always happen.
      when I start with a clean db, the query runs successfully, but after running many automated tests that are doing inserts and queries from hive tables, the query breaks

      I run EXPLAIN VERBOSE on the query when it works and when it breaks:

      working plan:

      looks like first there is an index filtering in "TBL_NAME" that limits the query only to the correct table,
      and later when there is a filter on "PART_KEY_VAL", all the values are numeric

      broken plan:

      we can see that first there is filter on "PART_KEY_VAL", and it runs on all the tables, and fail on the casting of the string columns

      my guess is that after some inserts and queries of the relevant tables, the statistics for those tables change, and postgres decides no to use the index on "TBL_NAME"

      Workaround

      I managed to workaround the issue by setting the following setting on the postgres:

      enable_seqscan = false

      from here: https://www.postgresql.org/docs/9.5/runtime-config-query.html

      It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available. 

      so it will force the use of the index on "TBL_NAME",
      but that is not a good solution for production, as it will hurt performance

       

      I think the solution should be to perform a safe cast in the query

        Attachments

        1. broken plan_2.png
          110 kB
          Lev Katzav
        2. image-2020-05-29-14-16-29-356.png
          16 kB
          Lev Katzav
        3. working plan_2.png
          112 kB
          Lev Katzav

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              lev Lev Katzav
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: