Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.2
-
None
-
None
-
metastore db - postgres (tried on 9.3 and 11.5)
Description
given the following situation:
there are 2 tables (in db "intpartitionbugtest"), each with a few rows:
- test_table_int_1 partitioned by y of type int
- test_table_string_1 partitioned by x of type string
here is the output of the following query on the metastore db:
select "PARTITIONS"."PART_ID", "TBLS"."TBL_NAME", "FILTER0"."PART_KEY_VAL", "PART_NAME" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
the problem
when running a hive query on the table test_table_int_1 that filters on y=1
sometimes the following exception will happen on the metastore
javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = ? and (((case when "FILTER0"."PART_KEY_VAL" <> ? then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))". at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) ~[datanucleus-api-jdo-4.2.4.jar:?] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2003) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:593) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:481) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3853) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3843) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3577) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3861) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:3516) [hive-exec-3.1.2.jar:3.1.2] at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112] at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-3.1.2.jar:3.1.2] at com.sun.proxy.$Proxy28.getPartitionsByFilter(Unknown Source) [?:?] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5883) [hive-exec-3.1.2.jar:3.1.2] at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-3.1.2.jar:3.1.2] at com.sun.proxy.$Proxy30.get_partitions_by_filter(Unknown Source) [?:?] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16234) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16218) [hive-exec-3.1.2.jar:3.1.2] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.2.jar:3.1.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636) [hive-exec-3.1.2.jar:3.1.2] at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631) [hive-exec-3.1.2.jar:3.1.2] at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_112] at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_112] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.2.0.jar:?] at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631) [hive-exec-3.1.2.jar:3.1.2] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.2.jar:3.1.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] Caused by: org.postgresql.util.PSQLException: ERROR: invalid input syntax for type numeric: "c" at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155) ~[postgresql-42.2.2.jar:42.2.2] at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118) ~[postgresql-42.2.2.jar:42.2.2] at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) ~[HikariCP-2.6.1.jar:?] at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?] at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375) ~[datanucleus-rdbms-4.1.19.jar:?] at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552) ~[datanucleus-rdbms-4.1.19.jar:?] at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) ~[datanucleus-rdbms-4.1.19.jar:?] at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?] at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) ~[datanucleus-rdbms-4.1.19.jar:?] at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) ~[datanucleus-api-jdo-4.2.4.jar:?] ... 35 more
the query that is failing is generated by:
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal
here is the query with the parameters added:
select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = 'test_table_int_1' inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = 'intpartitionbugtest' inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = 'hive' and (( ( case when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__' then cast("FILTER0"."PART_KEY_VAL" as decimal(21, 0)) else null end) = 1))
the query is failing on the casting to decimal of the string partition values (a,b,c)
Analysis of the problem
this issue doesn't always happen.
when I start with a clean db, the query runs successfully, but after running many automated tests that are doing inserts and queries from hive tables, the query breaks
I run EXPLAIN VERBOSE on the query when it works and when it breaks:
working plan:
looks like first there is an index filtering in "TBL_NAME" that limits the query only to the correct table,
and later when there is a filter on "PART_KEY_VAL", all the values are numeric
broken plan:
we can see that first there is filter on "PART_KEY_VAL", and it runs on all the tables, and fail on the casting of the string columns
my guess is that after some inserts and queries of the relevant tables, the statistics for those tables change, and postgres decides no to use the index on "TBL_NAME"
Workaround
I managed to workaround the issue by setting the following setting on the postgres:
enable_seqscan = false
from here: https://www.postgresql.org/docs/9.5/runtime-config-query.html
It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available.
so it will force the use of the index on "TBL_NAME",
but that is not a good solution for production, as it will hurt performance
I think the solution should be to perform a safe cast in the query