HIVE-24469

StatsTask failure while inserting the data into the table partitioned by timestamp

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Hive
    • Labels:
      None

      Description

      Steps to repro:

      CREATE EXTERNAL TABLE `tblsource`(
        `x` int, 
        `y` string)
      STORED AS PARQUET;
      CREATE EXTERNAL TABLE `tblinsert`(
        `x` int)
      PARTITIONED BY ( 
        `y` timestamp)
      STORED AS PARQUET;
      insert into table tblsource values (5,'2020-11-06 00:00:00.000');
      insert into table tblinsert partition(y) select * from tblsource distribute by (y);
      

      The query fails while executing the stats task, and the following exception appears in the HMS log:

      java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
              at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
              at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
              at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8629) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8590) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
              at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
              at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source) ~[?:?]
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18937) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18921) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
              at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[hadoop-common-3.1.1.7.2.0.0-237.jar:?]
              at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
      

      I think the problem is with timestamps whose nanoseconds are all zeros. After the value 2020-11-06 00:00:00.000 is inserted, Hive calls set_aggr_stats_for and constructs the SetPartitionsStatsRequest. While the request is being built, because the nanoseconds are all 0, the Hive FetchOperator converts 2020-11-06 00:00:00.000 to 2020-11-06 00:00:00 (Timestamp.valueOf(string)).

      https://github.com/apache/hive/blob/f8aa55f9c8f22c4fd293d9531192f7f46099a420/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L176

      On the HMS side, the partition lookup at

      https://github.com/apache/hive/blob/2ab194d25311e15487ae010b8dd113879ccd501b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8626

      does not yield any partition, because the filter expression for the partition was 2020-11-06 00:00:00. It therefore fails with the IndexOutOfBoundsException shown above.
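
The suspected mismatch can be illustrated outside Hive. The sketch below is not Hive code: the class name, the formatter, and the helper methods are hypothetical, and it assumes that the partition is stored under the literal string 2020-11-06 00:00:00.000 while the stats request carries the normalized form. It only shows how a parse-and-print round trip can drop an all-zero fraction, so that an exact string match against the stored value finds nothing.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TimestampPartitionMismatch {

    // Hypothetical formatter (an assumption, not taken from Hive):
    // the fraction is optional with a minimum width of 0, so an
    // all-zero fraction is omitted when the value is printed.
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    /** Parses a timestamp string and prints it back in normalized form. */
    public static String normalize(String raw) {
        return LocalDateTime.parse(raw, FORMAT).format(FORMAT);
    }

    /** Returns the stored partition values that equal the requested value exactly. */
    public static List<String> findPartitions(List<String> stored, String requested) {
        return stored.stream()
                .filter(requested::equals)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumption: the partition value is stored exactly as it was inserted.
        String inserted = "2020-11-06 00:00:00.000";
        List<String> stored = Arrays.asList(inserted);

        // The round trip drops the all-zero fraction.
        String requested = normalize(inserted);

        System.out.println("stored    = " + stored);
        System.out.println("requested = " + requested);
        System.out.println("matches   = " + findPartitions(stored, requested));
    }
}
```

If the analysis above is right, code that then indexes into the lookup result on the assumption that every requested partition was found would fail in the way the stack trace shows.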

            People

            • Assignee: Unassigned
            • Reporter: Rajkumar Singh
            • Votes: 0
            • Watchers: 2
