HIVE-24163: Dynamic Partitioning Insert for MM table fails during MoveTask

    Description

      DDLs and query to reproduce:

      create table `class` (name varchar(8), sex varchar(1), age double precision, height double precision, weight double precision);
      
      insert into table class values ('RAJ','MALE',28,12,12);
      CREATE TABLE `PART1` (`id` DOUBLE,`N` DOUBLE,`Name` VARCHAR(8),`Sex` VARCHAR(1)) PARTITIONED BY(Weight string, Age string, Height string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE;
      
      INSERT INTO TABLE `part1` PARTITION (`Weight`,`Age`,`Height`)  SELECT 0, 0, `Name`,`Sex`,`Weight`,`Age`,`Height` FROM `class`;
      

      The INSERT fails during MoveTask execution:

      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: partition hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/.hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/-ext-10000/tmpstats-0_FS_3 is not a directory!
              at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2769) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2837) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
              at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
      
      

      The cause: each task writes its fs stats file (tmpstats-*) when the FileSinkOperator closes. HS2 then runs the MoveTask to move the data into the destination partition directories. While resolving the partition locations, Hive.getValidPartitionsInPath checks that each entry it finds is a directory; it encounters the stats file in the staging directory and fails.
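
The failure mode can be reproduced in miniature on a local filesystem. The sketch below is a simplified model, not Hive's code: `list_partition_dirs`, `build_staging_dir`, and the exception class are hypothetical names, and the partition directory names are illustrative. It only shows why a stray stats file next to the partition directories breaks a scan that expects directories only.

```python
import tempfile
from pathlib import Path
from typing import List


class NotADirectoryInLoadPath(Exception):
    """Stand-in for the HiveException seen in the stack trace."""


def list_partition_dirs(load_path: Path) -> List[Path]:
    """Hypothetical, simplified model of a strict partition scan.

    Every direct child of load_path is expected to be a partition
    directory; the first plain file found aborts the scan.
    """
    found = []
    for child in sorted(load_path.iterdir()):
        if not child.is_dir():
            raise NotADirectoryInLoadPath(f"partition {child} is not a directory!")
        found.append(child)
    return found


def build_staging_dir(root: Path, with_stats_file: bool) -> Path:
    """Create a fake load path, optionally with a stats file inside it."""
    load_path = root / "-ext-10000"
    # Illustrative partition layout; real directory names depend on the data.
    (load_path / "weight=12" / "age=28" / "height=12").mkdir(parents=True)
    if with_stats_file:
        (load_path / "tmpstats-0_FS_3").write_text("stats placeholder")
    return load_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clean = build_staging_dir(Path(tmp) / "clean", with_stats_file=False)
        print([p.name for p in list_partition_dirs(clean)])

        dirty = build_staging_dir(Path(tmp) / "dirty", with_stats_file=True)
        try:
            list_partition_dirs(dirty)
        except NotADirectoryInLoadPath as err:
            print("scan failed:", err)
```

In this model the scan succeeds when the load path holds only partition directories and fails as soon as the stats file is written into the same directory, which matches the shape of the error message above.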

      Hive sets the stats location during semantic analysis:
      https://github.com/apache/hive/blob/d700ea54ec5da5364d92a9faaa58f89ea03181e0/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L8135

      which is relative to the hive-staging directory:

      https://github.com/apache/hive/blob/fecad5b0f72c535ed1c53f2cc62b0d6649b651ae/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L617
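
To make the path relationship concrete, the following sketch splits the failing path from the exception message into its parts. The helper name `describe` is hypothetical, and the code only does string and path manipulation on the reported URI; it does not call any Hive or HDFS API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Path copied from the exception message in this report.
FAILING_PATH = (
    "hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/"
    ".hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/"
    "-ext-10000/tmpstats-0_FS_3"
)


def describe(uri: str) -> dict:
    """Split a staging-path URI into table dir, staging dir, and remainder."""
    parts = PurePosixPath(urlsplit(uri).path).parts
    # Locate the staging directory component by its name prefix.
    idx = next(i for i, p in enumerate(parts) if p.startswith(".hive-staging"))
    return {
        "table_dir": str(PurePosixPath(*parts[:idx])),
        "staging_dir": parts[idx],
        "relative_to_staging": str(PurePosixPath(*parts[idx + 1:])),
    }


if __name__ == "__main__":
    for key, value in describe(FAILING_PATH).items():
        print(f"{key}: {value}")
```

The remainder `-ext-10000/tmpstats-0_FS_3` shows the stats file sitting directly inside the `-ext-10000` directory under the staging directory, which is the directory named in the "is not a directory" error.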

            People

              Marta Kuczora (kuczoram)
              Rajkumar Singh
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 50m