Hive > HIVE-7292 Hive on Spark > HIVE-12091

Merge file doesn't work for ORC table when running on Spark. [Spark Branch]


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: spark-branch, 1.3.0, 2.0.0
    • Component/s: Spark
    • Labels: None

    Description

      This issue occurs when hive.merge.sparkfiles is set to true, and it can be worked around by setting hive.merge.sparkfiles to false.
      As a side note, we ran the same case locally with the MR engine (set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true;) and it passed.
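
      For reference, a minimal sketch of the workaround and of the MR-engine comparison mentioned above (the hive.merge.* properties are the ones cited in this report; hive.execution.engine is a standard Hive setting and is shown only as an assumption about the session setup):

      -- Workaround on the Spark engine: disable the merge-files stage
      set hive.merge.sparkfiles=false;

      -- Local comparison run that passed with the MR engine
      set hive.execution.engine=mr;
      set hive.merge.mapfiles=true;
      set hive.merge.mapredfiles=true;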

      (1) Component Version:
      – Hive Spark branch 70eeadd2f019dcb2e301690290c8807731eab7a1 + HIVE-11473 patch (HIVE-11473.3-spark.patch), which adds Spark 1.5 support for Hive on Spark
      – Spark 1.5.1

      (2) Case used:
      – Big-Bench data load (load data from HDFS into the Hive warehouse, stored as ORC). The related HiveQL:

      DROP TABLE IF EXISTS customer_temporary;
      CREATE EXTERNAL TABLE customer_temporary
        ( c_customer_sk             bigint              --not null
        , c_customer_id             string              --not null
        , c_current_cdemo_sk        bigint
        , c_current_hdemo_sk        bigint
        , c_current_addr_sk         bigint
        , c_first_shipto_date_sk    bigint
        , c_first_sales_date_sk     bigint
        , c_salutation              string
        , c_first_name              string
        , c_last_name               string
        , c_preferred_cust_flag     string
        , c_birth_day               int
        , c_birth_month             int
        , c_birth_year              int
        , c_birth_country           string
        , c_login                   string
        , c_email_address           string
        , c_last_review_date        string
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
        STORED AS TEXTFILE LOCATION '/user/root/benchmarks/bigbench_n1t/data/customer'
      ;
      
      DROP TABLE IF EXISTS customer;
      CREATE TABLE customer
      STORED AS ORC
      AS
      SELECT * FROM customer_temporary
      ;
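
      To reproduce the failure with the script above, the session needs the merge stage enabled on the Spark engine (a sketch of the assumed settings; the report only states that hive.merge.sparkfiles=true triggers the issue):

      -- Assumed session settings for reproducing the failure
      set hive.execution.engine=spark;
      set hive.merge.sparkfiles=true;
      -- then run the CTAS above: CREATE TABLE customer STORED AS ORC AS SELECT * FROM customer_temporary;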
      

      (3) Error/Exception Message:

      15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
      15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0
      15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/000001_0 [ offset : 3 length: 10525754 row: 247500 ]
      15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge Operator OFM
      15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 4)
      java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
      	at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
      	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
      	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
      	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
      	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
      	at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
      	at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:88)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
      	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:235)
      	at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:236)
      	at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:113)
      	... 15 more
      Caused by: java.io.IOException: Unable to rename hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_task_tmp.-ext-10001/_tmp.000001_0 to hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_tmp.-ext-10001/000001_0
      	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:208)
      	... 17 more
      

      Attachments

        1. HIVE-12091.2-spark.patch
          195 kB
          Rui Li
        2. HIVE-12091.1-spark.patch
          2 kB
          Rui Li


            People

              Assignee: Rui Li (lirui)
              Reporter: Xin Hao (xhao1)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: