  1. Hive
  2. HIVE-16666

Setting hive.exec.stagingdir to a relative directory or a subdirectory of the destination data directory causes Hive to delete the intermediate query results


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Set hive.exec.stagingdir to a relative path of the form ./*, for example set hive.exec.stagingdir=./opq8.
      Then execute a query like this:
      insert overwrite table test2 select * from test3;
      The query fails with an error like this:
      hive> set hive.exec.stagingdir=./opq8;
      hive> insert overwrite table test2 select * from test3;
      Query ID = mr_20170515134831_28ee392d-0d5a-4e47-b80c-dfcd31691b02
      Total jobs = 3
      Launching Job 1 out of 3
      Number of reduce tasks is set to 0 since there's no reduce operator
      Starting Job = job_1494818119523_0008, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0008/
      Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job -kill job_1494818119523_0008
      Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
      2017-05-15 13:48:51,487 Stage-1 map = 0%, reduce = 0%
      Ended Job = job_1494818119523_0008
      Stage-3 is selected by condition resolver.
      Stage-2 is filtered out by condition resolver.
      Stage-4 is filtered out by condition resolver.
      Moving data to directory hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
      Loading data to table default.test2
      Moved: 'hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1' to trash at: hdfs://nameservice/user/mr/.Trash/Current
      Failed with exception Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
      MapReduce Jobs Launched:
      Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
      Total MapReduce CPU Time Spent: 0 msec
      hive>

      hive.exec.stagingdir=./opq8 is a relative path, so it is resolved against the destination write directory /hive/test2. Hive creates a temporary directory /hive/test2/opq8_hive* for the intermediate query results. Later, in the move stage, Hive deletes or trashes every subdirectory of /hive/test2 whose name does not begin with "_" or "." so that it can move the new data into this directory. Because opq8_hive* begins with neither character, the staging directory itself is moved to the trash before its contents can be moved, which is why the MoveTask fails. You can see the processing logic in org.apache.hadoop.hive.ql.metadata.trashFilesUnderDir.
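      The filtering rule described above can be illustrated with a small standalone sketch. This is not Hive's code: the class StagingDirFilterSketch, its method names, and the sample child names are hypothetical, and the only thing taken from the report is the rule that names beginning with "_" or "." are skipped during cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the cleanup rule described in this report.
// It does not call any Hive or Hadoop API.
public class StagingDirFilterSketch {

    // Assumption taken from the report: names starting with "_" or "." are left alone.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns the child names of the destination directory that cleanup would trash.
    static List<String> childrenToTrash(List<String> childNames) {
        List<String> toTrash = new ArrayList<>();
        for (String name : childNames) {
            if (!isHidden(name)) {
                toTrash.add(name);
            }
        }
        return toTrash;
    }

    public static void main(String[] args) {
        // Illustrative child names under /hive/test2 (made up for this sketch).
        List<String> children = Arrays.asList(
                "000000_0",            // an old data file, expected to be replaced
                "opq8_hive_example",   // staging directory created from ./opq8
                ".opq8_hive_example"); // the same directory with a leading "."
        System.out.println(childrenToTrash(children));
    }
}
```

      In this model the visible staging directory lands in the to-trash list next to the old data file, while the dot-prefixed name is skipped.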

      My fix: if stagingdir is a subdirectory of the destination write directory, I add a "." in front of the stagingdir name. The temporary directory then becomes /hive/test2/.opq8_hive*, and because the subdirectory .opq8_hive* starts with ".", Hive does not delete it. With this change, the same query succeeds:
      hive> set hive.exec.stagingdir=./opq8;
      hive> insert overwrite table test2 select * from test3;
      Query ID = mr_20170515143940_ae48a65e-42be-4f50-b974-b713ca902867
      Total jobs = 3
      Launching Job 1 out of 3
      Number of reduce tasks is set to 0 since there's no reduce operator
      Starting Job = job_1494818119523_0012, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0012/
      Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job -kill job_1494818119523_0012
      Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
      2017-05-15 14:40:04,547 Stage-1 map = 0%, reduce = 0%
      Ended Job = job_1494818119523_0012
      Stage-3 is selected by condition resolver.
      Stage-2 is filtered out by condition resolver.
      Stage-4 is filtered out by condition resolver.
      Moving data to directory hdfs://nameservice/hive/test2/.opqt8_hive_2017-05-15_14-39-40_751_1221840798987515724-1/-ext-10000
      Loading data to table default.test2
      MapReduce Jobs Launched:
      Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
      Total MapReduce CPU Time Spent: 0 msec
      OK
      Time taken: 26.751 seconds
      hive>
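
      The idea behind the fix can be sketched as follows. This is not the contents of HIVE-16666.1.patch: the class StagingDirPrefixSketch and the method hideIfUnderDestination are hypothetical, the sketch uses java.nio.file paths instead of Hadoop paths, it ignores the _hive_<timestamp> suffix that Hive appends, and it only handles a staging directory that is a direct child of the destination (such as ./opq8). Leaving names that already begin with "." or "_" unchanged is an assumption of this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the "add a leading dot" idea; not the actual patch.
public class StagingDirPrefixSketch {

    // Resolves stagingDir against the destination and, if the result is a direct
    // child of the destination with a visible name, returns it with "." prepended.
    static Path hideIfUnderDestination(Path destination, String stagingDir) {
        Path dest = destination.normalize();
        // An absolute stagingDir replaces dest here; a relative one is appended to it.
        Path resolved = dest.resolve(stagingDir).normalize();

        // Simplification: only a direct child of the destination is rewritten.
        boolean directChild = dest.equals(resolved.getParent());
        if (!directChild) {
            return resolved;
        }

        String name = resolved.getFileName().toString();
        if (name.startsWith(".") || name.startsWith("_")) {
            return resolved; // already skipped by the cleanup rule
        }
        return resolved.resolveSibling("." + name);
    }

    public static void main(String[] args) {
        Path dest = Paths.get("/hive/test2");
        System.out.println(hideIfUnderDestination(dest, "./opq8"));         // relative: rewritten
        System.out.println(hideIfUnderDestination(dest, "/tmp/hive/opq8")); // elsewhere: unchanged
    }
}
```

      A staging directory outside the destination is returned unchanged, since the cleanup of /hive/test2 never touches it.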

      Attachments

        1. HIVE-16666.1.patch
          1 kB
          yangfang

          People

            yangfang yangfang
            yangfang yangfang