Hive / HIVE-3587

Lost data during INSERT query

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None
    • Environment: Ubuntu 10.04
      Hadoop MapReduce 0.20.2
      Cloudera 4.1.0
      3 data/task nodes

    Description

      I'm trying to load a table using an INSERT query (1), but not all of the data makes it from the original table into the new table. The query generates 2 jobs. The first job takes about 45 minutes with mapred.mapper.class = org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileMergeMapper, and the second takes ~10 seconds with mapred.mapper.class = org.apache.hadoop.hive.ql.exec.ExecMapper.

      Within the last 2 minutes of the first job, a number of IOExceptions are raised (2). They are raised only in the last mapper task to complete; the other mapper tasks complete successfully. The exceptions indicate that an expected temporary file is missing. The second job completes successfully. According to the task tracker web interface, the jobs run sequentially with no overlap. However, the second job spawns a number of tasks that rename the very temporary files whose absence causes the failures in the first job (3).

      (1) https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries

      (2) Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
      org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1 File does not exist. Holder DFSClient_NONMAPREDUCE-672101740_1 does not have any open files.

      (3) Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0 to hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0 . File size is 3482
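
      To check whether a rename in one task really hits a temporary file that another task later fails to close, the task logs can be cross-referenced by path. The following is a minimal Python sketch, not part of Hive: the helper names temp_file_key and renamed_and_failed are hypothetical, and the regular expression assumes the scratch-directory layout seen in examples (2) and (3). Note that those two examples come from different runs (different hive_... scratch directories), so they would not match each other.

```python
import re

# Assumed layout, taken from the log examples in this ticket:
# /tmp/hive-<user>/hive_<timestamp>_<id>/_task_tmp.-ext-<n>/<partition>/_tmp.<task>_<attempt>
TEMP_PATH = re.compile(
    r"(?P<scratch>/tmp/hive-\w+/hive_[\d_-]+)"
    r"/_?task_tmp\.(?P<stage>-ext-\d+)"
    r"/(?P<partition>[^/\s]+)"
    r"/_tmp\.(?P<task>\d+_\d+)"
)


def temp_file_key(line):
    """Return (scratch dir, stage, partition, task) for the first temp path in a log line."""
    match = TEMP_PATH.search(line)
    if match is None:
        return None
    return (
        match.group("scratch"),
        match.group("stage"),
        match.group("partition"),
        match.group("task"),
    )


def renamed_and_failed(lines):
    """Return keys of temp files that appear in both a rename line and a failure line."""
    renamed = set()
    failed = set()
    for line in lines:
        key = temp_file_key(line)
        if key is None:
            continue
        if "renamed path" in line:
            renamed.add(key)
        elif "Failed to close file" in line or "LeaseExpiredException" in line:
            failed.add(key)
    return renamed & failed
```

      Feeding the combined task logs of both jobs to renamed_and_failed yields the temporary files that were renamed by one task and reported missing by another; an empty result would suggest the renames and the failures concern different files.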

          People

            Assignee: Unassigned
            Reporter: Jim Krehl (jimmyk-rdio)
            Votes: 1
            Watchers: 2