HIVE-1610

Using CombineHiveInputFormat causes partToPartitionInfo IOException

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None
    • Hadoop 0.20.2

    Description

      I have a relatively complicated Hive query that uses CombineHiveInputFormat:
      set hive.exec.dynamic.partition.mode=nonstrict;
      set hive.exec.dynamic.partition=true;
      set hive.exec.max.dynamic.partitions=1000;
      set hive.exec.max.dynamic.partitions.pernode=300;
      set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
      INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week)
      select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank,
        keywords.universal_rank, keywords.serp_type, keywords.date_indexed,
        keywords.search_engine_type, keywords.week
      from keyword_serp_results keywords
      JOIN (
        select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank
        from (
          select keywords1.domain, keywords1.keyword, keywords1.search_engine_type,
            keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed
          from keyword_serp_results keywords1
          JOIN (
            select domain, keyword, search_engine_type, week, max(date_indexed) as max_date_indexed
            from keyword_serp_results
            group by domain,keyword,search_engine_type,week
          ) dupkeywords1
          on keywords1.keyword = dupkeywords1.keyword
            AND keywords1.domain = dupkeywords1.domain
            AND keywords1.search_engine_type = dupkeywords1.search_engine_type
            AND keywords1.week = dupkeywords1.week
            AND keywords1.date_indexed = dupkeywords1.max_date_indexed
        ) dupkeywords2
        group by domain,keyword,search_engine_type,week,max_date_indexed
      ) dupkeywords3
      on keywords.keyword = dupkeywords3.keyword
        AND keywords.domain = dupkeywords3.domain
        AND keywords.search_engine_type = dupkeywords3.search_engine_type
        AND keywords.week = dupkeywords3.week
        AND keywords.date_indexed = dupkeywords3.max_date_indexed
        AND keywords.rank = dupkeywords3.best_rank;

      This query used to work fine until I updated to r991183 on trunk, after which I started getting this error:

      java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in
      partToPartitionInfo: [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
      hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
      hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
      hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
      hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
      hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
      at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
      at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
      at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
      at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
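
One detail in the message above may be worth noting: the directory being looked up has no port in its authority (hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/...), while the matching -mr-10002 entry in partToPartitionInfo includes :8020. The sketch below is not Hive's code; it is a minimal Python illustration, under the assumption that the lookup walks up parent directories and compares full URI strings, of how such a mismatch would make the lookup fail. The function name find_partition_dir and the ignore_authority flag are hypothetical, and whether this mismatch is the actual cause of the exception is not confirmed by this report.

```python
from urllib.parse import urlsplit

# Values copied from the exception message in this report.
LOOKUP_DIR = (
    "hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/"
    "hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0"
)
KNOWN_DIRS = [
    "hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/"
    "hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002",
    "hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/"
    "domain_keywords/account=417/week=201035/day=20100829",
]


def find_partition_dir(dir_uri, known_dirs, ignore_authority=False):
    """Hypothetical helper: return the entry of known_dirs that equals
    dir_uri or one of its parent directories, or None if there is none.

    With ignore_authority=True only the path component is compared, so
    differences in scheme, host, or port do not matter.
    """
    def key(uri):
        return urlsplit(uri).path if ignore_authority else uri

    lookup = {key(d): d for d in known_dirs}
    current = key(dir_uri)
    while current:
        if current in lookup:
            return lookup[current]
        if "/" not in current:
            break
        # Drop the last path segment to move up to the parent directory.
        current = current.rsplit("/", 1)[0]
    return None


if __name__ == "__main__":
    # Full-URI comparison: the missing ":8020" prevents any match.
    print(find_partition_dir(LOOKUP_DIR, KNOWN_DIRS))
    # Path-only comparison: the parent -mr-10002 directory is found.
    print(find_partition_dir(LOOKUP_DIR, KNOWN_DIRS, ignore_authority=True))
```

Comparing paths only is just one possible way to make the two forms match; it would be unsafe if the same path could exist on more than one filesystem, so a real fix would likely need a stricter rule.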

      The query works if I leave hive.input.format at its default, that is, if I omit this setting:
      set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

      I've narrowed this issue down to the commit for HIVE-1510. If I back out the changeset from r987746, everything works as before.

      Attachments

        1. 0004-hive.patch
          2 kB
          Sammy Yu
        2. 0003-HIVE-1610.patch
          3 kB
          Sammy Yu
        3. 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch
          3 kB
          Sammy Yu

          People

            Assignee: Unassigned
            Reporter: Sammy Yu (sammy.yu)