Hive / HIVE-2869

Merging small files throws RuntimeException when hive.mergejob.maponly=false

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None
    • Environment: CentOS release 5.5 (Final)

    Description

      Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
      Hadoop version: CDH3u0

      I am trying to use Hive's small-file merge feature and have set all the relevant parameters.
      I have disabled CombineHiveInputFormat because my input is compressed text.

      hive> set mapred.min.split.size.per.node=1000000000;
      hive> set mapred.min.split.size.per.rack=1000000000;
      hive> set mapred.max.split.size=1000000000;
      hive> set hive.merge.size.per.task=1000000000;
      hive> set hive.merge.smallfiles.avgsize=1000000000;
      hive> set hive.merge.size.smallfiles.avgsize=1000000000;
      hive> set hive.merge.mapfiles=true;
      hive> set hive.merge.mapredfiles=true;
      hive> set hive.mergejob.maponly=false;
      

      The plan launches two MR jobs, but after the first job succeeds I get this runtime error:
      "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"

      How to reproduce:

      • Create the tables as follows:
        --create input table
        create table tmp_notmerged (
          id                int,
          name              string
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        STORED AS TEXTFILE;
        
        
        --create output table
        create table tmp_merged (
          id                int
        )
        STORED AS TEXTFILE;
        
      • Load data into tmp_notmerged (the data files are attached to this JIRA).
      • Set the following parameters and run the Hive query:
        set hive.merge.mapfiles=true;
        set hive.mergejob.maponly=false;
        insert overwrite table tmp_merged select id from tmp_notmerged;
        
      • You should see the error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
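
      The load step above does not give an exact command. A minimal sketch, assuming the attached archive has been unpacked into a local directory of tab-delimited text files (the path /tmp/data_to_reproduce is hypothetical; this report does not describe the archive's layout):

      ```sql
      -- Hypothetical local path: adjust to wherever data_to_reproduce.tar.gz was unpacked.
      -- When given a directory, LOAD DATA loads every file inside it.
      LOAD DATA LOCAL INPATH '/tmp/data_to_reproduce' INTO TABLE tmp_notmerged;

      -- Sanity check that rows were loaded before running the insert.
      SELECT count(1) FROM tmp_notmerged;
      ```

      Loading several small files matters here because the merge stage is only planned when the output qualifies as small files.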

      Proposed fix:

      A patch is available here: https://gist.github.com/2025303

      Attachments

        1. data_to_reproduce.tar.gz (3 kB, Shrijeet Paliwal)

          People

            Assignee: Unassigned
            Reporter: Shrijeet Paliwal