HIVE-15178: ORC stripe merge may produce many MR jobs and no merge if split size is small

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels: None

    Description

      The orc_createas1 test logs the following:

      2016-11-10T13:38:54,366  INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2400+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
      2016-11-10T13:38:54,373  INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2500+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
      2016-11-10T13:38:54,380  INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2600+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
      2016-11-10T13:38:54,387  INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2700+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
      ...
      

      It tries to merge 2 files, but instead ends up running tons of MR tasks for every 100 bytes and produces 2 files again (I assume most tasks don't produce the files because the split at a random 100-byte offset is invalid).
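A rough sketch of the arithmetic behind that guess, not the actual Hive code. It assumes each input file is about 2986 bytes (the logs only report that size for the merged output, so the input size is an assumption), that splits are cut at fixed 100-byte boundaries, and that a split handles a stripe only when the stripe's start offset falls inside the split. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Start offsets of fixed-size splits covering [0, fileLength).
    static List<Long> splitStarts(long fileLength, long maxSplitSize) {
        List<Long> starts = new ArrayList<>();
        for (long start = 0; start < fileLength; start += maxSplitSize) {
            starts.add(start);
        }
        return starts;
    }

    // Assumed ownership rule: a split handles a stripe only if the stripe's
    // start offset lies in [splitStart, splitStart + maxSplitSize).
    static long splitsOwningStripe(List<Long> starts, long maxSplitSize, long stripeOffset) {
        return starts.stream()
                .filter(s -> stripeOffset >= s && stripeOffset < s + maxSplitSize)
                .count();
    }

    public static void main(String[] args) {
        long fileLength = 2986;  // assumption: input file size, not stated in the logs
        long maxSplitSize = 100; // max split size set by the test
        long stripeOffset = 3;   // from "offset : 3 length: 2770" in the merge log

        List<Long> starts = splitStarts(fileLength, maxSplitSize);
        System.out.println("splits per file: " + starts.size());
        System.out.println("splits that own the stripe: "
                + splitsOwningStripe(starts, maxSplitSize, stripeOffset));
    }
}
```

Under these assumptions each file is cut into 30 splits and only the first one contains the stripe start, which would explain why each input file still yields exactly one output file.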

      2016-11-10T13:38:53,985  INFO [LocalJobRunner Map Task Executor #0] OrcFileMergeOperator: Merged stripe from file pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000000_0 [ offset : 3 length: 2770 row: 500 ]
      2016-11-10T13:38:53,995  INFO [LocalJobRunner Map Task Executor #0] exec.AbstractFileMergeOperator: renamed path pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_task_tmp.-ext-10002/_tmp.000002_0 to pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_tmp.-ext-10002/000002_0 . File size is 2986
      2016-11-10T13:38:54,206  INFO [LocalJobRunner Map Task Executor #0] OrcFileMergeOperator: Merged stripe from file pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0 [ offset : 3 length: 2770 row: 500 ]
      2016-11-10T13:38:54,215  INFO [LocalJobRunner Map Task Executor #0] exec.AbstractFileMergeOperator: renamed path pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_task_tmp.-ext-10002/_tmp.000030_0 to pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_tmp.-ext-10002/000030_0 . File size is 2986
      

      This is because the test sets the max split size to 100. The merge job is supposed to override that, but somehow the override doesn't take effect.
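For illustration only, the intended override could look roughly like the sketch below. This is not the attached patch. A plain Map stands in for the Hadoop job configuration so the example has no dependencies; the helper confForMergeJob is hypothetical, and the choice of hive.merge.size.per.task as the source of the merge split size (with an assumed default of 256000000) is an assumption about how the merge job is meant to be sized.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeSplitSizeSketch {
    static final String MAX_SPLIT_SIZE = "mapreduce.input.fileinputformat.split.maxsize";
    static final String MERGE_SIZE_PER_TASK = "hive.merge.size.per.task";

    // Hypothetical helper: copy the session settings, then replace the max
    // split size with the merge target size so the merge job is not cut
    // into tiny splits by a session-level setting.
    static Map<String, String> confForMergeJob(Map<String, String> sessionConf) {
        Map<String, String> jobConf = new HashMap<>(sessionConf);
        // Assumed default of 256000000 bytes when the property is unset.
        String mergeSize = sessionConf.getOrDefault(MERGE_SIZE_PER_TASK, "256000000");
        jobConf.put(MAX_SPLIT_SIZE, mergeSize);
        return jobConf;
    }

    public static void main(String[] args) {
        Map<String, String> sessionConf = new HashMap<>();
        sessionConf.put(MAX_SPLIT_SIZE, "100"); // what the test sets

        Map<String, String> jobConf = confForMergeJob(sessionConf);
        System.out.println(MAX_SPLIT_SIZE + " = " + jobConf.get(MAX_SPLIT_SIZE));
    }
}
```

The sketch copies the session settings instead of mutating them, so the test's 100-byte limit would still apply to the query that produced the files and only the merge job would see the larger value.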

      Attachments

        1. HIVE-15178.patch
          3 kB
          Sergey Shelukhin
        2. HIVE-15178.01.patch
          3 kB
          Sergey Shelukhin
        3. HIVE-15178.03.patch
          3 kB
          Sergey Shelukhin

            People

              Assignee: Sergey Shelukhin (sershe)
              Reporter: Sergey Shelukhin (sershe)