Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The orc_createas1 test logs the following:
2016-11-10T13:38:54,366 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2400+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
2016-11-10T13:38:54,373 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2500+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
2016-11-10T13:38:54,380 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2600+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
2016-11-10T13:38:54,387 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: Paths:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0:2700+100InputFormatClass: org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat
...
It tries to merge 2 files, but instead ends up running a separate MR task for every 100 bytes of input and produces 2 files again (I assume most tasks don't produce any files because a split at an arbitrary 100-byte offset is invalid).
2016-11-10T13:38:53,985 INFO [LocalJobRunner Map Task Executor #0] OrcFileMergeOperator: Merged stripe from file pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000000_0 [ offset : 3 length: 2770 row: 500 ]
2016-11-10T13:38:53,995 INFO [LocalJobRunner Map Task Executor #0] exec.AbstractFileMergeOperator: renamed path pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_task_tmp.-ext-10002/_tmp.000002_0 to pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_tmp.-ext-10002/000002_0 . File size is 2986
2016-11-10T13:38:54,206 INFO [LocalJobRunner Map Task Executor #0] OrcFileMergeOperator: Merged stripe from file pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/-ext-10004/000001_0 [ offset : 3 length: 2770 row: 500 ]
2016-11-10T13:38:54,215 INFO [LocalJobRunner Map Task Executor #0] exec.AbstractFileMergeOperator: renamed path pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_task_tmp.-ext-10002/_tmp.000030_0 to pfile:/Users/sergey/git/hivegit2/itests/qtest/target/warehouse/.hive-staging_hive_2016-11-10_13-38-52_334_1323113125332102866-1/_tmp.-ext-10002/000030_0 . File size is 2986
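To illustrate the numbers in these logs, here is a minimal sketch of how a 100-byte max split size slices one input file, and how few of those splits can own the single stripe at offset 3. It is not Hive code. The class and method names are hypothetical, the file length of 2986 bytes is an assumption borrowed from the reported output file size (the input file length is not in the logs), and the rule that a split processes only stripes whose start offset falls inside it is an assumption about the reader's behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Start offsets of fixed-size splits covering a file of the given length. */
    static List<Long> splitOffsets(long fileLength, long maxSplitSize) {
        List<Long> offsets = new ArrayList<>();
        for (long start = 0; start < fileLength; start += maxSplitSize) {
            offsets.add(start);
        }
        return offsets;
    }

    /** Assumed rule: a split owns a stripe only if the stripe starts inside the split. */
    static boolean splitOwnsStripe(long splitStart, long splitLength, long stripeOffset) {
        return stripeOffset >= splitStart && stripeOffset < splitStart + splitLength;
    }

    /** Counts how many splits would have a stripe to merge. */
    static int countOwningSplits(long fileLength, long maxSplitSize, long stripeOffset) {
        int owning = 0;
        for (long start : splitOffsets(fileLength, maxSplitSize)) {
            if (splitOwnsStripe(start, maxSplitSize, stripeOffset)) {
                owning++;
            }
        }
        return owning;
    }

    public static void main(String[] args) {
        long fileLength = 2986;   // assumption: taken from the output file size in the log
        long maxSplitSize = 100;  // the split size set by the test
        long stripeOffset = 3;    // from "[ offset : 3 length: 2770 row: 500 ]"

        int total = splitOffsets(fileLength, maxSplitSize).size();
        int owning = countOwningSplits(fileLength, maxSplitSize, stripeOffset);
        // Prints: 30 splits, 1 of them contains the stripe start
        System.out.println(total + " splits, " + owning + " of them contains the stripe start");
    }
}
```

Under these assumptions, one file yields 30 map tasks but only the first has any stripe to merge, which would be consistent with two input files producing two output files.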
The 100-byte splits come from the test, which sets the max split size to 100. The merge job is supposed to override that setting, but for some reason that doesn't happen.
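The intended override can be sketched as follows. This is only an illustration of the idea, not the actual fix and not Hive's implementation: the configuration is modeled as a plain map rather than a Hadoop Configuration, the class and method names are hypothetical, the choice of `mapreduce.input.fileinputformat.split.maxsize` as the property the test sets is an assumption, and the fallback value is picked for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeConfSketch {
    // Assumption: the split-size property the merge job would need to override.
    static final String MAX_SPLIT_SIZE = "mapreduce.input.fileinputformat.split.maxsize";
    // Hive setting for the target size of a merge task.
    static final String MERGE_SIZE_PER_TASK = "hive.merge.size.per.task";
    // Fallback used in this sketch when the merge size is not set.
    static final String FALLBACK_MERGE_SIZE = "256000000";

    /**
     * Builds a job-level configuration for the merge job from the session
     * configuration, replacing any session-level max split size with the
     * merge task size. The session configuration is left unmodified.
     */
    static Map<String, String> mergeJobConf(Map<String, String> sessionConf) {
        Map<String, String> jobConf = new HashMap<>(sessionConf);
        String mergeSize = sessionConf.getOrDefault(MERGE_SIZE_PER_TASK, FALLBACK_MERGE_SIZE);
        jobConf.put(MAX_SPLIT_SIZE, mergeSize);
        return jobConf;
    }

    public static void main(String[] args) {
        Map<String, String> sessionConf = new HashMap<>();
        sessionConf.put(MAX_SPLIT_SIZE, "100"); // what the test sets

        Map<String, String> jobConf = mergeJobConf(sessionConf);
        System.out.println("session: " + sessionConf.get(MAX_SPLIT_SIZE));
        System.out.println("merge job: " + jobConf.get(MAX_SPLIT_SIZE));
    }
}
```

Copying the session configuration before overriding keeps the test's own setting intact for non-merge jobs, so only the merge job sees the larger split size.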
Attachments
Issue Links
- is duplicated by: HIVE-18234 Hive MergeFileTask doesn't work correctly (Resolved)