Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
I ran Hive 0.14 on Tez 0.5.2 and master with MemToMemMerger disabled - this configuration change is the diff between TEZ-1807 and this JIRA. Data size is 100GB texts generated by RandomTextWriter.
create external table randomText100GB( text string ) location 'hdfs:///user/ozawa/randomText100GB'; CREATE TABLE wordcount AS SELECT word, count(1) AS count FROM (SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' ')) AS word FROM randomText100GB) words GROUP BY word;
As a result, an exception is thrown:
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 ......... KILLED 115 110 0 5 0 5
Reducer 2 FAILED 3 0 0 3 1 2
--------------------------------------------------------------------------------
VERTICES: 00/02 [========================>>--] 93% ELAPSED TIME: 110.95 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1417036912823_0073_1_01, diagnostics=[Task failed, taskId=task_1417036912823_0073_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:338)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:319)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:211)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:207)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:270)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:129)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:702)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1417036912823_0073_1_01 [Reducer 2] killed/failed due to:null]
Vertex killed, vertexName=Map 1, vertexId=vertex_1417036912823_0073_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1417036912823_0073_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
The log message of this line looks strange:
Caused by: java.io.FileNotFoundException: /hadoop1/tmp/nm-local-dir/usercache/ozawa/appcache/application_1417036912823_0073/attempt_1417036912823_0073_1_01_000000_0_10026_spill_215.out.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged.merged (File name too long)