Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
1.5.0
None
Description
Running applications with the external shuffle service and Kerberos enabled produces many errors like this:
15/08/05 06:26:18 WARN TaskSetManager: Lost task 2.0 in stage 2.0 (TID 12, spark-nightly-2.vpc.cloudera.com): FetchFailed(BlockManagerId(2, spark-nightly-2.vpc.cloudera.com, 7337), shuffleId=0, mapId=0, reduceId=2, message=
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /yarn/nm/usercache/systest/appcache/application_1438780049118_0008/blockmgr-7178b106-6902-4082-8792-1c3e34b80d15/38/shuffle_0_0_0.index
    at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:203)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:113)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:80)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:68)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
This is caused by commit c4830598 (SPARK-6287), which modified the permissions of the directory storing the shuffle files.
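The following minimal Python sketch illustrates how a directory permission change alone can make a file unreadable to a different OS user. It rests on an assumption not stated in this issue: that on a Kerberos-secured YARN cluster the executors write shuffle files as the submitting user, while the NodeManager-hosted external shuffle service reads them as a different user through group permissions. The helper names `make_dir` and `search_permission`, the directory names, and the modes 0750 and 0700 are hypothetical and illustrative; they are not taken from the commit.

```python
import os
import stat
import tempfile


def make_dir(parent: str, name: str, mode: int) -> str:
    """Create parent/name and force its permission bits to `mode` (hypothetical helper)."""
    path = os.path.join(parent, name)
    os.mkdir(path)
    # os.mkdir's mode argument is masked by the process umask, so set the
    # bits explicitly with chmod, which is not affected by umask.
    os.chmod(path, mode)
    return path


def search_permission(dir_path: str) -> dict:
    """Report which permission classes may traverse (search) dir_path.

    Opening a file by its full path requires the execute ("search") bit on
    every directory along that path; read permission is only needed to list
    a directory. This simplified check ignores ACLs and parent directories.
    """
    mode = stat.S_IMODE(os.stat(dir_path).st_mode)
    return {
        "owner": bool(mode & stat.S_IXUSR),
        "group": bool(mode & stat.S_IXGRP),
        "other": bool(mode & stat.S_IXOTH),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Group members (e.g. an assumed service user) can reach files inside.
        shared = make_dir(root, "blockmgr-shared", 0o750)
        # Only the owning user can reach files inside.
        private = make_dir(root, "blockmgr-private", 0o700)
        print(search_permission(shared))
        # {'owner': True, 'group': True, 'other': False}
        print(search_permission(private))
        # {'owner': True, 'group': False, 'other': False}
```

Under that assumption, a reader relying on group access could not open an index file such as `shuffle_0_0_0.index` inside an owner-only directory, which is consistent with the "Failed to open file" error in the log.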