Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.0.2
- Fix Version/s: None
- Component/s: None
Description
When running a Spark Streaming job, receiver blocks should be replicated across different nodes, but all the blocks are replicated to the same node. Here is the log:
14/09/10 19:55:16 INFO BlockManagerInfo: Added input-0-1410350117000 in memory on 10.196.131.19:42261 (size: 8.9 MB, free: 1050.3 MB)
14/09/10 19:55:16 INFO BlockManagerInfo: Added input-0-1410350117000 in memory on tdw-10-196-130-155:51155 (size: 8.9 MB, free: 879.3 MB)
14/09/10 19:55:17 INFO BlockManagerInfo: Added input-0-1410350118000 in memory on 10.196.131.19:42261 (size: 7.7 MB, free: 1042.6 MB)
14/09/10 19:55:17 INFO BlockManagerInfo: Added input-0-1410350118000 in memory on tdw-10-196-130-155:51155 (size: 7.7 MB, free: 871.6 MB)
14/09/10 19:55:18 INFO BlockManagerInfo: Added input-0-1410350119000 in memory on 10.196.131.19:42261 (size: 7.3 MB, free: 1035.3 MB)
14/09/10 19:55:18 INFO BlockManagerInfo: Added input-0-1410350119000 in memory on tdw-10-196-130-155:51155 (size: 7.3 MB, free: 864.3 MB)
The reason is that when a BlockManagerSlave asks the BlockManagerMaster for peers to replicate to, the master always returns the same BlockManagerIds: the peer list ordering is stable and the selection is deterministic, so every block from the same block manager goes to the same node. Here is the code:
private def getPeers(blockManagerId: BlockManagerId, size: Int): Seq[BlockManagerId] = {
  val peers: Array[BlockManagerId] = blockManagerInfo.keySet.toArray
  val selfIndex = peers.indexOf(blockManagerId)
  if (selfIndex == -1) {
    throw new SparkException("Self index for " + blockManagerId + " not found")
  }
  // Note that this logic will select the same node multiple times if there aren't enough peers
  Array.tabulate[BlockManagerId](size) { i => peers((selfIndex + i + 1) % peers.length) }.toSeq
}
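To see why the same peers keep being chosen, here is a small self-contained sketch of the same modular selection over a hypothetical three-node peer list (the node names are made up for illustration); the result depends only on the stable peer ordering, so it never varies between calls:

object PeerSelectionDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical peer list; in Spark these would be BlockManagerIds.
    val peers = Array("nodeA", "nodeB", "nodeC")
    val selfIndex = peers.indexOf("nodeA")
    // Same tabulate logic as getPeers above: walk the array starting
    // just after selfIndex, wrapping around modulo the peer count.
    val targets = Array.tabulate(2) { i => peers((selfIndex + i + 1) % peers.length) }.toSeq
    // Always prints the same two peers (nodeB, nodeC); nothing varies
    // between calls, so block after block replicates to the same nodes.
    println(targets)
  }
}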
I think the BlockManagerMaster should instead return size BlockManagerIds chosen from the peers with the most remaining memory.
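A minimal sketch of what that could look like, assuming blockManagerInfo maps each BlockManagerId to an info object exposing its remaining memory (as the master actor's BlockManagerInfo does); this illustrates the idea, it is not a tested patch:

private def getPeers(blockManagerId: BlockManagerId, size: Int): Seq[BlockManagerId] = {
  // Exclude the requester itself, then prefer the peers with the most
  // remaining memory so replicas land on the least loaded nodes.
  blockManagerInfo
    .filterKeys(_ != blockManagerId)
    .toSeq
    .sortBy { case (_, info) => -info.remainingMem }
    .take(size)
    .map(_._1)
}

Unlike the round-robin version, this never selects the same node twice; if fewer than size peers exist it simply returns fewer, which the replication caller would need to tolerate.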
Issue Links
- duplicates SPARK-3495: Block replication fails continuously when the replication target node is dead (Resolved)