Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 2.6.0
None
None
Environment: HDP 2.2
Description
The HDFS mover gets stuck looping on a single block that fails to move and never migrates the remaining blocks.
This is preventing recovery of data from an external storage tier used for archive, which we are decommissioning. We have had problems with that proprietary "hyperscale" storage product, which is why a few blocks here and there have checksum problems or premature EOF, as shown below. A handful of bad blocks should not prevent the mover from moving all the other blocks so that we can recover our data:
hdfs mover -p /apps/hive/warehouse/<custom_scrubbed>
15/05/07 14:52:50 INFO mover.Mover: namenodes = {hdfs://nameservice1=[/apps/hive/warehouse/<custom_scrubbed>]}
15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys
15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move blk_1075156654_1438349 with size=134217728 from <ip>:1019:ARCHIVE to <ip>:1019:DISK through <ip>:1019: block move is failed: opReplaceBlock BP-120244285-<ip>-1417023863606:blk_1075156654_1438349 received exception java.io.EOFException: Premature EOF: no length prefix available
<NOW IT STARTS LOOPING ON SAME BLOCK>
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move blk_1075156654_1438349 with size=134217728 from <ip>:1019:ARCHIVE to <ip>:1019:DISK through <ip>:1019: block move is failed: opReplaceBlock BP-120244285-<ip>-1417023863606:blk_1075156654_1438349 received exception java.io.EOFException: Premature EOF: no length prefix available
...<repeat indefinitely>...
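For anyone triaging a similar run, a small script can confirm from a saved mover log which blocks are being retried over and over. The following is only a rough sketch, not part of Hadoop: the function names `count_failed_moves` and `looping_blocks` and the `threshold` parameter are hypothetical, and the regular expression assumes the WARN line format shown in the log above.

```python
import re
from collections import Counter

# Assumption: failed moves are logged in the format seen above, i.e.
# "WARN balancer.Dispatcher: Failed to move blk_<id>_<genstamp> ...".
# \s* tolerates a line break between the logger name and the message.
FAILED_MOVE = re.compile(
    r"WARN balancer\.Dispatcher:\s*Failed to move (blk_\d+_\d+)"
)


def count_failed_moves(log_text):
    """Return a Counter mapping each block ID to its failed move attempts."""
    return Counter(FAILED_MOVE.findall(log_text))


def looping_blocks(log_text, threshold=2):
    """Return sorted block IDs that failed at least `threshold` times."""
    counts = count_failed_moves(log_text)
    return sorted(block for block, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python mover_loops.py mover.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for block in looping_blocks(text):
        print(block, count_failed_moves(text)[block])
```

Any block ID that this reports repeatedly is a candidate for closer inspection, for example with `hdfs fsck <path> -files -blocks -locations` on the affected path.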