Details
- Bug
- Status: Resolved
- Major
- Resolution: Abandoned
- 1.0.0
- None
- None
Description
During the input superstep, data for different regions is needlessly transferred across the network instead of preference being given to machine-local regions when available.
On modest to large graphs (about 5 million vertices and 10 million edges) we've noticed this causing resource contention, ZooKeeper timeouts, and other issues that often freeze the input superstep until manually killed on the task tracker hosts.
This doesn't happen for TextVertexInputFormat subclasses. Perhaps it is related to each instance of the HBaseVertexInputFormat subclass delegating to a private TableInputFormat instance.
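
To make the expected behavior concrete, here is a minimal plain-Java sketch of what "giving preference to machine-local regions" could look like. It is not Giraph or HBase code: the Split class, the preferLocal method, and the host and region names are all hypothetical, and the hosts list only stands in for the kind of information Hadoop's InputSplit.getLocations() reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LocalityFirstSplits {
    /** Hypothetical stand-in for an input split: a region name plus the hosts that store it. */
    static final class Split {
        final String region;
        final List<String> hosts;

        Split(String region, List<String> hosts) {
            this.region = region;
            this.hosts = hosts;
        }
    }

    /**
     * Returns the splits reordered so that those stored on localHost come first.
     * Relative order inside the local group and inside the remote group is preserved.
     */
    static List<Split> preferLocal(List<Split> splits, String localHost) {
        List<Split> local = new ArrayList<>();
        List<Split> remote = new ArrayList<>();
        for (Split s : splits) {
            if (s.hosts.contains(localHost)) {
                local.add(s);
            } else {
                remote.add(s);
            }
        }
        local.addAll(remote);
        return local;
    }

    public static void main(String[] args) {
        // Illustrative data only: three regions spread over three hosts.
        List<Split> splits = Arrays.asList(
            new Split("region-a", Arrays.asList("host1")),
            new Split("region-b", Arrays.asList("host2")),
            new Split("region-c", Arrays.asList("host2", "host3")));

        // A worker running on host2 would read region-b and region-c before region-a.
        for (Split s : preferLocal(splits, "host2")) {
            System.out.println(s.region);
        }
    }
}
```

A worker that orders its candidate splits this way only reaches across the network once its local regions are exhausted. Whether the location information survives the delegation to the private TableInputFormat instance is exactly the open question raised above.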