[GIRAPH-800] Resolving mutations on a large graph causes timeouts - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: graph
Labels: None
Environment: hadoop1

Description

When processing a graph with a large number of mutations and/or a large number of messages per superstep, the pre-superstep logic can appear to hang, and the job eventually times out, either because of MapReduce task inactivity or because it hits the maximum superstep wait.

While it's possible to tune around this by adding a strategic call to context.progress() in NettyServerWorker.resolveMutations() and raising the giraph.maxMasterSuperstepWaitMsecs setting, this part of the code appears to need optimization.
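The progress-reporting half of that tuning can be sketched as follows. This is a standalone illustration, not the attached patch: ProgressReporter is a hypothetical stand-in for the Hadoop context's progress() method, and the interval value is an assumption to be tuned against the task timeout.

```java
// Hypothetical stand-in for the object whose progress() call tells
// the framework that the task is still alive.
interface ProgressReporter {
  void progress();
}

final class ProgressingLoop {
  private ProgressingLoop() {}

  /**
   * Runs `work` once per item and calls reporter.progress() after every
   * `interval` items, so a long loop never looks idle to the framework.
   * Returns the number of progress calls that were made.
   */
  static long processWithProgress(long itemCount, long interval,
                                  ProgressReporter reporter, Runnable work) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    long calls = 0;
    for (long i = 0; i < itemCount; i++) {
      work.run();
      // Report periodically rather than on every item to keep overhead low.
      if ((i + 1) % interval == 0) {
        reporter.progress();
        calls++;
      }
    }
    return calls;
  }
}
```

Reporting every N items rather than on every iteration keeps the overhead small while still resetting the inactivity timer; it does nothing to shorten the loop itself, which is why the superstep wait limit has to be raised as well.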

As an example, in a graph with 2B vertices and 2.5B edges, the transition between supersteps with 1B messages in flight can take 15-30 minutes on a cluster with 228 workers (2 threads and 8GB RAM per worker).

While the vertex resolve processing can be time-consuming, I believe it's the check for missing vertices (the second loop within NettyServerWorker.resolveMutations()) that is the real performance bottleneck. I haven't identified a fix for this logic yet, but I did identify a possible workaround: I believe that when dealing with a static and complete graph, the resolveMutations() call can be skipped altogether. A quick test of this theory yielded a 3x performance improvement in my sandbox.
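A simplified model of the two loops and the proposed skip is sketched below. It is not Giraph code: the maps and sets stand in for the real partition and message stores, and the staticGraph flag is a hypothetical name for whatever switch would enable the workaround.

```java
import java.util.Map;
import java.util.Set;

final class MutationResolveSketch {
  private MutationResolveSketch() {}

  /**
   * Simplified model of a pre-superstep resolve pass.
   * Returns the number of vertices created by the missing-vertex check.
   *
   * @param partition           vertex id -> vertex value held by this worker
   * @param pendingMutations    explicit additions requested last superstep
   * @param messageDestinations ids that have messages waiting for them
   * @param staticGraph         hypothetical flag: caller asserts the graph is
   *                            static and complete, so nothing needs resolving
   */
  static int resolve(Map<Long, String> partition,
                     Map<Long, String> pendingMutations,
                     Set<Long> messageDestinations,
                     boolean staticGraph) {
    if (staticGraph) {
      // Workaround: no mutations and no missing vertices are possible,
      // so both loops can be skipped.
      return 0;
    }
    // First loop: apply the explicit mutations.
    for (Map.Entry<Long, String> mutation : pendingMutations.entrySet()) {
      partition.put(mutation.getKey(), mutation.getValue());
    }
    // Second loop: every message destination must exist as a vertex.
    // This visits one id per destination, so its cost grows with the
    // number of distinct destinations that have messages in flight.
    int created = 0;
    for (Long destination : messageDestinations) {
      if (!partition.containsKey(destination)) {
        partition.put(destination, "");
        created++;
      }
    }
    return created;
  }
}
```

The skip is only safe under the stated assumption. If a message is addressed to a vertex that does not exist, the skipped second loop is exactly what would have created it.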

Attachments

GIRAPH-800.patch, 21/Nov/13 17:59, 2 kB, Craig Muchinsky

People

Assignee: Unassigned

Reporter: Craig Muchinsky

Votes: 0

Watchers: 3

Dates

Created: 21/Nov/13 17:42

Updated: 06/Jun/14 22:08