[GIRAPH-247] Introduce edge based partitioning for InputSplits - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Later
Affects Version/s: 1.0.0
Fix Version/s: 1.0.0
Component/s: graph
Labels:
- patch

Description

Experiments on larger input data sets under a low memory profile have revealed that typical social graph data is very lumpy. Partitioning by vertex count can easily overload unlucky worker nodes that end up with partitions containing highly connected vertices, while other nodes process partitions with the same number of vertices but far fewer out-edges per vertex. This often results in cascading failures during data load-in, even on tiny data sets.

By partitioning on edges, I have seen dramatic "de-lumpification" of the data, allowing the processing of 8x larger data sets before memory problems occur at a given configuration setting. The default I set in GiraphJob.MAX_EDGES_PER_PARTITION_DEFAULT is 200,000 edges per partition. A partition is closed at that edge count or at the old default number of vertices, whichever the user's input format reaches first when reading InputSplits.
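A minimal sketch of the "whichever limit is reached first" rule described above. This is not the code from the attached patches: the class name EdgeBoundedPartitioner, the method partition, and its parameters are hypothetical, and vertices are represented only by their out-edge counts. It illustrates how a single highly connected vertex closes a partition early, while sparse vertices still fill partitions up to the vertex cap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Giraph or of the GIRAPH-247 patches.
final class EdgeBoundedPartitioner {

    /**
     * Groups vertices into partitions in read order. A partition is closed as
     * soon as it holds maxVertices vertices or at least maxEdges out-edges,
     * whichever happens first.
     *
     * @param outEdgeCounts out-edge count of each vertex, indexed by read order
     * @param maxVertices   vertex cap per partition (assumed to be >= 1)
     * @param maxEdges      edge cap per partition (the issue's default is 200,000)
     * @return partitions, each a list of vertex indices
     */
    static List<List<Integer>> partition(int[] outEdgeCounts, int maxVertices, long maxEdges) {
        List<List<Integer>> partitions = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long edgesInCurrent = 0;

        for (int v = 0; v < outEdgeCounts.length; v++) {
            current.add(v);
            edgesInCurrent += outEdgeCounts[v];

            // Close the partition when either limit has been reached.
            if (current.size() >= maxVertices || edgesInCurrent >= maxEdges) {
                partitions.add(current);
                current = new ArrayList<>();
                edgesInCurrent = 0;
            }
        }

        // Keep any trailing vertices that did not fill a whole partition.
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }
}
```

In this sketch a vertex is never split across partitions, so one very large vertex can still push a partition past the edge cap. The cap bounds when a partition is closed, not its exact size.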

This needs more tuning, but it comes with a -Dgiraph.maxEdgesPerPartition option that can be set to more edges per partition as your data sets grow or memory limitations shrink. Consider this a first attempt. Perhaps letting users choose between this type of partitioning and the old version as the default would be more compatible with existing users' needs? That would not be a hard feature to add. Still, I think this method of producing partitions has merit for the typical large-scale graph data that Giraph is designed to process.

Attachments

- GIRAPH-247-3.patch, 12/Jul/12 18:12, 3 kB, Eli Reisman
- GIRAPH-247-2.patch, 12/Jul/12 17:50, 3 kB, Eli Reisman
- GIRAPH-247-1.patch, 11/Jul/12 21:19, 3 kB, Eli Reisman

Issue Links

relates to: GIRAPH-249 Move part of the graph out-of-core when memory is low (Resolved)

People

Assignee: Eli Reisman

Reporter: Eli Reisman

Votes: 0

Watchers: 3

Dates

Created: 11/Jul/12 21:19

Updated: 14/Jul/12 20:14

Resolved: 14/Jul/12 20:14
