[GIRAPH-1048] Redesign of out-of-core mechanism (first patch -- out-of-core mechanism keeping fixed number of partitions in memory) - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: None
Labels:
- out-of-memory

Description

The current out-of-core mechanism implemented in Giraph suffers from a few issues:

- It does not integrate well with a flow-control mechanism in which the rate of incoming and outgoing messages is controlled according to available memory.
- It does not control the rate at which compute and input threads generate and process data, which is crucial in the input superstep and, in some applications, in compute supersteps as well.
- It does not use disk bandwidth properly, because concurrent disk accesses interfere with one another (IO interference).
- It suffers from high overhead due to successive manual GC calls, even when the memory pressure cannot be relieved by offloading data to disk.
- Its design is complicated, which makes it difficult to debug and improve.
- It is very difficult to try different out-of-core policies, which makes the mechanism impossible to tune.

Giraph needs an out-of-core infrastructure that is simple to tune and program, flexible, and still efficient. In this JIRA we propose a redesign of the out-of-core mechanism in which a) the logic of IO operations, b) the logic of out-of-core decisions, c) the data structures supporting out-of-core operations, and d) the actual computation logic are four decoupled entities. A set of IOCommands and an IOScheduler handle the logic behind IO operations, an OutOfCoreEngine and a MetaPartitionManager handle the logic for out-of-core decisions, several disk-backed data structures are responsible for holding the necessary data, and the existing in-memory computation mechanism interacts with the out-of-core infrastructure seamlessly.
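The separation described above can be illustrated with a minimal sketch. Every type name here (`ToyPartitionStore`, `ToyIOCommand`, `ToyStoreCommand`, `ToyOutOfCorePolicy`, `ToyIOScheduler`) is a hypothetical stand-in invented for illustration and does not reflect the signatures of the Giraph classes named in this issue. The sketch also assumes a single-threaded scheduler and an in-memory map standing in for the disk; the caller in `main` plays the role of the computation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy types for illustration; NOT the Giraph classes or their signatures.

/** (c) Data structure: partitions held either in memory or on a simulated disk. */
class ToyPartitionStore {
  final Map<Integer, String> inMemory = new HashMap<>();
  final Map<Integer, String> onDisk = new HashMap<>();
}

/** (a) IO logic: one self-contained unit of IO work. */
interface ToyIOCommand {
  void execute(ToyPartitionStore store);
}

/** Moves one partition from memory to the simulated disk. */
class ToyStoreCommand implements ToyIOCommand {
  private final int partitionId;

  ToyStoreCommand(int partitionId) {
    this.partitionId = partitionId;
  }

  @Override
  public void execute(ToyPartitionStore store) {
    String data = store.inMemory.remove(partitionId);
    if (data != null) {
      store.onDisk.put(partitionId, data);
    }
  }
}

/** (b) Decision logic: picks commands to run but performs no IO itself. */
interface ToyOutOfCorePolicy {
  List<ToyIOCommand> decide(ToyPartitionStore store);
}

/** Runs queued commands one at a time, so simulated disk accesses never overlap. */
class ToyIOScheduler {
  private final Queue<ToyIOCommand> pending = new ArrayDeque<>();

  void submit(List<ToyIOCommand> commands) {
    pending.addAll(commands);
  }

  /** Executes every pending command and returns how many were run. */
  int drain(ToyPartitionStore store) {
    int executed = 0;
    ToyIOCommand next;
    while ((next = pending.poll()) != null) {
      next.execute(store);
      executed++;
    }
    return executed;
  }
}

/** (d) Computation: the caller, which only sees the store and the scheduler. */
class ToyDecouplingDemo {
  public static void main(String[] args) {
    ToyPartitionStore store = new ToyPartitionStore();
    store.inMemory.put(0, "partition-0");
    store.inMemory.put(1, "partition-1");

    // A trivial policy chosen for the demo: offload every odd-numbered partition.
    ToyOutOfCorePolicy policy = s -> {
      List<ToyIOCommand> commands = new ArrayList<>();
      for (int id : s.inMemory.keySet()) {
        if (id % 2 == 1) {
          commands.add(new ToyStoreCommand(id));
        }
      }
      return commands;
    };

    ToyIOScheduler scheduler = new ToyIOScheduler();
    scheduler.submit(policy.decide(store));
    int executed = scheduler.drain(store);
    System.out.println("executed=" + executed + " onDisk=" + store.onDisk.keySet());
  }
}
```

Because the policy only returns commands and the scheduler only executes them, a different policy can be swapped in without touching the IO or storage code, which is the tunability the redesign aims for.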

This JIRA lays the groundwork for the out-of-core infrastructure. As an initial proof of concept, it also implements a simple out-of-core policy on top of that infrastructure. This policy, called the fixed out-of-core policy, tries to keep a user-defined number of partitions in memory.
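As a rough illustration of a fixed policy, the sketch below keeps at most a configured number of partitions in memory and offloads the rest. The class `FixedPartitionBudget` and its methods are hypothetical, the disk is simulated with a map, and the choice of evicting the least recently used partition is an assumption made for the example; this issue does not say which partition the actual policy offloads.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a fixed-count policy; not Giraph's implementation. */
class FixedPartitionBudget {
  private final int maxInMemory;
  // Access-ordered map: iteration runs from least to most recently used.
  private final LinkedHashMap<Integer, String> inMemory =
      new LinkedHashMap<>(16, 0.75f, true);
  // Stand-in for disk storage.
  private final Map<Integer, String> onDisk = new HashMap<>();

  FixedPartitionBudget(int maxInMemory) {
    if (maxInMemory < 1) {
      throw new IllegalArgumentException("maxInMemory must be at least 1");
    }
    this.maxInMemory = maxInMemory;
  }

  /** Adds or replaces a partition in memory, then offloads if over budget. */
  void put(int partitionId, String data) {
    onDisk.remove(partitionId);
    inMemory.put(partitionId, data);
    enforceBudget();
  }

  /** Returns the partition, loading it back from "disk" if needed; null if unknown. */
  String get(int partitionId) {
    String data = inMemory.get(partitionId); // also marks it as recently used
    if (data == null) {
      data = onDisk.remove(partitionId);
      if (data == null) {
        return null;
      }
      inMemory.put(partitionId, data);
      enforceBudget();
    }
    return data;
  }

  /** Offloads least recently used partitions until the budget is respected. */
  private void enforceBudget() {
    Iterator<Map.Entry<Integer, String>> it = inMemory.entrySet().iterator();
    while (inMemory.size() > maxInMemory && it.hasNext()) {
      Map.Entry<Integer, String> eldest = it.next();
      onDisk.put(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }

  int inMemoryCount() {
    return inMemory.size();
  }

  boolean isOnDisk(int partitionId) {
    return onDisk.containsKey(partitionId);
  }
}

class FixedBudgetDemo {
  public static void main(String[] args) {
    FixedPartitionBudget budget = new FixedPartitionBudget(2);
    budget.put(0, "p0");
    budget.put(1, "p1");
    budget.put(2, "p2"); // exceeds the budget, so one partition is offloaded
    System.out.println("in memory: " + budget.inMemoryCount());
    System.out.println("partition 0 on disk: " + budget.isOnDisk(0));
  }
}
```

A fixed count is easy to reason about and to tune by hand, at the cost of ignoring how large each partition actually is.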

People

Assignee: Hassan Eslami

Reporter: Hassan Eslami

Votes: 0

Watchers: 1

Dates

Created: 15/Mar/16 02:29

Updated: 14/Oct/16 00:58

Resolved: 15/Mar/16 17:50