Giraph / GIRAPH-1048

Redesign of out-of-core mechanism (first patch: out-of-core mechanism keeping a fixed number of partitions in memory)

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:

      Description

      The current out-of-core mechanism implemented in Giraph suffers from a few issues:

      • It does not integrate well with a flow-control mechanism in which the rate of incoming/outgoing messages is controlled according to available memory;
      • It does not control the rate at which compute/input threads generate and process data, which is crucial in the input superstep and, for some applications, in compute supersteps as well;
      • It does not fully utilize disk bandwidth, because concurrent disk accesses interfere with one another (IO interference);
      • It suffers from high overhead due to successive manual GC calls, even when the high memory pressure cannot be relieved by offloading data to disk;
      • It has a complicated design, which makes it difficult to debug and improve upon;
      • It is very difficult to try different out-of-core policies, which makes tuning the mechanism practically impossible.

      Giraph needs an out-of-core infrastructure that is simple to tune and program, flexible, and efficient. In this JIRA we propose a redesign of the out-of-core mechanism in which four entities are decoupled: a) the logic of IO operations, b) the logic of out-of-core decisions, c) the data structures supporting out-of-core operations, and d) the actual logic of the computation. A set of IOCommands and an IOScheduler handle the logic of IO operations; an OutOfCoreEngine and a MetaPartitionManager handle out-of-core decisions; several disk-backed data structures are responsible for keeping the necessary data; and finally, the existing in-memory computation mechanism interacts with the out-of-core infrastructure seamlessly.
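      To illustrate how the IO concern (a) and the partition-location data (c) can be kept apart from the decision logic (b) and the computation (d), here is a minimal, hypothetical Java sketch. The class names (PartitionMetadata, Storage, IoCommand, StorePartitionCommand, LoadPartitionCommand, IoScheduler) are illustrative stand-ins, not Giraph's actual API; disk IO is simulated with in-memory maps, and the one-thread-per-disk scheduler with partitions mapped to disks by id is an assumed way to avoid IO interference, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for MetaPartitionManager: records where each partition lives. */
class PartitionMetadata {
  enum Location { IN_MEMORY, ON_DISK }
  private final Map<Integer, Location> locations = new ConcurrentHashMap<>();
  void set(int partitionId, Location location) { locations.put(partitionId, location); }
  Location get(int partitionId) { return locations.get(partitionId); }
}

/** Simulated storage; a real implementation would serialize partitions to local files. */
class Storage {
  final Map<Integer, String> memory = new ConcurrentHashMap<>();
  final Map<Integer, String> disk = new ConcurrentHashMap<>();
}

/** Hypothetical IO command: one unit of disk work on one partition. */
interface IoCommand {
  int partitionId();
  void execute();
}

/** Moves a partition from memory to disk and records its new location. */
class StorePartitionCommand implements IoCommand {
  private final int partitionId;
  private final Storage storage;
  private final PartitionMetadata meta;

  StorePartitionCommand(int partitionId, Storage storage, PartitionMetadata meta) {
    this.partitionId = partitionId;
    this.storage = storage;
    this.meta = meta;
  }

  @Override public int partitionId() { return partitionId; }

  @Override public void execute() {
    String data = storage.memory.remove(partitionId);
    if (data == null) return; // not resident, nothing to offload
    storage.disk.put(partitionId, data); // "write" the partition
    meta.set(partitionId, PartitionMetadata.Location.ON_DISK);
  }
}

/** Moves a partition from disk back to memory and records its new location. */
class LoadPartitionCommand implements IoCommand {
  private final int partitionId;
  private final Storage storage;
  private final PartitionMetadata meta;

  LoadPartitionCommand(int partitionId, Storage storage, PartitionMetadata meta) {
    this.partitionId = partitionId;
    this.storage = storage;
    this.meta = meta;
  }

  @Override public int partitionId() { return partitionId; }

  @Override public void execute() {
    String data = storage.disk.remove(partitionId);
    if (data == null) return; // not on disk, nothing to load
    storage.memory.put(partitionId, data); // "read" the partition
    meta.set(partitionId, PartitionMetadata.Location.IN_MEMORY);
  }
}

/** Hypothetical scheduler: one single-threaded executor per disk, so commands for the
 *  same disk run one after another instead of competing for its bandwidth. */
class IoScheduler {
  private final List<ExecutorService> diskThreads = new ArrayList<>();

  IoScheduler(int numDisks) {
    for (int i = 0; i < numDisks; i++) {
      diskThreads.add(Executors.newSingleThreadExecutor());
    }
  }

  /** Assumption: a partition's disk is its id modulo the number of disks. */
  void submit(IoCommand command) {
    diskThreads.get(command.partitionId() % diskThreads.size()).execute(command::execute);
  }

  /** Stops accepting commands and waits for the queued ones to finish. */
  void shutdownAndWait() {
    diskThreads.forEach(ExecutorService::shutdown);
    try {
      for (ExecutorService thread : diskThreads) {
        thread.awaitTermination(1, TimeUnit.MINUTES);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

      In this sketch, the decision logic and the computation only create commands and submit them; neither knows how or when data reaches the disk, which is what would let different out-of-core policies be swapped in without touching the IO code.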

      This JIRA lays the groundwork for the out-of-core infrastructure and, as an initial proof of concept, implements a simple out-of-core policy on top of it. This policy, called the fixed out-of-core policy, tries to keep a certain user-defined number of partitions in memory.
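      The idea behind the fixed policy can be sketched as a single-threaded Java class that never lets more than a user-defined number of partitions stay resident. Everything here is an assumption for illustration: the class name FixedPartitionsPolicy, its methods, the initial residency after the input superstep, and the eviction order (already-processed partitions first, then the oldest resident one). Loads and stores are only counted; in the proposed design they would be issued as IO commands by the out-of-core engine rather than performed inline.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical fixed out-of-core policy: at most maxInMemory partitions are resident. */
class FixedPartitionsPolicy {
  private final int maxInMemory;
  private final Set<Integer> inMemory = new LinkedHashSet<>(); // insertion order = residency age
  private final Set<Integer> processed = new HashSet<>();      // processed in this superstep
  int loads = 0;
  int stores = 0;
  int peakInMemory = 0;

  FixedPartitionsPolicy(int maxInMemory, int numPartitions) {
    if (maxInMemory < 1) {
      throw new IllegalArgumentException("maxInMemory must be at least 1");
    }
    this.maxInMemory = maxInMemory;
    // Assumption: the input superstep left the first maxInMemory partitions resident.
    for (int p = 0; p < Math.min(maxInMemory, numPartitions); p++) {
      inMemory.add(p);
    }
    peakInMemory = inMemory.size();
  }

  boolean isInMemory(int partitionId) { return inMemory.contains(partitionId); }

  /** Clears the per-superstep bookkeeping. */
  void startSuperstep() { processed.clear(); }

  /** Called before a compute thread processes partitionId: makes it resident. */
  void prepare(int partitionId) {
    if (inMemory.contains(partitionId)) return;
    while (inMemory.size() >= maxInMemory) {
      int victim = pickVictim();
      inMemory.remove(victim); // "store" the victim to disk
      stores++;
    }
    inMemory.add(partitionId); // "load" the requested partition
    loads++;
    peakInMemory = Math.max(peakInMemory, inMemory.size());
  }

  /** Called after a compute thread finishes partitionId in the current superstep. */
  void markProcessed(int partitionId) { processed.add(partitionId); }

  /** Prefer a partition already processed this superstep; otherwise the oldest resident one. */
  private int pickVictim() {
    for (int p : inMemory) {
      if (processed.contains(p)) return p;
    }
    return inMemory.iterator().next();
  }
}
```

      For example, one superstep over six partitions with a limit of two, processed in id order, loads four partitions and stores four, and the resident count never exceeds two. Evicting already-processed partitions first means that, in this sketch, no partition is brought back from disk later in the same superstep.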

            People

            • Assignee:
              Hassan Eslami (heslami)
              Reporter:
              Hassan Eslami (heslami)
            • Votes:
              0
              Watchers:
              1
