Uploaded image for project: 'Giraph'
  1. Giraph
  2. GIRAPH-436

Improvement of partition selection for out-of-core graph

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: graph
    • Labels:
      None

      Description

      Currently, the OOC graph component needs a parameter K that describes how many partitions should be kept in memory out of N (the number of partitions assigned to that worker). Hence N - K partitions are constantly scanned and from/to disk.
      Now, as discussed on GIRAPH-249, for algorithms were the all the vertices are NOT always active, e.g. SSSP, this can have an impact on performance, as we might get to (a) scan a partition on disk with very few active vertices (b) scan a partition on disk with NO active vertices (c) scan a partition on disk with all active vertices while we have some partitions with NO active vertices in memory. The impact of OOC graph on other algorithms, e.g. PR-like algorithms, is less important, as the cost of hitting the disk is completely amortized (each IO byte is actually computed as all the vertices are active) and efficient (scan-based).

      For the affected class of algorithms, two contributions are possible:
      1) add a header to the partition file with the stats (a boolean hasActive field) about active vertices in the partition. If that partition hasn't received messages AND all the vertices are inactive, it can be skipped completely. Solves for problem (a). Cost: one random IO to the header at the end of each OOC partition computation.
      2) greedy heuristic that tries to keep in memory only K partitions, when possible, that have hasActive set to true. Once an inactive partition is spilled to disk, and it stays to disk, it won't be scanned/read back to memory unless necessary (works with contribution (1)). This accounts for problem (c).

      As far as (b) is concerned, there's not much one can do without breaking linear scanning.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              cmartella Claudio Martella
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: