Uploaded image for project: 'Giraph (Retired)'
  1. Giraph (Retired)
  2. GIRAPH-11

Improve the graph distribution of Giraph

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.1.0
    • 0.1.0
    • None
    • None

    Description

      Currently, Giraph assumes that the data from the VertexInputFormat is sorted. If the user data is not sorted by the vertex id, they must first run a MapReduce or Pig job to generate a sorted dataset. This is often a bit inconvenient.

      Giraph graph partitioning is currently range based and there are some advantages and disadvantages of this approach. The proposal of this JIRA would be to allow for both range and hash based partitioning and provide more flexibility to the user.

      Design goals for the graph distribution:

      • Allow vertices to be unordered or unordered
      • Ability to repartition
      • Select the partitioning scheme based on user needs (i.e. hash or range based)
      • Ability to provide user-specific hints about partitions

      Hash-based partitioning

      • Good vertex balancing across ranges for random data
      • Bad at vertex id locality

      Range-based partitioning

      • Good at vertex id locality
      • Ability to split ranges easily
      • Can cause hotspots for hot ranges

      Attachments

        1. GIRAPH-11.4.diff
          412 kB
          Avery Ching
        2. GIRAPH-11.3.diff
          413 kB
          Avery Ching
        3. GIRAPH-11.2.diff
          412 kB
          Avery Ching
        4. GIRAPH-11.diff
          396 kB
          Avery Ching

        Issue Links

          Activity

            People

              aching Avery Ching
              aching Avery Ching
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: