HAMA-596

Optimize memory usage of graph job

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: graph
    • Labels:
      None

      Description

      This is somewhat problematic.

      1. HAMA-596.patch
        17 kB
        Thomas Jungblut
      2. mapToList.patch
        10 kB
        Edward J. Yoon

          Activity

          Thomas Jungblut added a comment - - edited

          +1. The disk based stuff will be done in HAMA-642

          Edward J. Yoon added a comment -

          We don't need to use map.

          Thomas Jungblut added a comment -

          Obviously, the hashmap that contains the vertices consumes the most memory.
          If you dig deeper into a vertex, you see that the edges take up most of its space.
          That was during partitioning.
          After partitioning it gets even worse, because of the real messaging going on.
          In the end, a 70 MB text file used about 600 MB for the graph, which is still far too much, plus 400 MB for messages. That is about 1 GB in total, roughly 14 times the size of the raw file.

          So how can we cut down the cost of the hashmap and of the edges? The best option would be to solve it with HAMA-642, but I think that will severely degrade performance.

          [1] http://wiki.apache.org/hama/WriteHamaGraphFile#Google_Web_dataset_.28local_mode.2C_pseudo_distributed_cluser.29

          Thomas Jungblut added a comment -

          The cost field of an Edge is null in most algorithms, which wastes a lot of space.

          With 21 million edges, that is about 180 MB for null cost fields alone.
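As a rough sanity check on that figure, the cost of an always-present reference field can be estimated from the edge count and the size of one reference. This is a minimal sketch, not a measurement: the class and method names are hypothetical, and the reference sizes (8 bytes without compressed references, 4 bytes with them) are assumptions about the JVM in use, not something stated in this issue.

```java
/** Rough estimate only; reference sizes are assumptions about the JVM, not measurements. */
class NullCostEstimate {

    /** Bytes spent on one reference slot per edge, whether or not the slot is null. */
    static long referenceBytes(long edgeCount, int bytesPerReference) {
        return edgeCount * bytesPerReference;
    }

    public static void main(String[] args) {
        long edges = 21_000_000L; // edge count quoted in the comment above

        // Assumption: 8-byte references (64-bit JVM without compressed references).
        System.out.println(referenceBytes(edges, 8) / 1_000_000 + " MB");

        // Assumption: 4-byte references (compressed references enabled).
        System.out.println(referenceBytes(edges, 4) / 1_000_000 + " MB");
    }
}
```

With 8-byte references this gives 168 MB, the same order of magnitude as the roughly 180 MB reported from profiling. Object alignment and padding can add to the per-edge cost, so the estimate is a lower bound under these assumptions.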

          Hudson added a comment -

          Integrated in Hama-Nightly #672 (See https://builds.apache.org/job/Hama-Nightly/672/)
          HAMA-596:Optimize memory usage of graph job (Revision 1383326)

          Result = SUCCESS
          tjungblut :
          Files :

          • /hama/trunk/CHANGES.txt
          • /hama/trunk/core/src/main/java/org/apache/hama/bsp/BSPPeerImpl.java
          • /hama/trunk/core/src/main/java/org/apache/hama/bsp/LocalBSPRunner.java
          • /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/MemoryQueue.java
          • /hama/trunk/graph/src/main/java/org/apache/hama/graph/Edge.java
          • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobMessage.java
          • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunner.java
          • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunnerBase.java
          • /hama/trunk/graph/src/main/java/org/apache/hama/graph/Vertex.java

          Thomas Jungblut added a comment -

          Committed, thanks for the review Edward.

          Thomas Jungblut added a comment -

          -MemoryQueue should not be a linked list rather than an arraylist.

          But otherwise it looks okay.

          Edward J. Yoon added a comment -

          +1

          Thomas Jungblut added a comment -

          - I removed the object reference of the peer and combined it with the jobrunner.
          - The destination peer address is now resolved when needed instead of being saved as a field.

          Test cases pass. Please review; I will profile later to check whether it got better and whether any hotspots are left.

          Note that I have added some new constructs in the localrunner, because it hung indefinitely when a single task failed.

          Thomas Jungblut added a comment - - edited

          from HAMA-598:

          With regard to memory usage, do you think it is necessary to store the destination peer address on every vertex? This matters especially for fault tolerance, where these values can change easily. It would be better to calculate them on the fly based on the given partitioner.

          I think this is the most memory-costly variable. I will profile again later with the Wikipedia dataset.
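The idea of computing the destination on the fly could look roughly like the sketch below. It is only an illustration under stated assumptions: `PeerResolver`, `partitionOf`, and `destinationPeer` are hypothetical names, not Hama's actual API, and plain hash partitioning stands in for whatever partitioner the job is configured with.

```java
/** Hypothetical sketch; these are not Hama's actual classes or method names. */
class PeerResolver {

    // One array shared by all vertices of a task, instead of one address field per vertex.
    private final String[] peerNames;

    PeerResolver(String[] peerNames) {
        this.peerNames = peerNames.clone();
    }

    /** Assumed hash partitioning: the same vertex ID always maps to the same index. */
    int partitionOf(String vertexId) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), peerNames.length);
    }

    /** Resolves the destination peer when it is needed; nothing is stored on the vertex. */
    String destinationPeer(String vertexId) {
        return peerNames[partitionOf(vertexId)];
    }

    public static void main(String[] args) {
        PeerResolver resolver =
            new PeerResolver(new String[] {"peer0:61000", "peer1:61000", "peer2:61000"});
        System.out.println(resolver.destinationPeer("a"));
    }
}
```

Because the lookup depends only on the vertex ID and the current peer list, a changed peer list (for example after a task is restarted) is picked up without rewriting any per-vertex state. The cost is one hash and one array access per message.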

          Edward J. Yoon added a comment -

          Graph examples without runtime partitioning throw NullPointerExceptions. This will be fixed in HAMA-598.

          Edward J. Yoon added a comment -

          I didn't look closely at the 'runtime partitioning' code, but I think it will need to be optimized.

          If a lot of data is expected to be exchanged among peers, it should be transferred in multiple steps.


            People

            • Assignee:
              Thomas Jungblut
            • Reporter:
              Edward J. Yoon
            • Votes:
              0
            • Watchers:
              3
