GIRAPH-104: Save half of maximum memory used from messaging

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Giraph currently uses a very large amount of memory for messaging. This JIRA reduces the maximum messaging memory by half and adds periodic memory-usage updates for debugging. Details are below:

      Refactored RandomMessageBenchmark to use an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time spent messaging. Moved the postSuperstep() call to after flush() for more accurate timings.

      Added once-per-minute progress updates during message flushing (which can take a while, especially on the memory benchmark). They show how the flush is progressing and give an ETA.
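
To illustrate the once-per-interval progress update with an ETA described above, here is a minimal sketch. It is not the code from the patch: the class FlushProgress, its methods, and the message wording are all hypothetical, and the ETA is a plain linear extrapolation from the work completed so far.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical progress reporter for a long flush; not Giraph code. */
final class FlushProgress {
  private final long totalItems;
  private final long startNanos;
  private final long intervalNanos;
  private long lastReportNanos;

  FlushProgress(long totalItems, long startNanos, long intervalNanos) {
    this.totalItems = totalItems;
    this.startNanos = startNanos;
    this.intervalNanos = intervalNanos;
    this.lastReportNanos = startNanos;
  }

  /** Linear extrapolation of the remaining time; returns -1 when nothing is done yet. */
  static long etaMillis(long done, long total, long elapsedMillis) {
    if (done <= 0) {
      return -1;
    }
    return elapsedMillis * (total - done) / done;
  }

  /** Returns a status line at most once per interval, or null if it is too early. */
  String maybeReport(long done, long nowNanos) {
    if (nowNanos - lastReportNanos < intervalNanos) {
      return null;
    }
    lastReportNanos = nowNanos;
    long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
    return "flush: sent " + done + " of " + totalItems
        + ", elapsed = " + elapsedMillis + " ms, ETA = "
        + etaMillis(done, totalItems, elapsedMillis) + " ms";
  }

  public static void main(String[] args) throws InterruptedException {
    long total = 5;
    // A 20 ms interval stands in for the one-minute interval so the demo finishes quickly.
    FlushProgress progress =
        new FlushProgress(total, System.nanoTime(), TimeUnit.MILLISECONDS.toNanos(20));
    for (long done = 1; done <= total; done++) {
      Thread.sleep(15); // stand-in for sending one batch of messages
      String line = progress.maybeReport(done, System.nanoTime());
      if (line != null) {
        System.out.println(line);
      }
    }
  }
}
```

Passing the clock value in as a parameter keeps the reporting logic deterministic and easy to test; a real flush loop would pass System.nanoTime() and an interval of one minute.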

      Memory optimizations include:

      • Clear the message list after computation
      • Free vertex messages on the source as the flush is going on
      • TreeMap -> HashMap for VertexMutations
      • Sizing the ArrayList properly in transientInMessages
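
The optimizations listed above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in GIRAPH-104.diff: MessagingMemorySketch, Sender, copyExactlySized, and flushAndFree are hypothetical names, and vertex ids are assumed to be Long for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative only; class and method names are hypothetical, not Giraph's. */
final class MessagingMemorySketch {
  /** Receives one destination's batch of messages (stand-in for an RPC call). */
  interface Sender<M> {
    void send(long destinationId, List<M> messages);
  }

  /** Copies into a list sized exactly for its contents, avoiding spare capacity. */
  static <M> List<M> copyExactlySized(List<M> incoming) {
    List<M> sized = new ArrayList<M>(incoming.size());
    sized.addAll(incoming);
    return sized;
  }

  /** Sends every batch and unlinks it right away so it can be collected mid-flush. */
  static <M> int flushAndFree(Map<Long, List<M>> outgoing, Sender<M> sender) {
    int sent = 0;
    Iterator<Map.Entry<Long, List<M>>> it = outgoing.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, List<M>> entry = it.next();
      List<M> batch = entry.getValue();
      sent += batch.size();
      sender.send(entry.getKey(), batch);
      it.remove(); // drop the map's reference instead of waiting for the flush to end
    }
    return sent;
  }

  public static void main(String[] args) {
    // HashMap rather than TreeMap: no sorted iteration is needed here.
    Map<Long, List<String>> outgoing = new HashMap<Long, List<String>>();
    outgoing.put(1L, copyExactlySized(Arrays.asList("a", "b")));
    outgoing.put(2L, copyExactlySized(Arrays.asList("c")));
    int sent = flushAndFree(outgoing, (id, batch) -> System.out.println(id + " <- " + batch));
    System.out.println("sent = " + sent + ", buffered = " + outgoing.size());

    List<String> inbox = new ArrayList<String>(Arrays.asList("x", "y"));
    // ... compute() would read inbox here ...
    inbox.clear(); // release message references once computation is done
  }
}
```

Removing each entry through the iterator as soon as it has been handed to the sender means the source-side buffer shrinks while the flush runs, rather than being released all at once at the end.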
      Attachment: GIRAPH-104.diff (35 kB, Avery Ching)

        Activity

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3175/
        -----------------------------------------------------------

        Review request for giraph.

        Summary
        -------

        Giraph currently uses a very large amount of memory for messaging. This JIRA reduces the maximum messaging memory by half and adds periodic memory-usage updates for debugging. Details are below:

        Refactored RandomMessageBenchmark to use an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time spent messaging. Moved the postSuperstep() call to after flush() for more accurate timings.

        Added once-per-minute progress updates during message flushing (which can take a while, especially on the memory benchmark). They show how the flush is progressing and give an ETA.

        Memory optimizations include:

        -Clear the message list after computation
        -Free vertex messages on the source as the flush is going on
        -TreeMap -> HashMap for VertexMutations
        -Sizing the ArrayList properly in transientInMessages

        This addresses bug GIRAPH-104.
        https://issues.apache.org/jira/browse/GIRAPH-104

        Diffs


        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java 1213849
        http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java PRE-CREATION

        Diff: https://reviews.apache.org/r/3175/diff

        Testing
        -------

        Passed local and Hadoop unit tests. RandomMessageBenchmark was run at scale on a real cluster.

        Thanks,

        Avery

        Avery Ching added a comment -

        The reduction in the maximum amount of heap used for messaging during the life of an application is quite large. As an example, here are some runs I did prior to the optimizations:

        2011-12-12 22:57:51,961 INFO org.apache.giraph.graph.BspServiceWorker: startSuperstep: Superstep - after prepare 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 122.46955M
        2011-12-12 22:57:52,354 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: before flush - Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
        2011-12-12 22:57:52,354 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
        2011-12-12 22:57:59,337 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
        2011-12-12 22:57:59,337 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
        2011-12-12 22:58:01,403 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.156639M
        2011-12-12 22:58:04,426 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep - after inMessage assignmnt 7 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 121.982346M

        Note how free memory dips to about 4 MB by the end of the flush. After the fixes I don't see the dips:

        2011-12-12 23:39:49,260 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 8 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.11537M
        2011-12-12 23:39:49,274 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.102M
        2011-12-12 23:39:49,458 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 103.08128M
        2011-12-12 23:39:51,728 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
        2011-12-12 23:39:51,728 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
        2011-12-12 23:39:51,747 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 105.48416M
        2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.71583M
        2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M
        2011-12-12 23:39:51,786 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M

        We should include this ASAP.
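
For readers who want to produce a comparable status line in their own code, the three values in the logs above correspond to what the standard java.lang.Runtime API reports. The sketch below is an assumption about how such a line could be built, not the contents of the new MemoryUtils class; MemoryReport and its methods are hypothetical names, and the float-based megabyte conversion is only inferred from the look of the logged numbers.

```java
/** Hypothetical helper that mimics the "totalMem / maxMem / freeMem" log format. */
final class MemoryReport {
  /** Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes). */
  static float toMegabytes(long bytes) {
    return bytes / 1024f / 1024f;
  }

  /** Formats the three values in the same shape as the log lines in this issue. */
  static String format(long totalBytes, long maxBytes, long freeBytes) {
    return "totalMem = " + toMegabytes(totalBytes) + "M, maxMem = "
        + toMegabytes(maxBytes) + "M, freeMem = " + toMegabytes(freeBytes) + "M";
  }

  /** Snapshot of the current JVM heap figures. */
  static String current() {
    Runtime runtime = Runtime.getRuntime();
    return format(runtime.totalMemory(), runtime.maxMemory(), runtime.freeMemory());
  }

  public static void main(String[] args) {
    System.out.println("before allocation: " + current());
    byte[] buffer = new byte[16 * 1024 * 1024]; // hold 16 MB so free memory changes
    System.out.println("after allocation (" + buffer.length + " bytes): " + current());
  }
}
```

Logging a line like this before and after a suspect phase, as the runs above do around flush(), is a cheap way to see where heap headroom disappears.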

        Claudio Martella added a comment -

        Supposing the messaging pattern doesn't change between superstep 6 and superstep 8, this looks like a great improvement. Great work. I went through the review fairly quickly, and it looks very good.

        I'll look at it more closely tomorrow and will +1.

        Avery Ching added a comment -

        Messaging pattern was from RandomMessageBenchmark (very regular). =) I was so happy to fix it and save a lot of messaging memory. I'll wait until your final review before committing. Thanks for taking a look!

        Avery Ching added a comment -

        By the way, here's example output from the changes to RandomMessageBenchmark. It will help us qualify messaging improvements.

        2011-12-12 23:58:54,887 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Outputing statistics for superstep 4
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total bytes sent : 60000000000
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total bytes sent : 240000000000
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total messages : 6000000
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total messages : 24000000
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total millis : 854309
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total millis : 3718123
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: workers : 5
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megabytes / second = 334.8932235547969
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second = 307.7921789267058
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second = 35116.09967821947
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second = 32274.349181024943
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megaabytes / second / worker = 66.97864471095939
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second / worker = 61.55843578534116
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second / worker = 7023.219935643894
        2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second / worker = 6454.869836204989
        2011-12-12 23:58:57,627 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 4 totalMem = 20463.375M, maxMem = 20463.375M, freeMem = 6571.4233M
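
The derived rates in this output can be reproduced from the raw counters. The sketch below is a reconstruction inferred from the logged numbers, not taken from the benchmark source: it assumes that the "millis" values are summed across the 5 workers (so the average wall time is millis / workers) and that a megabyte is 1024 * 1024 bytes. Under those assumptions the results agree with the logged values. ThroughputCheck and its methods are hypothetical names.

```java
/** Hypothetical re-derivation of the benchmark's rate lines from its raw counters. */
final class ThroughputCheck {
  /** Items per second, assuming summedMillis is the sum of all workers' times. */
  static double perSecond(long amount, long summedMillis, int workers) {
    return amount * (double) workers * 1000.0 / summedMillis;
  }

  /** Megabytes per second, with 1 MB taken as 1024 * 1024 bytes. */
  static double megabytesPerSecond(long bytes, long summedMillis, int workers) {
    return perSecond(bytes, summedMillis, workers) / (1024.0 * 1024.0);
  }

  public static void main(String[] args) {
    int workers = 5;
    long superstepBytes = 60000000000L;
    long superstepMessages = 6000000L;
    long superstepMillis = 854309L;

    double mbPerSec = megabytesPerSecond(superstepBytes, superstepMillis, workers);
    double msgPerSec = perSecond(superstepMessages, superstepMillis, workers);

    // Logged values for comparison: about 334.89 MB/s and about 35116.10 messages/s.
    System.out.println("Superstep megabytes / second = " + mbPerSec);
    System.out.println("Superstep messages / second = " + msgPerSec);
    System.out.println("Superstep megabytes / second / worker = " + mbPerSec / workers);
    System.out.println("Superstep messages / second / worker = " + msgPerSec / workers);
    System.out.println("Bytes per message = " + superstepBytes / superstepMessages);
  }
}
```

The same two functions applied to the cumulative counters (240000000000 bytes, 24000000 messages, 3718123 millis) give roughly 307.79 MB/s and 32274.35 messages/s, in line with the "Total" lines.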

        Claudio Martella added a comment -

        Went through it more carefully. Looks very clean, great work.

        +1 from me.

        Hudson added a comment -

        Integrated in Giraph-trunk-Commit #47 (See https://builds.apache.org/job/Giraph-trunk-Commit/47/)
        GIRAPH-104: Save half of maximum memory used from messaging. (aching)

        aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1214406
        Files :

        • /incubator/giraph/trunk/CHANGELOG
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
        • /incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java
        Avery Ching added a comment -

        Thanks for the quick review, Claudio! On to GIRAPH-57...


          People

          • Assignee:
            Avery Ching
            Reporter:
            Avery Ching