Giraph (Retired) / GIRAPH-104

Save half of maximum memory used from messaging

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None

    Description

      Currently, the amount of memory that Giraph uses for messaging is huge. This JIRA will reduce the messaging memory by half and provide periodic updates of memory for debugging. Details are below:

      Refactored RandomMessageBenchmark to an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time for the messaging. Adjusted postSuperstep() to be called after flush() for more accurate timings.

      Added periodic minute updates for message flushing (which can take a while, especially on the memory benchmark). This helps to see how progress is going and gives an ETA.
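
The progress-and-ETA idea can be illustrated with a minimal Java sketch. It is not taken from the patch: the class name, method names, and the linear extrapolation are assumptions made for illustration only.

```java
/** Hypothetical sketch; names and the linear ETA model are assumptions. */
class FlushProgressSketch {
    /** Report at most once per minute. */
    static final long REPORT_INTERVAL_MILLIS = 60_000L;

    /** True when at least one reporting interval has passed since the last report. */
    static boolean shouldReport(long nowMillis, long lastReportMillis) {
        return nowMillis - lastReportMillis >= REPORT_INTERVAL_MILLIS;
    }

    /**
     * Linear extrapolation: if 'done' items took 'elapsedMillis', the
     * remaining items are assumed to proceed at the same rate.
     * Returns -1 when nothing has completed yet (ETA unknown).
     */
    static long estimateRemainingMillis(long elapsedMillis, long done, long total) {
        if (done <= 0) {
            return -1L;
        }
        return elapsedMillis * (total - done) / done;
    }

    public static void main(String[] args) {
        long eta = estimateRemainingMillis(60_000L, 25L, 100L);
        System.out.println("flush: 25 of 100 done, estimated millis left = " + eta);
    }
}
```

A real flush loop would call shouldReport() on each iteration and log only when it returns true, so the logging cost stays negligible.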

      Memory optimizations include:

      • Clear the message list after computation
      • Free vertex messages on the source as the flush is going on
      • TreeMap -> HashMap for VertexMutations
      • Sizing the ArrayList properly in transientInMessages
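
The optimizations listed above can be sketched in plain Java collections. This is an illustration under stated assumptions, not the code from GIRAPH-104.diff: the class, the method names, and the use of Long keys with String messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; none of these names come from the Giraph patch. */
class MessagingMemorySketch {
    /** Allocate the list at its final size so it never grows past what is needed. */
    static <M> List<M> copySized(List<M> incoming) {
        List<M> sized = new ArrayList<>(incoming.size());
        sized.addAll(incoming);
        return sized;
    }

    /**
     * Drain the outgoing map while flushing: each entry is removed right
     * after it is "sent", so its messages can be garbage collected before
     * the whole flush has finished. Returns the number of messages sent.
     */
    static <M> int flushAndFree(Map<Long, List<M>> outgoing) {
        int sent = 0;
        Iterator<Map.Entry<Long, List<M>>> it = outgoing.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<M>> entry = it.next();
            sent += entry.getValue().size(); // stand-in for the real network send
            it.remove();                     // free this destination's messages now
        }
        return sent;
    }

    public static void main(String[] args) {
        // HashMap rather than TreeMap: used here because no key ordering is needed.
        Map<Long, List<String>> outgoing = new HashMap<>();
        outgoing.put(1L, copySized(Arrays.asList("a", "b")));
        outgoing.put(2L, copySized(Arrays.asList("c")));
        int sent = flushAndFree(outgoing);
        System.out.println("sent=" + sent + ", remaining=" + outgoing.size());

        // Clear the per-vertex message list once computation has consumed it.
        List<String> inbox = copySized(Arrays.asList("x", "y"));
        inbox.clear();
    }
}
```

The common thread is that message data is released as early as possible rather than staying referenced until the end of the superstep.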

      Attachments

        1. GIRAPH-104.diff
          35 kB
          Avery Ching

        Activity

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3175/
          -----------------------------------------------------------

          Review request for giraph.

          Summary
          -------

          Currently, the amount of memory that Giraph uses for messaging is huge. This JIRA will reduce the messaging memory by half and provide periodic updates of memory for debugging. Details are below:

          Refactored RandomMessageBenchmark to an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time for the messaging. Adjusted postSuperstep() to be called after flush() for more accurate timings.

          Added periodic minute updates for message flushing (which can take a while, especially on the memory benchmark). This helps to see how progress is going and gives an ETA.

          Memory optimizations include:

          -Clear the message list after computation
          -Free vertex messages on the source as the flush is going on
          -TreeMap -> HashMap for VertexMutations
          -Sizing the ArrayList properly in transientInMessages

          This addresses bug GIRAPH-104.
          https://issues.apache.org/jira/browse/GIRAPH-104

          Diffs


          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java 1213849
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java PRE-CREATION

          Diff: https://reviews.apache.org/r/3175/diff

          Testing
          -------

          Passed local and Hadoop unittests. RandomMessageBenchmark was run at scale on a real cluster.

          Thanks,

          Avery

          aching Avery Ching added a comment -

          The reduction in the maximum amount of heap used for messaging during the life of an application is quite large. As an example, here are some runs I did prior to the optimizations:

          2011-12-12 22:57:51,961 INFO org.apache.giraph.graph.BspServiceWorker: startSuperstep: Superstep - after prepare 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 122.46955M
          2011-12-12 22:57:52,354 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: before flush - Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
          2011-12-12 22:57:52,354 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
          2011-12-12 22:57:59,337 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
          2011-12-12 22:57:59,337 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
          2011-12-12 22:58:01,403 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.156639M
          2011-12-12 22:58:04,426 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep - after inMessage assignmnt 7 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 121.982346M

          Note how the free memory would dip to 4 MB at times. After the fixes I don't see the dips:

          2011-12-12 23:39:49,260 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 8 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.11537M
          2011-12-12 23:39:49,274 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.102M
          2011-12-12 23:39:49,458 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 103.08128M
          2011-12-12 23:39:51,728 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
          2011-12-12 23:39:51,728 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
          2011-12-12 23:39:51,747 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 105.48416M
          2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.71583M
          2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M
          2011-12-12 23:39:51,786 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M

          We should include this ASAP.
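
For readers who want to produce similar totalMem / maxMem / freeMem lines, here is a minimal Java sketch built on the standard java.lang.Runtime API. It is an assumption about how such a string could be formatted, not a copy of the MemoryUtils class added by this patch; the class and method names are hypothetical.

```java
/** Hypothetical sketch of a memory-stats formatter; not the patch's MemoryUtils. */
class MemoryStatsSketch {
    /** Convert bytes to megabytes as a float, matching the "252.8125M" style in the logs. */
    static float toMegabytes(long bytes) {
        return bytes / 1024f / 1024f;
    }

    /** Format three byte counts in the same layout as the log lines above. */
    static String format(long totalBytes, long maxBytes, long freeBytes) {
        return "totalMem = " + toMegabytes(totalBytes) + "M, maxMem = "
            + toMegabytes(maxBytes) + "M, freeMem = " + toMegabytes(freeBytes) + "M";
    }

    /** Snapshot of the current JVM heap figures. */
    static String runtimeMemoryStats() {
        Runtime runtime = Runtime.getRuntime();
        return format(runtime.totalMemory(), runtime.maxMemory(), runtime.freeMemory());
    }

    public static void main(String[] args) {
        System.out.println("flush: starting " + runtimeMemoryStats());
    }
}
```

Note that freeMemory() is relative to totalMemory(), not maxMemory(), so the figures are easiest to compare when the heap is fixed in size, as in the runs above where totalMem equals maxMem.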


          cmartella Claudio Martella added a comment -

          Supposing the messaging pattern doesn't change between superstep 6 and superstep 8, this looks like a great improvement. Great work! I went through the review, frankly quite quickly, and it looks very good.

          I'll check it out better tomorrow and will +1.

          aching Avery Ching added a comment -

          Messaging pattern was from RandomMessageBenchmark (very regular). =) I was so happy to fix it and save a lot of messaging memory. I'll wait until your final review before committing. Thanks for taking a look!

          aching Avery Ching added a comment -

          By the way, here's example output from the changes to RandomMessageBenchmark. It will help us qualify messaging improvements.

          2011-12-12 23:58:54,887 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Outputing statistics for superstep 4
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total bytes sent : 60000000000
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total bytes sent : 240000000000
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total messages : 6000000
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total messages : 24000000
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total millis : 854309
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total millis : 3718123
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: workers : 5
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megabytes / second = 334.8932235547969
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second = 307.7921789267058
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second = 35116.09967821947
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second = 32274.349181024943
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megaabytes / second / worker = 66.97864471095939
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second / worker = 61.55843578534116
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second / worker = 7023.219935643894
          2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second / worker = 6454.869836204989
          2011-12-12 23:58:57,627 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 4 totalMem = 20463.375M, maxMem = 20463.375M, freeMem = 6571.4233M
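
The rate figures in this output can be reproduced from the raw counters. The Java sketch below is a reconstruction from the logged numbers, not the benchmark's source: it assumes that "total millis" is summed across workers (so it is divided by the worker count to get an average wall-clock time) and that a megabyte is 1024 * 1024 bytes. The class and method names are hypothetical.

```java
/** Hypothetical reconstruction of the benchmark's rate arithmetic from its log output. */
class ThroughputSketch {
    /** Average seconds per worker, assuming the millis counter is a sum over all workers. */
    static double avgSecondsPerWorker(long summedMillis, int workers) {
        return summedMillis / (double) workers / 1000d;
    }

    /** Aggregate megabytes per second, with 1 MB = 1024 * 1024 bytes. */
    static double megabytesPerSecond(long bytes, long summedMillis, int workers) {
        return bytes / 1024d / 1024d / avgSecondsPerWorker(summedMillis, workers);
    }

    /** Aggregate messages per second. */
    static double messagesPerSecond(long messages, long summedMillis, int workers) {
        return messages / avgSecondsPerWorker(summedMillis, workers);
    }

    public static void main(String[] args) {
        // Superstep 4 counters taken from the log lines above.
        long bytes = 60_000_000_000L;
        long messages = 6_000_000L;
        long millis = 854_309L;
        int workers = 5;

        double mbps = megabytesPerSecond(bytes, millis, workers);
        double msgps = messagesPerSecond(messages, millis, workers);
        System.out.println("Superstep megabytes / second = " + mbps);
        System.out.println("Superstep messages / second = " + msgps);
        System.out.println("Superstep megabytes / second / worker = " + mbps / workers);
        System.out.println("Superstep messages / second / worker = " + msgps / workers);
    }
}
```

Under these assumptions the superstep values agree with the logged 334.89 MB/s and 35116.10 messages/s, which is what suggests the millis counter is an aggregate over the 5 workers.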


          cmartella Claudio Martella added a comment -

          Went through it more carefully. Looks very clean, great work.

          +1 from me.

          hudson Hudson added a comment -

          Integrated in Giraph-trunk-Commit #47 (See https://builds.apache.org/job/Giraph-trunk-Commit/47/)
          GIRAPH-104: Save half of maximum memory used from messaging. (aching)

          aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1214406
          Files :

          • /incubator/giraph/trunk/CHANGELOG
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
          • /incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java
          aching Avery Ching added a comment -

          Thanks for the quick review Claudio! Onto GIRAPH-57...


          People

            Assignee: Avery Ching
            Reporter: Avery Ching
            Votes: 0
            Watchers: 0
