Details
-
Improvement
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
0.1.0
-
None
-
None
Description
Currently, the amount of memory that Giraph uses for messaging is huge. This JIRA will reduce the messaging memory by half and provide periodic updates of memory for debugging. Details are below:
Refactored RandomMessageBenchmark to use an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time spent on messaging. Adjusted postSuperstep() to be called after flush() for more accurate timings.
Added periodic (once-per-minute) progress updates for message flushing, which can take a while, especially on the memory benchmark. These updates show how the flush is progressing and give an ETA.
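To illustrate the periodic flush updates described above, here is a minimal sketch of rate-limited progress reporting with a linear ETA estimate. It is not the code from the patch: the class name FlushProgress, the method maybeReport, and the message wording are all hypothetical, and the ETA assumes the remaining work proceeds at the average rate seen so far.

```java
/**
 * Hypothetical helper (not Giraph's actual code): emits a progress line at
 * most once per reporting interval and extrapolates an ETA linearly.
 */
public class FlushProgress {
    private final long startMillis;
    private final long reportIntervalMillis;
    private long lastReportMillis;

    public FlushProgress(long startMillis, long reportIntervalMillis) {
        this.startMillis = startMillis;
        this.reportIntervalMillis = reportIntervalMillis;
        this.lastReportMillis = startMillis;
    }

    /**
     * Returns a progress line if a full interval has passed since the last
     * report, otherwise null. The caller passes the clock value so the logic
     * stays easy to test.
     */
    public String maybeReport(long nowMillis, long done, long total) {
        if (nowMillis - lastReportMillis < reportIntervalMillis) {
            return null;
        }
        lastReportMillis = nowMillis;
        long elapsedMillis = nowMillis - startMillis;
        // Assumption: remaining items take as long, on average, as finished ones.
        String eta = (done <= 0)
            ? "unknown"
            : (elapsedMillis * (total - done) / done / 1000) + " s";
        return "flush: " + done + " of " + total + " done, elapsed "
            + (elapsedMillis / 1000) + " s, ETA " + eta;
    }
}
```

A caller would invoke maybeReport(System.currentTimeMillis(), done, total) inside the flush loop with a 60000 ms interval and log any non-null result.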
Memory optimizations include:
- Clear the message list after computation
- Free vertex messages on the source as the flush is going on
- TreeMap -> HashMap for VertexMutations
- Sizing the ArrayList properly in transientInMessages
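The optimizations listed above can be sketched in isolation as follows. This is an illustrative sketch only, not the code in GIRAPH-104.diff: the class MessageMemorySketch and its methods are hypothetical, and messages are modeled as plain Long values keyed by a String destination.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the memory optimizations; not Giraph code. */
public class MessageMemorySketch {

    /** Size the ArrayList up front so it never over-allocates while growing. */
    static <M> List<M> copyPresized(List<M> incoming) {
        List<M> copy = new ArrayList<>(incoming.size());
        copy.addAll(incoming);
        return copy;
    }

    /** Consume the messages, then clear the list so they can be collected. */
    static long sumAndClear(List<Long> messages) {
        long sum = 0;
        for (long m : messages) {
            sum += m;
        }
        messages.clear();
        return sum;
    }

    /**
     * Send each destination's batch and drop it from the source map right
     * away, instead of holding every batch until the whole flush finishes.
     */
    static int flushAndFree(Map<String, List<Long>> outgoing, List<Long> sink) {
        int sent = 0;
        Iterator<Map.Entry<String, List<Long>>> it = outgoing.entrySet().iterator();
        while (it.hasNext()) {
            List<Long> batch = it.next().getValue();
            sink.addAll(batch);   // stands in for the actual network send
            sent += batch.size();
            it.remove();          // frees the batch on the source side
        }
        return sent;
    }
}
```

The TreeMap to HashMap change does not need its own method here: passing a HashMap as the outgoing map is enough wherever sorted key order is not required, and it avoids the per-entry tree node overhead.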
Attachments
- GIRAPH-104.diff
- 35 kB
- Avery Ching
Activity
The reduction in the maximum amount of heap used for messaging during the life of an application is quite large. As an example, here are some runs I did prior to the optimizations:
2011-12-12 22:57:51,961 INFO org.apache.giraph.graph.BspServiceWorker: startSuperstep: Superstep - after prepare 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 122.46955M
2011-12-12 22:57:52,354 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: before flush - Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
2011-12-12 22:57:52,354 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.091606M
2011-12-12 22:57:59,337 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
2011-12-12 22:57:59,337 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 6 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.349098M
2011-12-12 22:58:01,403 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 4.156639M
2011-12-12 22:58:04,426 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep - after inMessage assignmnt 7 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 121.982346M
Note how the free memory would dip to 4 MB at times. After the fixes I don't see the dips:
2011-12-12 23:39:49,260 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 8 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.11537M
2011-12-12 23:39:49,274 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 110.102M
2011-12-12 23:39:49,458 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 103.08128M
2011-12-12 23:39:51,728 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
2011-12-12 23:39:51,728 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 9 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 106.01724M
2011-12-12 23:39:51,747 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 105.48416M
2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.71583M
2011-12-12 23:39:51,786 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M
2011-12-12 23:39:51,786 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep 10 totalMem = 252.8125M, maxMem = 252.8125M, freeMem = 119.5272M
We should include this ASAP.
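For readers unfamiliar with the totalMem / maxMem / freeMem figures in the logs above, a snapshot like this can be taken from the standard java.lang.Runtime API. The sketch below is an assumption about how such a string might be built, not the contents of the MemoryUtils.java added by the patch; the class name MemorySnapshot is hypothetical and the numeric formatting may differ from the real log lines.

```java
/** Hypothetical sketch of a heap snapshot helper; not the patch's MemoryUtils. */
public class MemorySnapshot {
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Formats byte counts as megabytes in the style of the log lines above. */
    static String format(long totalBytes, long maxBytes, long freeBytes) {
        return "totalMem = " + (totalBytes / BYTES_PER_MB) + "M, "
            + "maxMem = " + (maxBytes / BYTES_PER_MB) + "M, "
            + "freeMem = " + (freeBytes / BYTES_PER_MB) + "M";
    }

    /** Reads the current JVM heap figures from java.lang.Runtime. */
    static String current() {
        Runtime runtime = Runtime.getRuntime();
        return format(runtime.totalMemory(), runtime.maxMemory(),
            runtime.freeMemory());
    }
}
```

Note that freeMemory() is relative to totalMemory(), the heap currently reserved by the JVM, so a low freeMem value is only alarming when totalMem has already reached maxMem, as it has in the runs shown here.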
Supposing the messaging pattern doesn't change between superstep 6 and superstep 8, this looks like a great improvement. Great work. I went through the review, frankly quite quickly, and it looks very good.
I'll check it out better tomorrow and will +1.
Messaging pattern was from RandomMessageBenchmark (very regular). =) I was so happy to fix it and save a lot of messaging memory. I'll wait until your final review before committing. Thanks for taking a look!
By the way, here's example output from the changes to RandomMessageBenchmark. It will help us quantify messaging improvements.
2011-12-12 23:58:54,887 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Outputing statistics for superstep 4
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total bytes sent : 60000000000
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total bytes sent : 240000000000
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total messages : 6000000
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total messages : 24000000
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: superstep total millis : 854309
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: total millis : 3718123
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: workers : 5
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megabytes / second = 334.8932235547969
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second = 307.7921789267058
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second = 35116.09967821947
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second = 32274.349181024943
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep megaabytes / second / worker = 66.97864471095939
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total megabytes / second / worker = 61.55843578534116
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Superstep messages / second / worker = 7023.219935643894
2011-12-12 23:58:54,888 INFO org.apache.giraph.benchmark.RandomMessageBenchmark$RandomMessageBenchmarkWorkerContext: Total messages / second / worker = 6454.869836204989
2011-12-12 23:58:57,627 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep 4 totalMem = 20463.375M, maxMem = 20463.375M, freeMem = 6571.4233M
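The derived rates in the output above can be reproduced from the raw totals. The sketch below is inferred from the logged numbers rather than taken from the patch: it assumes the "millis" aggregator is summed across workers (so the average per-worker time is millis / workers) and that a megabyte is 1024 * 1024 bytes. The class and method names are hypothetical.

```java
/** Hypothetical sketch: recomputes the benchmark's derived throughput figures. */
public class ThroughputStats {

    /**
     * Rate per second over the average per-worker time, assuming
     * summedMillis is the sum of every worker's elapsed milliseconds.
     */
    static double perSecond(long count, long summedMillis, long workers) {
        return (double) count * workers * 1000.0 / summedMillis;
    }

    /** Same rate expressed in megabytes (1024 * 1024 bytes) per second. */
    static double megabytesPerSecond(long bytes, long summedMillis, long workers) {
        return perSecond(bytes, summedMillis, workers) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long workers = 5;
        // Superstep 4 values copied from the log excerpt above.
        System.out.println(megabytesPerSecond(60000000000L, 854309L, workers));
        System.out.println(perSecond(6000000L, 854309L, workers));
        // Per-worker figures divide the cluster-wide rate by the worker count.
        System.out.println(megabytesPerSecond(60000000000L, 854309L, workers) / workers);
    }
}
```

Under these assumptions the superstep figures come out at roughly 334.89 megabytes / second and 35116.1 messages / second, in line with the logged values, and dividing by the 5 workers gives the per-worker lines.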
Went through it more carefully. Looks very clean, great work.
+1 from me.
Integrated in Giraph-trunk-Commit #47 (See https://builds.apache.org/job/Giraph-trunk-Commit/47/)
GIRAPH-104: Save half of maximum memory used from messaging. (aching)
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1214406
Files :
- /incubator/giraph/trunk/CHANGELOG
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java
- /incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java
Review request for giraph.
This addresses bug GIRAPH-104.
https://issues.apache.org/jira/browse/GIRAPH-104
Diffs
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/LongSumAggregator.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerContext.java 1213849
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java PRE-CREATION
Diff: https://reviews.apache.org/r/3175/diff
Testing
-------
Passed local and Hadoop unit tests. RandomMessageBenchmark was run at scale on a real cluster.
Thanks,
Avery