Giraph
GIRAPH-185

Improve concurrency of putMsg / putMsgList

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: graph
    • Labels:
      None

      Description

      Currently, putMsg / putMsgList use a synchronized block to protect the whole transientInMessages map while adding a new message. This lock blocks other concurrent calls to putMsg / putMsgList and increases response time. We should use fine-grained locking to allow higher concurrency in message communication.
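
      As a minimal, hypothetical sketch of the two approaches under discussion (the class and method names below are simplified assumptions for illustration, not the actual BasicRPCCommunications code): the coarse-grained version serializes every sender on one lock, while the fine-grained version only makes senders contend when they touch the same vertex (or the same hash segment).

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.Queue;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.ConcurrentLinkedQueue;

      // Sketch of the current behavior: one lock around the whole map.
      class CoarseGrainedInbox<I, M> {
          private final Map<I, List<M>> transientInMessages = new HashMap<I, List<M>>();

          public void putMsg(I vertexId, M msg) {
              synchronized (transientInMessages) {
                  List<M> msgs = transientInMessages.get(vertexId);
                  if (msgs == null) {
                      msgs = new ArrayList<M>();
                      transientInMessages.put(vertexId, msgs);
                  }
                  msgs.add(msg);
              }
          }
      }

      // Sketch of a finer-grained alternative: per-vertex queues in a ConcurrentHashMap.
      class FineGrainedInbox<I, M> {
          private final ConcurrentHashMap<I, Queue<M>> transientInMessages =
              new ConcurrentHashMap<I, Queue<M>>();

          public void putMsg(I vertexId, M msg) {
              Queue<M> msgs = transientInMessages.get(vertexId);
              if (msgs == null) {
                  Queue<M> fresh = new ConcurrentLinkedQueue<M>();
                  msgs = transientInMessages.putIfAbsent(vertexId, fresh);
                  if (msgs == null) {
                      msgs = fresh;   // another sender installed the queue first
                  }
              }
              msgs.add(msg);
          }
      }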

      1. GIRAPH-185.patch
        3 kB
        Bo Wang
      2. GIRAPH-185.patch
        7 kB
        Bo Wang
      3. GIRAPH-185.patch
        5 kB
        Bo Wang

        Issue Links

          Activity

          Bo Wang added a comment -

          putVertexIdMessagesList could be improved at the same time.

          Claudio Martella added a comment -

          Totally agree with that. I actually addressed that in GIRAPH-45, but it hasn't seen the light yet... So go ahead!

          Bo Wang added a comment -

          Fixed the problem; waiting for review.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4840/
          -----------------------------------------------------------

          Review request for giraph.

          Summary
          -------

          Improve concurrency of putMsg / putMsgList.

          This addresses bug GIRAPH-185.
          https://issues.apache.org/jira/browse/GIRAPH-185

          Diffs


          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1328747
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1328747
          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1328747

          Diff: https://reviews.apache.org/r/4840/diff

          Testing
          -------

          All tests passed. Waiting for review. Thanks.

          Thanks,

          Bo

          Avery Ching added a comment -

          Thanks for looking at this, Bo. I have a couple of questions/comments.

          1) Do you have any idea what kind of performance gain there is? Can you run a few experiments?

          2) The one downside is memory related. By pre-allocating a list for every vertex, we are going to use memory whether or not the vertex receives a message.

          I thought you might be looking into higher-level concurrency by using ConcurrentHashMap or something like that for transientInMessages?

          Claudio Martella added a comment -

          I agree with Avery. Last time I modified this code, I populated the transientInMessages data structure so that it contained lists only for vertices that were receiving messages.
          You could do this by using a ConcurrentHashMap and its putIfAbsent() method:

          public void add(Object key, Object val) {
              // Look up the per-vertex queue, creating it lazily on first use.
              Queue q = map.get(key);
              if (q == null) {
                  q = new ConcurrentLinkedQueue();
                  // putIfAbsent takes the key and the value; keep whichever queue won the race.
                  Queue temp = map.putIfAbsent(key, q);
                  if (temp != null) {
                      q = temp;
                  }
              }
              q.add(val);
          }

          This will populate the data structure only for vertices that have messages, and by using the ConcurrentLinkedQueue we'd also get rid of the other synchronization block.

          What do you think?

          Bo Wang added a comment -

          Thanks for the comments, Avery and Claudio.

          I am measuring the performance, but I found that the #Sent messages counter
          is always zero in both the original and the changed version. Is that a bug?
          In the output of the Quick Start Guide, it is also zero.

          I think ConcurrentHashMap is a good way to go. But for the current approach,
          I think the memory overhead won't be much, since most vertices will receive
          messages, except for isolated ones and those whose neighbors are all
          inactive. In comparison, ConcurrentHashMap is much larger than HashMap: an
          empty ConcurrentHashMap takes around 1673 bytes while an empty HashMap takes
          only 137 bytes.

          Another reason to use pre-population is to avoid the time spent allocating
          new message lists and inserting them into the map. We just need to clear the
          lists before the next superstep.

          What's your opinion?

          Thanks,
          Bo

          Avery Ching added a comment -

          Since we only allocate one ConcurrentHashMap per worker, the empty overhead isn't a concern. If, however, the per-element memory cost of an entry in the concurrent hash map is much more expensive, then I would definitely be worried. We can also tune the concurrency level (default 16) to a reasonable tradeoff.
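
          As a concrete illustration of the tuning Avery mentions, the three-argument ConcurrentHashMap constructor takes an initial capacity, a load factor, and a concurrency level. The sketch below is only an assumption for illustration: the key/value types and all three numbers are placeholders, and the right values would have to come from benchmarking a real workload.

          import java.util.Queue;
          import java.util.concurrent.ConcurrentHashMap;

          public class TunedInboxFactory {
              static ConcurrentHashMap<Long, Queue<byte[]>> newInbox() {
                  int initialCapacity = 100000;  // rough guess at vertices per worker
                  float loadFactor = 0.75f;      // the class default
                  int concurrencyLevel = 64;     // default is 16; raise it when many threads send concurrently
                  return new ConcurrentHashMap<Long, Queue<byte[]>>(
                      initialCapacity, loadFactor, concurrencyLevel);
              }
          }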

          Avery Ching added a comment -

          About the sent messages, it is the per-superstep sent message count =). Therefore, if you are looking at the last superstep, there are 0 sent messages. If the supersteps are long enough, you will see this number change per superstep.

          Pre-population certainly appears to be a tradeoff of memory vs. performance. Can you run a few experiments to justify the tradeoff?

          Avery Ching added a comment -

          Re:

          Thanks for the opinion, Avery.

          Yep, the space for ConcurrentHashMap is not a concern. I estimate the space
          overhead for those empty entries is not too large, though. An empty entry is
          around 69 bytes, just the size of a couple of messages. Statistically
          speaking, most vertices will receive one or more messages, for example in
          PageRank. Actually, each Vertex object also has an internal messageList
          structure of the same size, whether it receives a message or not. With
          pre-population, the time for entry creation and insertion can be saved, as
          well as the time spent on garbage collection.

          Do you think it's worth the trade-off? If not, I am pretty open to using
          ConcurrentHashMap.

          Thanks,
          Bo

          --------

          Bo, I added your comments from the email to the jira. I think there are two possible improvements here.

          1) Converting the HashMap to ConcurrentHashMap will increase concurrency.
             • This seems to be agreed on.
          2) Making sure that all the possible destinations have a message list.
             • I do agree that many applications will likely have most of the vertices receiving messages. That being said, memory is one of the limitations of Giraph. Doubling up on the message lists is not likely to be worth the benefit in performance (I could be wrong if you have benchmarks that say otherwise). Perhaps it might be possible to have the vertex not keep its own message list and instead use the one from inMessages. This would be a nice memory savings.
          Bo Wang added a comment -

          Thanks for keeping up the conversation, Avery. I used putIfAbsent in the new version (thanks, Claudio, for the suggestion). Here comes the new patch.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4852/
          -----------------------------------------------------------

          Review request for giraph.

          Summary
          -------

          Use ConcurrentHashMap and ConcurrentLinkedQueue to allow concurrent access to the message map. The concurrencyLevel of the ConcurrentHashMap uses the default value; there may be some performance gain from tuning this value.

          This addresses bug GIRAPH-185.
          https://issues.apache.org/jira/browse/GIRAPH-185

          Diffs


          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1328747

          Diff: https://reviews.apache.org/r/4852/diff

          Testing
          -------

          Thanks,

          Bo

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4852/#review7169
          -----------------------------------------------------------

          Looks good to me; it wouldn't hurt to see a stress test to check performance, although I wouldn't expect this to be slower than the synchronized block. Also, I'd agree that moving the messages directly from inMessages to the Vertex could be a memory win.

          • Claudio

          Bo Wang added a comment -

          Thanks for looking at this, Claudio. I may do some perf tests and post the results. I also found that inMessages and the vertex message list can be merged to save memory and time. I will create a JIRA and work on it.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4852/#review7185
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
          <https://reviews.apache.org/r/4852/#comment15860>

          Bo, I'm a little leery about converting the List and ArrayList to LinkedList and ConcurrentLinkedQueue. I believe that linked lists will use more memory than the array list due to the double links (forward and backward). Also, is ConcurrentLinkedQueue supposed to outperform a synchronized ArrayList? I haven't seen much on that.

          The ConcurrentHashMap changes look good.

          • Avery

          jiraposter@reviews.apache.org added a comment -

          On 2012-04-24 20:53:33, Avery Ching wrote:

          > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java, lines 776-777

          > <https://reviews.apache.org/r/4852/diff/1/?file=104059#file104059line776>

          >

          > Bo, I'm a little leery about converting the List and ArrayList to LinkedList and ConcurrentLinkedQueue. I believe that linked lists will use more memory than the array list due to the double links (forward and backward). Also, is ConcurrentLinkedQueue supposed to outperform a synchronized ArrayList? I haven't seen much on that.

          >

          > The concurrenthashmap changes look good.

          Avery, thanks for the comments. I just measured the sizes of these classes; below is an estimate.

          java.util.ArrayList: 149 bytes
          java.util.LinkedList: 101 bytes
          java.util.concurrent.ConcurrentLinkedQueue: 118 bytes

          The tool I used is the program from the link below.
          http://www.javapractices.com/topic/TopicAction.do?Id=83

          In terms of performance, here is a benchmark.
          http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html

          In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly better than LinkedList. In test #3 (iteration), LinkedList outperformed ConcurrentLinkedQueue. I think the most time-consuming part is add; iteration is also heavily used, but without concurrent access.

          • Bo
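
          For reference, the sizes Bo quotes above can be reproduced with a rough heap-delta estimate in the spirit of the javapractices.com program he links. The sketch below is an assumption about that style of measurement, not the exact tool; results are approximate and depend on the JVM and heap settings.

          import java.util.concurrent.ConcurrentLinkedQueue;

          public class SizeOfEstimate {
              private static final int COUNT = 100000;

              public static void main(String[] args) {
                  Object[] hold = new Object[COUNT];   // keep the instances reachable during measurement
                  long before = usedMemory();
                  for (int i = 0; i < COUNT; i++) {
                      hold[i] = new ConcurrentLinkedQueue<Object>();
                  }
                  long after = usedMemory();
                  System.out.println("approx. bytes per empty instance: " + ((after - before) / COUNT));
                  System.out.println("instances held: " + hold.length);
              }

              private static long usedMemory() {
                  Runtime rt = Runtime.getRuntime();
                  for (int i = 0; i < 4; i++) {        // encourage a full collection before sampling
                      System.gc();
                      Thread.yield();
                  }
                  return rt.totalMemory() - rt.freeMemory();
              }
          }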

          jiraposter@reviews.apache.org added a comment -

          On 2012-04-24 20:53:33, Avery Ching wrote:

          > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java, lines 776-777

          > <https://reviews.apache.org/r/4852/diff/1/?file=104059#file104059line776>

          >

          > Bo, I'm a little leery about converting the List and ArrayList to LinkedList and ConcurrentLinkedQueue. I believe that linked lists will use more memory than the array list due to the double links (forward and backward). Also, is ConcurrentLinkedQueue supposed to outperform a synchronized ArrayList? I haven't seen much on that.

          >

          > The concurrenthashmap changes look good.

          Bo Wang wrote:

          Avery, thanks for the comments. I just measured the sizes of these classes; below is an estimate.

          java.util.ArrayList: 149 bytes

          java.util.LinkedList: 101 bytes

          java.util.concurrent.ConcurrentLinkedQueue: 118 bytes

          The tool I was using is a program from the link below.

          http://www.javapractices.com/topic/TopicAction.do?Id=83

          In terms of performance, here is a benchmark.

          http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html

          In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly better than LinkedList. In test #3 (iteration), LinkedList outperformed ConcurrentLinkedQueue. I think the most time-consuming part is add; iteration is also heavily used, but without concurrent access.

          Thanks for the response, Bo.

          Those numbers are for the empty data structures, I'm assuming. I was referring to the incremental cost of adding elements (messages) to the data structures. The performance isn't a concern to me (unless we call size() somewhere).

          • Avery

          jiraposter@reviews.apache.org added a comment -

          On 2012-04-24 20:53:33, Avery Ching wrote:

          > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java, lines 776-777

          > <https://reviews.apache.org/r/4852/diff/1/?file=104059#file104059line776>

          >

          > Bo, I'm a little leery about converting the List and ArrayList to LinkedList and ConcurrentLinkedQueue. I believe that linked lists will use more memory than the array list due to the double links (forward and backward). Also, is ConcurrentLinkedQueue supposed to outperform a synchronized ArrayList? I haven't seen much on that.

          >

          > The concurrenthashmap changes look good.

          Bo Wang wrote:

          Avery, thanks for the comments. I just measured the sizes of these classes; below is an estimate.

          java.util.ArrayList: 149 bytes

          java.util.LinkedList: 101 bytes

          java.util.concurrent.ConcurrentLinkedQueue: 118 bytes

          The tool I was using is a program from the link below.

          http://www.javapractices.com/topic/TopicAction.do?Id=83

          In terms of performance, here is a benchmark.

          http://www.javacodegeeks.com/2010/09/java-best-practices-queue-battle-and.html

          In its test #1 (adding elements), ConcurrentLinkedQueue performed slightly better than LinkedList. In test #3 (iteration), LinkedList outperformed ConcurrentLinkedQueue. I think the most time-consuming part is add; iteration is also heavily used, but without concurrent access.

          Avery Ching wrote:

          Thanks for the response, Bo.

          Those numbers are for the empty data structures, I'm assuming. I was referring to the incremental cost of adding elements (messages) to the data structures. The performance isn't a concern to me (unless we call size() somewhere).

          By the incremental cost, I mean the memory cost, sorry.

          • Avery

          Claudio Martella added a comment -

          The performance of ConcurrentLinkedQueue is going to be better than a synchronized block, since add() is just a CAS operation on the tail pointer, at least for the add() method, which appends to the tail of the queue. ArrayList should in any case be slower at adding elements, as it requires expanding and copying the backing array when the allocated capacity is exhausted.
          Iteration could indeed be a bit slower than an ArrayList because of cache locality.

          The memory overhead of each entry of the queue is indeed something that should be investigated. Worst case, one might think of copying the ConcurrentLinkedQueue implementation and removing the "prev" pointer, which we don't need.
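
          As a quick sanity check on this discussion (not the Giraph-level benchmark being asked for elsewhere in this thread), one could run a small multi-threaded add() micro-benchmark along the following lines. The thread and element counts are arbitrary assumptions, and JIT warm-up and GC noise can easily dominate at this scale, so the numbers would only be indicative.

          import java.util.ArrayList;
          import java.util.Collection;
          import java.util.Collections;
          import java.util.List;
          import java.util.concurrent.ConcurrentLinkedQueue;
          import java.util.concurrent.CountDownLatch;

          public class AddBenchmark {
              private static final int THREADS = 8;
              private static final int ADDS_PER_THREAD = 200000;

              public static void main(String[] args) throws InterruptedException {
                  List<Integer> syncList = Collections.synchronizedList(new ArrayList<Integer>());
                  ConcurrentLinkedQueue<Integer> clq = new ConcurrentLinkedQueue<Integer>();
                  System.out.println("synchronized ArrayList: " + time(syncList) + " ms");
                  System.out.println("ConcurrentLinkedQueue:  " + time(clq) + " ms");
              }

              // Starts THREADS writers, releases them together, and times all the adds.
              private static long time(final Collection<Integer> target) throws InterruptedException {
                  final CountDownLatch start = new CountDownLatch(1);
                  final CountDownLatch done = new CountDownLatch(THREADS);
                  for (int t = 0; t < THREADS; t++) {
                      new Thread(new Runnable() {
                          public void run() {
                              try {
                                  start.await();
                                  for (int i = 0; i < ADDS_PER_THREAD; i++) {
                                      target.add(i);
                                  }
                              } catch (InterruptedException e) {
                                  Thread.currentThread().interrupt();
                              } finally {
                                  done.countDown();
                              }
                          }
                      }).start();
                  }
                  long begin = System.nanoTime();
                  start.countDown();
                  done.await();
                  return (System.nanoTime() - begin) / 1000000L;
              }
          }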

          Claudio Martella added a comment -

          Actually, I checked the source and there's no prev pointer; each Node has just a pointer to the payload and to the next node. The memory overhead should be small.

          Bo Wang added a comment -

          I checked the source and found the same thing. I think LinkedList should be OK in terms of space; ArrayList also has to keep empty space in the array for future insertions. Should we close this issue?

          Claudio Martella added a comment -

          Personally, I'd like to see some benchmarking on this issue. If we commit this, we should have an idea of the impact.

          Avery Ching added a comment -

          I agree that a benchmark should be done, although I expect the impact to be very small. We should at least show it's not slower. =)

          Hyunsik Choi added a comment -

          If there is a trade-off between performance and memory consumption, memory consumption seems more important in the current Giraph implementation. Also, I agree that some benchmarks are necessary.

          Claudio Martella added a comment -

          I think we showed there shouldn't be any additional memory use with this patch. Let's see what performance gain we get. I also expect very little impact, but let's see.

          Bo Wang added a comment -

          I did a small test (2 machines, each with 2 cores) and the performance
          change is relatively small. I am now scheduling time to run it in a larger
          environment so that there is more contention. There are always too many
          concurrent jobs on the cluster and I need to wait for a time slot.
          Sorry to keep you waiting.

          Claudio Martella added a comment -

          If I can add something, I'd run two tests:

          (1) PageRank with the given random-graph generator in the current benchmark class. I'd create a very small graph (100 vertices) with many edges. The complexity of PR is O(E) for message creation. This should give us a "realistic" test, as O(E) is often the case.
          (2) A dummy algorithm that, on a small graph, creates 10k messages (just an example) for each other vertex in the graph per superstep. This should create a lot of messages per worker (again, 100 vertices in the graph should be OK). This one should put more pressure on the data structure.

          If you want me to test it on a bigger cluster (we have 16 machines here), just upload the code and I'll run it for you.

          Bo Wang added a comment -

          Sorry for the late reply; I've been pretty busy this past week with midterms and deadlines. I did a few tests on a 24-core, 32 GB DRAM machine.

          Quite astonishingly, the performance of the new version (ConcurrentHashMap + ConcurrentLinkedQueue) is somewhat slower than the original version. I ran the PageRankBenchmark with "-e 100 -s 10 -V 10000". Results are as follows.

          (original)
          #cores   superstep.exec.time
          12       11580
          8        10651
          4        12944
          2        19334
          1        20782

          (ConcurrentHashMap + ConcurrentLinkedQueue)
          #cores   superstep.exec.time
          12       11639
          8        11527
          4        13653
          2        20354
          1        27825

          I think the overhead came from the lock/unlock on get(), especially when reading messages sent in the previous superstep, where no lock is needed since that access is sequential.

          To verify my assumption, I changed the ConcurrentLinkedQueue back to ArrayList and wrapped it with synchronized; the performance does improve.

          (ConcurrentHashMap)
          #cores   superstep.exec.time
          12       11380
          8        11618
          4        12834
          2        18417
          1        22952

          In comparison, it's more or less the same as the original version. Then I ran with another set of parameters, "-e 1000 -s 10 -V 1000". It seems the fine-grained lock does help.

          (original)
          #cores   superstep.exec.time
          12       97101
          6        11321

          (ConcurrentHashMap)
          #cores   superstep.exec.time
          12       92848 (4.4%)
          6        10834 (4.3%)

          I attached the new patch (ConcurrentHashMap version) for review.

          Claudio, thanks for offering to help. You can check out a clean version and apply the patch ($ patch -p0 -i GIRAPH-185.patch) to test it on the cluster.
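
          A minimal sketch of the shape Bo describes for this last version (ConcurrentHashMap for the outer map, a plain ArrayList guarded by a per-list synchronized block) might look like the following. This is an illustration under those stated assumptions, not the actual GIRAPH-185.patch code; the class and method names are hypothetical.

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.ConcurrentHashMap;

          class HybridInbox<I, M> {
              private final ConcurrentHashMap<I, List<M>> transientInMessages =
                  new ConcurrentHashMap<I, List<M>>();

              public void putMsg(I vertexId, M msg) {
                  List<M> msgs = transientInMessages.get(vertexId);
                  if (msgs == null) {
                      List<M> fresh = new ArrayList<M>();
                      msgs = transientInMessages.putIfAbsent(vertexId, fresh);
                      if (msgs == null) {
                          msgs = fresh;   // another sender installed the list first
                      }
                  }
                  synchronized (msgs) {   // lock only this vertex's list, not the whole map
                      msgs.add(msg);
                  }
              }

              // After the superstep barrier, reads are single-threaded per partition,
              // so the plain ArrayList iterates cheaply without per-element volatile reads.
              public List<M> removeMessages(I vertexId) {
                  return transientInMessages.remove(vertexId);
              }
          }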


            People

            • Assignee: Bo Wang
            • Reporter: Bo Wang
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified

                  Development