Chukwa
CHUKWA-664

Network compression between agent and collector

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.5.0, 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: Data Collection
    • Labels: None

      Description

      As suggested in http://mail-archives.apache.org/mod_mbox/incubator-chukwa-user/201207.mbox/%3C001b01cd69b4$13d9c100$3b8d4300$@com%3E , Chukwa should be able to compress network communications between agent and collector.
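
      For orientation, here is a minimal sketch of how a Hadoop-family project typically wires up such a switch: read a boolean flag and a codec class name from the configuration, then instantiate the codec reflectively. The two property names below are hypothetical placeholders, not the names the patch defines in chukwa-common.xml; only the Hadoop Configuration and ReflectionUtils APIs are taken as given.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.compress.CompressionCodec;
        import org.apache.hadoop.util.ReflectionUtils;

        public class CompressionConfigSketch {
          // Returns the configured codec, or null when compression is disabled.
          // Both property names are placeholders for illustration only.
          static CompressionCodec loadCodec(Configuration conf) throws Exception {
            if (!conf.getBoolean("chukwaAgent.output.compression", false)) {
              return null;
            }
            Class<?> codecClass = conf.getClassByName(
                conf.get("chukwaAgent.output.compression.type",
                         "org.apache.hadoop.io.compress.DefaultCodec"));
            return (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
          }
        }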

      Attachments

      1. chukwa-664.patch (9 kB, Sourygna Luangsay)
      2. chukwa-664-2.patch (10 kB, Sourygna Luangsay)

        Activity

        Hudson added a comment -

        Integrated in Chukwa-trunk #460 (See https://builds.apache.org/job/Chukwa-trunk/460/)
        CHUKWA-664. Added network compression between agent and collector. (Sourygna Luangsay via Eric Yang) (Revision 1411817)

        Result = SUCCESS
        eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1411817
        Files :

        • /incubator/chukwa/trunk/CHANGES.txt
        • /incubator/chukwa/trunk/conf/chukwa-common.xml
        • /incubator/chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/agent/ChukwaAgent.java
        • /incubator/chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/collector/servlet/ServletCollector.java
        • /incubator/chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/datacollection/sender/ChukwaHttpSender.java
        Eric Yang added a comment -

        I just committed this, thanks Sourygna.

        Eric Yang added a comment -

        Any update on the JUnit test case?

        Sourygna Luangsay added a comment -

        Submitted the patch that fixes HTTP POST.

        The remaining actions are (I am going abroad for three weeks soon, so I won't be able to work on this JIRA until I come back on the 15th of October):

        • checking whether the "io.file.buffer.size" parameter can be used when the native-hadoop library is not loaded (see the sketch after this list)
        • writing some JUnit tests
        • updating the documentation
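
        A minimal sketch of the check in the first item, assuming Hadoop's NativeCodeLoader utility and the standard io.file.buffer.size setting; the pure-Java fallback value is a placeholder, not something the patch specifies.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.util.NativeCodeLoader;

          public class BufferSizeSketch {
            // Honor io.file.buffer.size only when the native-hadoop library is
            // available; otherwise fall back to Hadoop's 4096-byte default.
            static int streamBufferSize(Configuration conf) {
              if (NativeCodeLoader.isNativeCodeLoaded()) {
                return conf.getInt("io.file.buffer.size", 4096);
              }
              return 4096; // hypothetical fallback for the pure-Java code path
            }
          }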
        Sourygna Luangsay added a comment -

        OK for the POST: I had to modify getContentLength() in the BuffersRequestEntity class to make it work. Tonight I'll refactor my code a bit, test it, and submit the new patch.
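
        The comment doesn't show the actual change, but one plausible shape for such a fix, assuming the commons-httpclient 3.x RequestEntity interface: compress the buffers up front so that getContentLength() reports the size of the bytes that will really be sent. The class name and approach below are assumptions, not the patch itself.

          import java.io.IOException;
          import java.io.OutputStream;
          import org.apache.commons.httpclient.methods.RequestEntity;

          // Hypothetical sketch: Content-Length must match the compressed
          // bytes on the wire, not the raw chunk size.
          public class CompressedRequestEntity implements RequestEntity {
            private final byte[] compressed; // body compressed ahead of time

            public CompressedRequestEntity(byte[] compressed) {
              this.compressed = compressed;
            }

            public boolean isRepeatable() { return true; }

            public String getContentType() { return "application/octet-stream"; }

            public long getContentLength() { return compressed.length; }

            public void writeRequest(OutputStream out) throws IOException {
              out.write(compressed);
            }
          }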

        Eric Yang added a comment -

        The HTTP POST header should be visible; I don't see any reason in the implementation that would cause it to be omitted. A test case validating that the data is sent correctly would be nice.

        Sourygna Luangsay added a comment -

        You were right: there were too many flushes due to compression.
        Nonetheless, I don't think compression flushes for every chunk: I only call the compressionOutputStream.finish() method once, after all the chunks have been written to the stream (see the sketch below).

        I have played a bit with Hadoop's "io.file.buffer.size" parameter, and I managed to get bigger (and fewer) TCP fragments by increasing this buffer size, so I guess that would fix the TCP incast problem. (I have also tried changing the "chukwaAgent.fileTailingAdaptor.maxReadSize" parameter and got even more interesting results.)
        The only trouble is that the "io.file.buffer.size" parameter currently only takes effect if the native-hadoop library is loaded for compression. I need to take a closer look at my code and the Hadoop compression package to see whether I can enable it when native-hadoop is not loaded.

        And do you have any idea why the HTTP POST can't be seen when compression is enabled?
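
        A sketch of the write pattern described above, taking only Hadoop's CompressionOutputStream API as given; the method and variable names are illustrative, not from the patch.

          import java.io.OutputStream;
          import java.util.List;
          import org.apache.hadoop.io.compress.CompressionCodec;
          import org.apache.hadoop.io.compress.CompressionOutputStream;

          public class FinishOnceSketch {
            // Write every chunk first, then call finish() exactly once so the
            // compressor emits its trailer without forcing per-chunk flushes.
            static void sendChunks(CompressionCodec codec, OutputStream wire,
                                   List<byte[]> chunks) throws Exception {
              CompressionOutputStream out = codec.createOutputStream(wire);
              for (byte[] chunk : chunks) {
                out.write(chunk); // no flush() here
              }
              out.finish(); // single end-of-stream marker for the whole POST body
              out.flush();
            }
          }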

        Eric Yang added a comment -

        When compression is enabled, flush is called on every chunk; when compression is off, it is not. Flushing is what increases the number of TCP fragments. When the agent-to-collector ratio is too high, the extra TCP fragments can cause excessive retransmission under heavy load, leading to the TCP incast problem. A chunk is typically very small, so we don't need to flush immediately; deferring saves TCP headers, and the collector returns an HTTP response code telling the agent whether the last set of chunks must be re-transmitted. It is therefore best to let the TCP buffer fill up before sending data, which will improve the throughput of the compressed stream in the current patch. A sketch of that buffering approach follows.
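
        A minimal sketch of the buffering described above, using only java.io; the 64 kB size is an arbitrary example, not a value from the patch.

          import java.io.BufferedOutputStream;
          import java.io.OutputStream;

          public class BufferingSketch {
            // Interpose a buffer ahead of the socket so many small chunk writes
            // coalesce into a few large TCP segments instead of one per chunk.
            static OutputStream buffered(OutputStream wire) {
              return new BufferedOutputStream(wire, 64 * 1024);
            }
          }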

        Sourygna Luangsay added a comment -

        Here is a first patch that adds the compression feature. I have tried it with DefaultCodec, GzipCodec, and BZip2Codec, and it seems OK (a quick round-trip check over those codecs is sketched below).
        If it looks good, I'll submit another patch with the corresponding changes to the documentation.

        Though everything seems to work, there is something I don't really understand. I tcpdumped the network traffic, and when compression is enabled I can no longer see the HTTP POST protocol in Wireshark: I just see various TCP segments holding my Chukwa chunks, but no protocol higher than TCP. What is more, comparing tcpdumps of the same file (19 kB raw, 7.6 kB compressed), I noticed that I get more (and smaller) TCP segments with compressed communication than with uncompressed communication. Could someone enlighten me?
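
        A self-contained round-trip check over the three codecs mentioned above could also double as the requested JUnit test. This is a sketch under the assumption that all three codecs have pure-Java implementations available when the native library is absent.

          import java.io.ByteArrayInputStream;
          import java.io.ByteArrayOutputStream;
          import java.io.InputStream;
          import java.util.Arrays;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.io.compress.CompressionCodec;
          import org.apache.hadoop.io.compress.CompressionOutputStream;
          import org.apache.hadoop.util.ReflectionUtils;

          public class CodecRoundTripSketch {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              byte[] payload = "some chunk bytes".getBytes("UTF-8");
              for (String name : new String[] {
                  "org.apache.hadoop.io.compress.DefaultCodec",
                  "org.apache.hadoop.io.compress.GzipCodec",
                  "org.apache.hadoop.io.compress.BZip2Codec"}) {
                CompressionCodec codec = (CompressionCodec)
                    ReflectionUtils.newInstance(conf.getClassByName(name), conf);
                // Compress the payload into an in-memory buffer.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                CompressionOutputStream out = codec.createOutputStream(buf);
                out.write(payload);
                out.finish();
                out.close();
                // Decompress and compare against the original bytes.
                InputStream in = codec.createInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
                ByteArrayOutputStream round = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != -1) {
                  round.write(b);
                }
                System.out.println(name + " round-trip ok: "
                    + Arrays.equals(payload, round.toByteArray()));
              }
            }
          }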


          People

          • Assignee: Sourygna Luangsay
          • Reporter: Sourygna Luangsay
          • Votes: 0
          • Watchers: 3
