Kafka / KAFKA-2189

Snappy compression of message batches less efficient in 0.8.2.1

    Details

      Description

      We are using snappy compression and noticed a substantial increase (about 2.25x) in log filesystem space consumption after upgrading a Kafka cluster from 0.8.1.1 to 0.8.2.1. We found that this is caused by messages seemingly being recompressed individually (or possibly with a much smaller buffer or dictionary?) rather than as the batch sent by producers. We eventually tracked down the change in compression ratio/scope to commit [1], which updated the snappy-java version from 1.0.5 to 1.1.1.3. The Kafka client version does not appear to be relevant, as we can reproduce this with both the 0.8.1.1 and 0.8.2.1 Producer.
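To illustrate why the compression scope matters, here is a minimal sketch of compressing the same messages in chunks of different sizes. It makes several assumptions: it uses Python's standard-library `zlib` as a stand-in for snappy (so the absolute numbers will not match Kafka's), and the message payloads and the helper names `make_messages` and `compressed_size` are hypothetical. It is not how the broker or snappy-java is implemented; it only shows the general effect of a smaller compression scope.

```python
import zlib


def make_messages(count):
    # Hypothetical payloads: same structure, only the sequence number differs.
    return [
        ('{"event":"page_view","host":"example.org","seq":%d}' % i).encode("utf-8")
        for i in range(count)
    ]


def compressed_size(messages, chunk):
    """Total compressed bytes when `chunk` messages are compressed together."""
    total = 0
    for start in range(0, len(messages), chunk):
        block = b"".join(messages[start:start + chunk])
        # zlib is a stand-in here; Kafka in this issue uses snappy-java.
        total += len(zlib.compress(block))
    return total


messages = make_messages(1000)
sizes = {chunk: compressed_size(messages, chunk) for chunk in (1, 10, 100, 1000)}
for chunk, size in sizes.items():
    print(f"chunk={chunk:>4}  compressed bytes={size}")
```

With similar messages, the smaller the chunk handed to the compressor, the less redundancy it can exploit, so the total output grows as the chunk size shrinks.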

      Here are the log files from our troubleshooting that contain the same set of 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue.

      -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 /var/kafka2/f9d9b-batch-1-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 /var/kafka2/f9d9b-batch-10-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka  89645 May 12 11:45 /var/kafka2/f9d9b-batch-100-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka  88279 May 12 11:45 /var/kafka2/f9d9b-batch-1000-0/00000000000000000000.log
      
      -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 /var/kafka2/f5ab8-batch-1-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 /var/kafka2/f5ab8-batch-10-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 /var/kafka2/f5ab8-batch-100-0/00000000000000000000.log
      -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 /var/kafka2/f5ab8-batch-1000-0/00000000000000000000.log
      

      [1] https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28
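The size difference between the two commits can be worked out directly from the listing above. The sketch below only does that arithmetic; the byte counts are copied from the listing, and the dictionary layout and the function name `growth_ratios` are illustrative choices, not part of any Kafka tooling.

```python
# Log file sizes in bytes, copied from the directory listing above,
# keyed by commit and then by producer batch size.
sizes = {
    "f9d9b": {1: 404967, 10: 119951, 100: 89645, 1000: 88279},
    "f5ab8": {1: 402837, 10: 382437, 100: 364791, 1000: 380693},
}


def growth_ratios(before, after):
    """Ratio of the newer log size to the older log size, per batch size."""
    return {batch: after[batch] / before[batch] for batch in before}


ratios = growth_ratios(sizes["f9d9b"], sizes["f5ab8"])
for batch, ratio in ratios.items():
    print(f"batch size {batch:>4}: {ratio:.2f}x")
```

For this test data set the batch-of-1 logs are essentially the same size on both commits, while the logs for batch sizes of 10 and above are roughly three to four times larger on f5ab8.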

      Attachment: KAFKA-2189.patch (0.9 kB, Ismael Juma)

        Activity

        AOLSON1@CERNER.COM Olson,Andrew added a comment -

        I have verified that the issue [1] was introduced in snappy-java 1.1.1.2 and has already been fixed [2] in snappy-java 1.1.1.7.

        [1] https://github.com/xerial/snappy-java/issues/100
        [2] https://github.com/xerial/snappy-java/commit/dc2dd27f85e5167961883f71ac2681b73b33e5df

        ijuma Ismael Juma added a comment -

        Created reviewboard https://reviews.apache.org/r/34144/diff/
        against branch upstream/trunk

        AOLSON1@CERNER.COM Olson,Andrew added a comment -

        Everything looks good so far in our development environment: the log file sizes returned to the 0.8.1.1 baseline when we replaced snappy-java 1.1.1.6 with 1.1.1.7 in the broker libs. Producer/broker performance was the same or in some cases better.

        We should be moving this change into production within the next couple of days to free up some disk space, and will update again once we have been running 1.1.1.7 in prod for a few days.

        AOLSON1@CERNER.COM Olson,Andrew added a comment -

        We've been running with snappy-java 1.1.1.7 in production for four days now with no issues.

        ijuma Ismael Juma added a comment -

        Given the positive feedback, it seems like it would be good to get this merged so that more people can test it before the final release?

        junrao Jun Rao added a comment -

        Thanks for the patch. +1 and committed to trunk.

        ottomata Andrew Otto added a comment -

        Hi all,

        The Wikimedia Foundation had a serious production issue when we upgraded to 0.8.2.1 because of this bug. Snappy compression doesn't work at scale in 0.8.2.1. I know 0.8.3 is slated for release soon, but maybe you should consider doing a 0.8.2.2 release just to get this out there in a stable tag, so that others don't run into this issue.

        jshaw86 Jordan Shaw added a comment -

        Hi all,
        I was wondering if this affects only 0.8.2.1 or also 0.8.2? We are on 0.8.2 and just did a complete rebalance across our brokers and some brokers are at 70% disk utilization and some are at 30%. Thanks.

        junrao Jun Rao added a comment -

        0.8.2.0 has the same problem since it depends on snappy-java 1.1.1.6.

        jbrosenberg@gmail.com Jason Rosenberg added a comment -

        Quick question: I assume there should be no issues with upgrading brokers and/or producers/consumers independently with this change? E.g., can snappy-java 1.1.1.6 and 1.1.1.7 interoperate without any compatibility issues?

        noslowerdna Andrew Olson added a comment -

        Jason Rosenberg: Yes, your assumption is correct; this was only an efficiency/chunking issue, not a protocol incompatibility.


          People

          • Assignee:
            ijuma Ismael Juma
            Reporter:
            AOLSON1@CERNER.COM Olson,Andrew
            Reviewer:
            Jun Rao