  1. Samza
  2. SAMZA-388

Log compaction on checkpoint topics fails with compression

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: kafka
    • Labels:
      None

      Description

      I have a job that has 10,000+ partitions that it's consuming from. After SAMZA-123, it's been switched to use the GroupBySystemStreamPartition strategy, which means it's got 10,000+ tasks, and thus, 10,000+ checkpoint messages being sent every minute.

      To keep the checkpoint topic from getting too large, we enabled log compaction on the Kafka topic, but we discovered that the topic still grew to be very large. This happened because we were sending compressed messages to the Kafka checkpoint topic.

      Based on KAFKA-1374, it appears that we can't use compressed checkpoint topics with log compaction.

      I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that ticket is resolved, we can update the Samza code to default the checkpoint topics to be log compacted (with a small segment size), and not worry about the compression anymore.
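For context, here is a minimal sketch of why log compaction is expected to bound the size of a checkpoint topic: compaction retains only the most recent message for each key, so the retained log should be roughly proportional to the number of tasks rather than to how long the job has been running. This is a simplified in-memory model, not Kafka's implementation. The class name, the `task-N` keys, and the `offset-N` values are hypothetical, and keying checkpoints by task name is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of log compaction; names and record format are hypothetical. */
final class CompactionSketch {

    /** One checkpoint record: key = task name (assumed), value = opaque checkpoint payload. */
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keeps only the most recent record per key, preserving the log order of the survivors. */
    static List<Record> compact(List<Record> log) {
        Map<String, Record> latest = new LinkedHashMap<>();
        for (Record r : log) {
            latest.remove(r.key); // drop the older record so the newer one moves to the end
            latest.put(r.key, r);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        // Four tasks, each writing one checkpoint per minute for three minutes.
        List<Record> log = new ArrayList<>();
        for (int minute = 0; minute < 3; minute++) {
            for (int task = 0; task < 4; task++) {
                log.add(new Record("task-" + task, "offset-" + minute));
            }
        }
        List<Record> compacted = compact(log);
        System.out.println(log.size() + " records before, " + compacted.size() + " after");
    }
}
```

With 10,000+ tasks the same reasoning applies: a compacted topic should hold on the order of one message per task plus whatever sits in the not-yet-compacted segment, which is why a topic that keeps growing indicates that compaction is not taking effect.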

      1. SAMZA-388-0.patch
        9 kB
        Chris Riccomini
      2. SAMZA-388-1.patch
        9 kB
        Chris Riccomini

          Activity

          criccomini Chris Riccomini added a comment -

          Attaching patch. RB at:

          https://reviews.apache.org/r/25039/

          1. Forced checkpoint manager's producer to disable compression.
          2. Forced checkpoint manager to turn on log compaction when creating a checkpoint topic.
          3. Wrote a test to validate that the topic is created with compaction.

          Defaulted to a 25 MB segment size for log-compacted checkpoint topics, which should be enough for checkpoint topics with many messages. We shouldn't have to worry too much about file handles on the broker, since there should be only a couple of segments per checkpoint topic.
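The settings described in this comment can be sketched roughly as follows. This is an illustration, not the contents of the attached patch: the class and method names are hypothetical, and the exact byte value used for the segment size is an assumption (25 * 1024 * 1024). The keys `compression.codec`, `cleanup.policy`, and `segment.bytes` are the Kafka 0.8-era producer and topic-level configuration names.

```java
import java.util.Properties;

/** Hypothetical sketch of the checkpoint producer and topic settings described above. */
final class CheckpointTopicConfigSketch {

    // Assumed interpretation of "25 meg": 25 * 1024 * 1024 bytes.
    static final int SEGMENT_BYTES = 25 * 1024 * 1024;

    /** Producer overrides: force compression off for checkpoint messages. */
    static Properties producerOverrides() {
        Properties props = new Properties();
        props.setProperty("compression.codec", "none");
        return props;
    }

    /** Topic-level config: enable log compaction and use a small segment size. */
    static Properties topicConfig() {
        Properties props = new Properties();
        props.setProperty("cleanup.policy", "compact");
        props.setProperty("segment.bytes", Integer.toString(SEGMENT_BYTES));
        return props;
    }

    public static void main(String[] args) {
        System.out.println("producer overrides: " + producerOverrides());
        System.out.println("topic config: " + topicConfig());
    }
}
```

A small segment size matters here because the active segment is not compacted, so smaller segments let the broker roll over and clean old checkpoint messages sooner.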

          closeuris Yan Fang added a comment -

          +1 for the temporary solution. A few nits posted in RB. Feel free to commit. Maybe we want to revisit this ticket after KAFKA-1374, or use another ticket to track it in case we forget to update. Thank you.

          criccomini Chris Riccomini added a comment -

          Attaching updated patch with changes from RB.

          criccomini Chris Riccomini added a comment -

          Merged and committed. I opened SAMZA-398 as a tracker ticket for KAFKA-1374.


            People

            • Assignee:
              criccomini Chris Riccomini
              Reporter:
              criccomini Chris Riccomini
