  1. Kafka
  2. KAFKA-17424

Memory optimisation for Kafka-connect

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.8.0
    • Fix Version/s: None
    • Component/s: connect
    • Labels: None
    • Flags: Patch, Important

    Description

      When Kafka Connect gives a sink task its own copy of the List<SinkRecord>, RAM utilisation shoots up: at that moment there are two lists, and the original list is only cleared after the sink worker finishes the current batch.

       

      Originally the list is declared final and a copy of it is provided to the sink task, because sink tasks can be custom and this lets users process the records however they want without any risk. But one of the most popular uses of Kafka Connect is OLTP-to-OLAP replication, and during initial copying/snapshots a lot of data is generated rapidly, which fills the list to its maximum batch size and makes us prone to "Out of Memory" exceptions. The only use of the list is: get filled > cloned for sink > get size > cleared > repeat. So I take the size of the list before giving the original list to the sink task and, after the sink has performed its operations, set list = new ArrayList<>(). I did not use clear(), just in case the sink task has set our list to null.

      There is a time vs. memory trade-off.
      In the original approach, the JVM does not have to spend time finding free memory for a new list.

      In the new approach, the JVM has to create a new list for each batch by finding free memory addresses, but this results in more free memory overall.
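
      The two delivery patterns described above can be sketched roughly as follows. This is a minimal illustration, not the actual Kafka Connect worker code: the Sink interface, the BatchDelivery class, and all method and field names are hypothetical, and records are plain String values to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a sink task; NOT the real Kafka Connect SinkTask API.
interface Sink {
    void put(Collection<String> records);
}

// Hypothetical worker-side buffer illustrating both delivery patterns.
class BatchDelivery {
    // Non-final so that the proposed variant can swap in a fresh list.
    private List<String> messageBatch = new ArrayList<>();

    void add(String record) {
        messageBatch.add(record);
    }

    int pending() {
        return messageBatch.size();
    }

    // Original pattern as described in the issue: the sink receives a copy,
    // so two lists of the same length are alive while put() runs.
    int deliverWithCopy(Sink sink) {
        sink.put(new ArrayList<>(messageBatch));
        int delivered = messageBatch.size();
        messageBatch.clear();
        return delivered;
    }

    // Proposed pattern: capture the size first, hand over the original list,
    // then replace the reference instead of calling clear().
    int deliverWithoutCopy(Sink sink) {
        int delivered = messageBatch.size();
        sink.put(messageBatch);
        // The worker no longer depends on whatever state the sink left the list in.
        messageBatch = new ArrayList<>();
        return delivered;
    }
}
```

      Capturing the size before the hand-off matters because the sink is free to mutate the list it receives. Replacing the reference also lets the old list be garbage collected as soon as the sink stops referencing it, whereas ArrayList.clear() keeps the existing backing array allocated.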

            People

              Unassigned
              Ajit Singh (ajit97)
              Ewen Cheslack-Postava
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: 24h
                  Remaining: 24h
                  Logged: Not Specified