Samza / SAMZA-2778

Make AzureBlobOutputStream buffer initialization size configurable.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The existing AzureBlobOutputStream uses a ByteArrayOutputStream to buffer messages until flush(), and each new buffer is initialized to 10MB (Azure's maximum block size). This can cause issues with the G1 garbage collector (the default in Java 11), because buffers of this size are treated as humongous objects. G1 divides the heap into regions and considers any object larger than half of a region to be humongous. Humongous objects are allocated directly in the old generation and are given an entire region to themselves, so the unused remainder of that region cannot be used for other allocations. If the object is larger than a region, multiple contiguous regions are allocated. If there are a large number of buffers, the JVM can experience OOMs when no regions are free at the time a new ByteArrayOutputStream is created. The JVM terminates because the new operator requires immediate memory allocation and cannot wait for GC.
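To make the region arithmetic concrete, here is a minimal sketch that checks whether a byte array of a given length would exceed half of a G1 region. The class name, method names, and the 16-byte array header are assumptions for illustration (the header size depends on the JVM and its settings); the region size is whatever the JVM selects or what is set with -XX:G1HeapRegionSize.

```java
// Illustrative only: estimates whether a byte[] would be treated as humongous by G1.
final class HumongousCheck {
  // Assumption: 16-byte array header, typical for 64-bit HotSpot with compressed class pointers.
  static final long ARRAY_HEADER_BYTES = 16;

  // An object is humongous when it is larger than half of one region.
  static boolean isHumongous(long arrayLength, long regionSizeBytes) {
    return arrayLength + ARRAY_HEADER_BYTES > regionSizeBytes / 2;
  }

  // Number of contiguous regions a humongous object of this size would occupy (rounded up).
  static long regionsOccupied(long arrayLength, long regionSizeBytes) {
    long objectBytes = arrayLength + ARRAY_HEADER_BYTES;
    return (objectBytes + regionSizeBytes - 1) / regionSizeBytes;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long buffer = 10 * mb; // the 10MB initial buffer described in this issue
    for (long region : new long[] {4 * mb, 16 * mb, 32 * mb}) {
      System.out.printf("region=%dMB humongous=%b regions=%d%n",
          region / mb, isHumongous(buffer, region), regionsOccupied(buffer, region));
    }
  }
}
```

Under these assumptions, a 10MB buffer is humongous for any region size below roughly 20MB, which covers most of the region sizes G1 can choose.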

      GC effectiveness can be improved if the ByteArrayOutputStream starts small and is allowed to grow as messages are added, which delays or even avoids the buffer being treated as humongous. A buffer can still become a humongous object, but only once it has grown to a sufficient size. Making the initialization size configurable lets clients tune it for their systems.
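One way the proposal could look is sketched below. This is not the Samza implementation: the config key, the class and method names, and the choice to default to the current 10MB behavior are all hypothetical. The only library behavior relied on is that ByteArrayOutputStream(int size) sets the initial capacity and that the stream grows automatically as bytes are written.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;

// Illustrative sketch: reads a (hypothetical) config value and uses it as the initial buffer capacity.
final class BufferSizing {
  // Hypothetical key, invented for this example; not an actual Samza config name.
  static final String INIT_BUFFER_SIZE_KEY = "example.azureblob.initBufferSizeBytes";
  // 10MB, the current initial size described in this issue.
  static final int MAX_BLOCK_SIZE_BYTES = 10 * 1024 * 1024;

  // Assumption: when the key is absent, keep today's behavior; clamp the value to [1, max block size].
  static int resolveInitialBufferSize(Map<String, String> config) {
    String raw = config.get(INIT_BUFFER_SIZE_KEY);
    int requested = (raw == null) ? MAX_BLOCK_SIZE_BYTES : Integer.parseInt(raw.trim());
    return Math.max(1, Math.min(requested, MAX_BLOCK_SIZE_BYTES));
  }

  // The stream starts at the configured capacity and grows on demand as messages are written.
  static ByteArrayOutputStream newBuffer(Map<String, String> config) {
    return new ByteArrayOutputStream(resolveInitialBufferSize(config));
  }
}
```

With a small configured value such as 32768, a lightly used stream never allocates a humongous array, while a busy stream still grows toward the block size before flush().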

            People

              Assignee: Unassigned
              Reporter: Aditya Toomula (atoomula)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h