Uploaded image for project: 'Parquet'
  1. Parquet
  2. PARQUET-787

Add a size limit for heap allocations when reading

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.9.0
    • 1.10.0
    • parquet-mr
    • None

    Description

      G1GC allocates humongous objects directly in the old generation to avoid unnecessary copies, which means that these allocations aren't garbage collected until a full GC runs. Humongous objects are objects that are 50% of the region size or more. Region size is at most 32MB (see the table for region size from heap size).

      Parquet currently allocates a huge buffer for each contiguous group of column chunks, which in many cases is not garbage collected until a full GC. Adding a size limit for the allocation size should allow users to break row groups across multiple buffers so that buffers get collected when they have been read.

      Attachments

        Issue Links

          Activity

            People

              rdblue Ryan Blue
              rdblue Ryan Blue
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: