Parquet / PARQUET-787

Add a size limit for heap allocations when reading


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.10.0
    • Component/s: parquet-mr
    • Labels:
      None

      Description

      G1GC allocates humongous objects directly in the old generation to avoid unnecessary copies, which means that these allocations aren't garbage collected until a full GC runs. Humongous objects are objects whose size is at least 50% of the region size. The region size is at most 32MB (see the table for region size from heap size).
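
      To make the threshold concrete, here is a rough sketch that estimates the G1 region size for a given heap size and derives the humongous threshold as half of it. The 50% rule and the 32MB cap come from the description above. The 1MB minimum, the target of about 2048 regions, and the rounding down to a power of two are assumptions about G1's default ergonomics; the JVM may choose differently, and -XX:G1HeapRegionSize overrides the default. The class and method names are hypothetical.

```java
// Rough estimate of G1's default region size and humongous threshold.
// Assumptions (not taken from the issue text): G1 aims for about 2048
// regions, rounds the region size down to a power of two, and uses a
// 1MB minimum. The 32MB cap and the 50% rule come from the description.
final class G1HumongousThreshold {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_SIZE = 1 * MB;   // assumed lower bound
    static final long MAX_REGION_SIZE = 32 * MB;  // "at most 32MB"
    static final long TARGET_REGION_COUNT = 2048; // assumed ergonomic target

    /** Estimated region size in bytes for the given heap size in bytes. */
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_SIZE);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION_SIZE);
    }

    /** Objects of at least this many bytes are treated as humongous. */
    static long humongousThreshold(long heapBytes) {
        return regionSize(heapBytes) / 2;
    }
}
```

      Under these assumptions, an 8GB heap gets 4MB regions, so any single allocation of 2MB or more would be humongous; even at the 32MB cap, a 16MB buffer already qualifies.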

      Parquet currently allocates one large buffer for each contiguous group of column chunks. With G1GC such a buffer is often humongous, so in many cases it is not garbage collected until a full GC. Adding a limit on the allocation size should allow users to break row groups across multiple buffers, so that each buffer can be collected once it has been read.
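
      The idea of splitting one large allocation into several bounded ones can be sketched as follows. This is a minimal illustration, not the parquet-mr implementation: the class name, the method name, and the maxAllocationSize parameter are hypothetical, and a real reader would also have to fill the buffers from the file and present them to the column readers as one logical stream.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: covers totalLength bytes with heap buffers that
// are each no larger than maxAllocationSize, instead of one big buffer.
final class BoundedAllocations {
    static List<ByteBuffer> allocate(long totalLength, int maxAllocationSize) {
        if (totalLength < 0 || maxAllocationSize <= 0) {
            throw new IllegalArgumentException(
                "totalLength must be >= 0 and maxAllocationSize must be > 0");
        }
        // Number of buffers that use the full limit, plus one smaller
        // buffer for whatever remains.
        long fullAllocations = totalLength / maxAllocationSize;
        int lastAllocationSize = (int) (totalLength % maxAllocationSize);

        List<ByteBuffer> buffers = new ArrayList<>();
        for (long i = 0; i < fullAllocations; i++) {
            buffers.add(ByteBuffer.allocate(maxAllocationSize));
        }
        if (lastAllocationSize > 0) {
            buffers.add(ByteBuffer.allocate(lastAllocationSize));
        }
        return buffers;
    }
}
```

      For example, BoundedAllocations.allocate(20, 8) yields three buffers with capacities 8, 8, and 4. Choosing a limit below the humongous threshold keeps each buffer out of the humongous path, and a buffer becomes unreachable as soon as the reader has moved past it.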


            People

            • Assignee:
              rdblue Ryan Blue
              Reporter:
              rdblue Ryan Blue
