Parquet / PARQUET-1768

InternalParquetRecordWriter doesn't immediately limit current row group to threshold


Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • None
    • None
    • parquet-mr
    • None

    Description

      The MemoryManager adjusts the row group size threshold of writers when the allocated memory pool fills up.
      Problem: InternalParquetRecordWriter only picks up the adjusted row group size on its next flush, so until then each writer keeps buffering up to the old size.
      This opens up the possibility of an OOM error if all writers are started at roughly the same time and progress in tandem (I saw this when investigating failing jobs while writing to disk in Spark).
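
      To illustrate the timing problem described above, here is a minimal toy simulation. It is not the parquet-mr code: ToyWriter, RowGroupThresholdDemo, peakBuffered and all numbers are hypothetical, and the "memory manager" is reduced to dividing the pool evenly among writers. It only compares a writer that applies a lowered threshold at its next flush with one that applies it immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a record writer. Hypothetical; not the parquet-mr class.
class ToyWriter {
    private long activeThreshold;  // limit the flush check actually uses
    private long pendingThreshold; // limit most recently requested by the manager
    private long buffered = 0;

    ToyWriter(long threshold) {
        this.activeThreshold = threshold;
        this.pendingThreshold = threshold;
    }

    // Called when the (toy) memory manager scales the threshold down.
    void setThreshold(long threshold, boolean applyImmediately) {
        pendingThreshold = threshold;
        if (applyImmediately) {
            activeThreshold = Math.min(activeThreshold, threshold);
        }
    }

    void append(long bytes) {
        buffered += bytes;
    }

    // Flush when the active limit is reached; only then adopt the pending limit.
    void maybeFlush() {
        if (buffered >= activeThreshold) {
            buffered = 0;
            activeThreshold = pendingThreshold;
        }
    }

    long buffered() {
        return buffered;
    }
}

class RowGroupThresholdDemo {
    // Returns the peak number of bytes buffered across all writers at once.
    static long peakBuffered(int writers, long poolSize, long initialThreshold,
                             long recordSize, int records, boolean applyImmediately) {
        List<ToyWriter> all = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            all.add(new ToyWriter(initialThreshold));
        }
        // Assumed policy: split the pool evenly once it is oversubscribed.
        long scaled = Math.min(initialThreshold, poolSize / writers);
        for (ToyWriter w : all) {
            w.setThreshold(scaled, applyImmediately);
        }
        long peak = 0;
        for (int r = 0; r < records; r++) {
            long total = 0;
            for (ToyWriter w : all) { // writers progress in tandem
                w.append(recordSize);
                total += w.buffered();
            }
            peak = Math.max(peak, total);
            for (ToyWriter w : all) {
                w.maybeFlush();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("lazy:      " + peakBuffered(4, 100, 100, 5, 40, false));
        System.out.println("immediate: " + peakBuffered(4, 100, 100, 5, 40, true));
    }
}
```

      With these made-up numbers (4 writers, a 100-byte pool, a 100-byte initial threshold), the lazy variant peaks at 400 buffered bytes before the first flush, while the immediate variant never exceeds the 100-byte pool.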

          People

            Unassigned Unassigned
            brimzi Brian Mwambazi
