  Spark / SPARK-50235

Clean up ColumnVector resource after processing all rows in ColumnarToRowExec

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 3.4.4, 3.5.3
    • Fix Version/s: 4.0.0, 3.5.4
    • Component/s: SQL

    Description

      Currently we only assign null to the ColumnarBatch object, which does not release the resources held by the vectors in the batch. For OnHeapColumnVector, the Java arrays can be collected automatically by the JVM, but for OffHeapColumnVector the allocated off-heap memory is leaked.

      For custom ColumnVector implementations, such as Arrow-based ones, this can also cause memory-safety issues if the underlying buffers are reused across batches, because the arrays from the previous batch are still held when ColumnarToRowExec begins to fill values for the next batch.
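
The difference between dropping a reference and releasing a resource can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes and is not the actual patch: `FakeOffHeapVector`, its `allocatedBytes` counter, and the two `sum...` methods are hypothetical stand-ins invented for illustration, assuming only that a vector exposes a `close()` method that frees its buffer.

```java
import java.util.List;

/** Illustrative sketch only; nothing here is a Spark class or the actual fix. */
final class ColumnVectorCleanupSketch {

    /** Hypothetical stand-in for a vector whose memory the GC does not reclaim. */
    static final class FakeOffHeapVector implements AutoCloseable {
        static long allocatedBytes = 0;   // simulated off-heap bytes currently allocated

        private final int[] data;         // stands in for a native buffer
        private boolean closed = false;

        FakeOffHeapVector(int... values) {
            data = values.clone();
            allocatedBytes += (long) Integer.BYTES * data.length;
        }

        int numRows() { return data.length; }

        int getInt(int rowId) {
            if (closed) throw new IllegalStateException("vector already closed");
            return data[rowId];
        }

        @Override
        public void close() {
            if (!closed) {                // idempotent: a second close() is a no-op
                closed = true;
                allocatedBytes -= (long) Integer.BYTES * data.length;
            }
        }
    }

    /** Leaky variant: the reference is cleared, but close() is never called. */
    static long sumWithoutRelease(List<FakeOffHeapVector> batches) {
        long sum = 0;
        for (FakeOffHeapVector batch : batches) {
            FakeOffHeapVector current = batch;
            for (int i = 0; i < current.numRows(); i++) sum += current.getInt(i);
            current = null;               // only drops the reference; the buffer stays allocated
        }
        return sum;
    }

    /** Releasing variant: each batch is closed once all of its rows are consumed. */
    static long sumAndRelease(List<FakeOffHeapVector> batches) {
        long sum = 0;
        for (FakeOffHeapVector batch : batches) {
            try (FakeOffHeapVector current = batch) {   // close() runs even if reading fails
                for (int i = 0; i < current.numRows(); i++) sum += current.getInt(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<FakeOffHeapVector> batches =
            List.of(new FakeOffHeapVector(1, 2, 3), new FakeOffHeapVector(4, 5));
        System.out.println("sum = " + sumWithoutRelease(batches)
            + ", bytes still allocated = " + FakeOffHeapVector.allocatedBytes);
        System.out.println("sum = " + sumAndRelease(batches)
            + ", bytes still allocated = " + FakeOffHeapVector.allocatedBytes);
    }
}
```

In this sketch the leaky variant leaves the simulated allocation counter unchanged after all rows are read, while the releasing variant brings it back to zero. Closing a batch before moving on to the next one also means a reader can no longer touch buffers from the previous batch, which is the memory-safety concern raised for implementations that reuse buffers.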

            People

              viirya L. C. Hsieh