[SPARK-16764] Recommend disabling vectorized parquet reader on OutOfMemoryError


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: None
    • Labels: None

    Description

      We currently don't bound or manage the size of the data arrays used by column vectors in the vectorized reader (they are bounded only by Integer.MAX_VALUE), which may lead to OOMs while reading data. In the short term, we can probably intercept this error and suggest that the user disable the vectorized parquet reader, as sketched below.
      Longer term, we should probably do explicit memory management for this.
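      A minimal sketch of that short-term mitigation, assuming the allocation happens in a reserve-style method on the column vector. The class and method names below are hypothetical stand-ins for the reader internals, not the actual patch; spark.sql.parquet.enableVectorizedReader is the real configuration key that turns the vectorized Parquet reader off.

      {code:java}
      // Hypothetical sketch: wrap the column vector's array allocation so that an
      // OutOfMemoryError surfaces with actionable advice instead of a bare OOM.
      public abstract class ColumnVectorSketch {

        // Stand-in for the hook where the backing data arrays are (re)allocated.
        protected abstract void reserveInternal(int newCapacity);

        public void reserve(int requiredCapacity) {
          try {
            reserveInternal(requiredCapacity);
          } catch (OutOfMemoryError oom) {
            // Rethrow with a hint so the user knows how to work around the OOM.
            throw new RuntimeException(
              "Cannot reserve additional memory for the vectorized reader; " +
              "consider disabling it by setting " +
              "spark.sql.parquet.enableVectorizedReader to false.", oom);
          }
        }
      }
      {code}

      Users who hit the wrapped error can then switch the reader off, e.g. spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false") in a session, or --conf spark.sql.parquet.enableVectorizedReader=false at submit time, falling back to the row-based Parquet reader.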

      Attachments

        Activity


          People

            Assignee: Sameer Agarwal (sameerag)
            Reporter: Sameer Agarwal (sameerag)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
