Spark / SPARK-42388

Avoid unnecessary parquet footer reads when no filters in vectorized reader

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      The vectorized Parquet reader currently reads the Parquet footer twice, even when there are no filters that require pushdown.
      When the NameNode is under heavy load, the second read costs extra time. We can avoid this unnecessary footer read by using the footer metadata that has already been read in VectorizedParquetRecordReader.
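
      The idea can be illustrated with a minimal, self-contained sketch. It does not use Spark or Parquet classes: FakeFile, Footer, Reader and both scan methods are hypothetical stand-ins, and the counter only models one storage round trip per footer read. It is meant to show the shape of the change, not the actual patch.

```java
// Hypothetical stand-ins only; none of these are Spark or Parquet classes.
final class FooterReuseSketch {

    /** Stand-in for parsed footer metadata. */
    static final class Footer {
        final int rowGroups;
        Footer(int rowGroups) { this.rowGroups = rowGroups; }
    }

    /** Stand-in for a file on remote storage; counts how often its footer is read. */
    static final class FakeFile {
        private final int rowGroups;
        private int footerReads = 0;
        FakeFile(int rowGroups) { this.rowGroups = rowGroups; }

        Footer readFooter() {
            footerReads++; // each call models one round trip to storage
            return new Footer(rowGroups);
        }

        int footerReads() { return footerReads; }
    }

    /** Stand-in for a record reader that can accept a footer read earlier. */
    static final class Reader {
        private Footer footer;

        // Before: the reader always fetches the footer itself.
        void initialize(FakeFile file) { this.footer = file.readFooter(); }

        // After: the caller hands over the footer it already holds.
        void initialize(FakeFile file, Footer preRead) { this.footer = preRead; }

        int rowGroups() { return footer.rowGroups; }
    }

    /** The caller reads the footer for its own use, then the reader reads it again. */
    static int scanReadingTwice(FakeFile file) {
        Footer footer = file.readFooter(); // first read, not passed on
        Reader reader = new Reader();
        reader.initialize(file);           // second read inside the reader
        return reader.rowGroups();
    }

    /** The caller reads the footer once and passes it to the reader. */
    static int scanReusingFooter(FakeFile file) {
        Footer footer = file.readFooter(); // the only read
        Reader reader = new Reader();
        reader.initialize(file, footer);
        return reader.rowGroups();
    }
}
```

      In this sketch both scan methods return the same result, but the second one leaves the read counter at one instead of two, which is the saving the issue is after when each footer read involves the NameNode.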

          People

            Assignee: miracle Mars
            Reporter: miracle Mars
            Votes: 0
            Watchers: 4