SPARK-42388

Avoid unnecessary parquet footer reads when no filters in vectorized reader


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL

    Description

      The Parquet footer is currently read twice in the vectorized Parquet reader, even when there are no filters that require pushdown.
      When the NameNode is under high pressure, the second read costs extra time. We can avoid this unnecessary footer read by reusing the footer metadata that has already been read in VectorizedParquetRecordReader.
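
      The idea can be illustrated with a minimal, self-contained sketch. It is not the Spark patch: FooterReuseSketch, FakeParquetFile, SketchRecordReader, and openFile are hypothetical names invented for this example, and the "footer" is reduced to a list of row counts. The sketch only shows the general pattern of handing an already-read footer to the record reader instead of letting the reader fetch it again.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Spark or Parquet classes.
final class FooterReuseSketch {

    /** Simplified footer: only the row count of each row group. */
    static final class Footer {
        final List<Long> rowGroupRowCounts;

        Footer(List<Long> rowGroupRowCounts) {
            this.rowGroupRowCounts = rowGroupRowCounts;
        }
    }

    /** Simulated file that counts how often its footer is fetched. */
    static final class FakeParquetFile {
        private int footerReads = 0;

        Footer readFooter() {
            footerReads++; // each call stands for one remote footer read
            return new Footer(Arrays.asList(100L, 250L));
        }

        int footerReads() {
            return footerReads;
        }
    }

    /** Simulated record reader with two ways to obtain the footer. */
    static final class SketchRecordReader {
        private Footer footer;

        // Redundant path: the reader fetches the footer again by itself.
        void initialize(FakeParquetFile file) {
            this.footer = file.readFooter();
        }

        // Reuse path: the caller passes in the footer it has already read.
        void initialize(Footer alreadyRead) {
            this.footer = alreadyRead;
        }

        long totalRows() {
            long sum = 0;
            for (long count : footer.rowGroupRowCounts) {
                sum += count;
            }
            return sum;
        }
    }

    /** Opens the simulated file and returns how many footer reads were issued. */
    static int openFile(FakeParquetFile file, boolean reuseFooter) {
        Footer footer = file.readFooter(); // first read, done by the caller
        SketchRecordReader reader = new SketchRecordReader();
        if (reuseFooter) {
            reader.initialize(footer);     // no second read
        } else {
            reader.initialize(file);       // second, avoidable read
        }
        reader.totalRows();
        return file.footerReads();
    }

    public static void main(String[] args) {
        System.out.println("reads without reuse: " + openFile(new FakeParquetFile(), false));
        System.out.println("reads with reuse:    " + openFile(new FakeParquetFile(), true));
    }
}
```

      In this sketch, the run without reuse issues two footer reads and the run with reuse issues one. Both paths leave the reader with the same metadata, so only the number of remote reads changes.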

            People

              Assignee: miracle Mars
              Reporter: miracle Mars
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: