  1. Parquet
  2. PARQUET-2415

Reuse hadoop file status and footer in ParquetRecordReader

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14.0
    • Fix Version/s: None
    • Component/s: parquet-hadoop
    • Labels: None

    Description

      DESCRIPTION

      Before reading a Parquet file, Spark sends a listStatus RPC to get the Hadoop file status and then reads the Parquet file footer. ParquetRecordReader later sends the same listStatus RPC to get the same file status and reads the footer a second time. Both the file status and the footer can be reused instead.

      PLANS

      Save the Hadoop file status in the ParquetMetadata, and save the ParquetMetadata in the input split, so that both can be reused when initializing a new ParquetRecordReader.
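
      The plan above could look roughly like the following sketch. It is only an illustration of the idea, not the proposed patch: StatusInfo, FooterInfo, FakeFileSystem, Split and initReader are hypothetical stand-ins and do not correspond to real parquet-hadoop or Hadoop classes. The fake file system counts calls, so the saving is visible.

```java
// Hypothetical sketch: none of these types are parquet-hadoop or Hadoop classes.
class FooterReuseSketch {

    /** Stand-in for a Hadoop file status. */
    static final class StatusInfo {
        final String path;
        final long length;

        StatusInfo(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Stand-in for ParquetMetadata that also keeps the status it was read with. */
    static final class FooterInfo {
        final StatusInfo status;
        final int rowGroupCount;

        FooterInfo(StatusInfo status, int rowGroupCount) {
            this.status = status;
            this.rowGroupCount = rowGroupCount;
        }
    }

    /** Stand-in for the remote file system; counts the calls it serves. */
    static final class FakeFileSystem {
        int listStatusCalls = 0;
        int footerReads = 0;

        StatusInfo listStatus(String path) {
            listStatusCalls++;
            return new StatusInfo(path, 1024L);
        }

        FooterInfo readFooter(StatusInfo status) {
            footerReads++;
            return new FooterInfo(status, 4);
        }
    }

    /** Stand-in for an input split that may carry a footer read during planning. */
    static final class Split {
        final String path;
        final FooterInfo cachedFooter; // null when the planner attached nothing

        Split(String path, FooterInfo cachedFooter) {
            this.path = path;
            this.cachedFooter = cachedFooter;
        }
    }

    /** Stand-in for reader initialization: reuse the cached footer, else fetch it. */
    static FooterInfo initReader(Split split, FakeFileSystem fs) {
        if (split.cachedFooter != null) {
            return split.cachedFooter; // no listStatus call, no footer read
        }
        StatusInfo status = fs.listStatus(split.path);
        return fs.readFooter(status);
    }

    public static void main(String[] args) {
        FakeFileSystem fs = new FakeFileSystem();
        String path = "/data/part-0.parquet";

        // Planning phase: status and footer are fetched once.
        FooterInfo footer = fs.readFooter(fs.listStatus(path));

        // Reader initialized from a split that carries the footer.
        initReader(new Split(path, footer), fs);
        System.out.println("with reuse: listStatus=" + fs.listStatusCalls
                + " footerReads=" + fs.footerReads);

        // Reader initialized from a split without a cached footer.
        initReader(new Split(path, null), fs);
        System.out.println("without reuse: listStatus=" + fs.listStatusCalls
                + " footerReads=" + fs.footerReads);
    }
}
```

      In this sketch the footer travels inside the split object. A real input split is serialized and shipped to executors, so the cost of serializing the footer would have to be weighed against the saved RPC and footer read.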

            People

              Assignee: Unassigned
              Reporter: Wan Kun (wankun)
              Votes: 0
              Watchers: 3
