HIVE-9670 (sub-task of HIVE-8120: Umbrella JIRA tracking Parquet improvements)

Avoid reading file footers in ParquetRecordReaderWrapper

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      ParquetRecordReaderWrapper reads the file footer to create the splits, but when realReader.initialize() is then called, Parquet reads the same footer a second time.

      PARQUET-139 did the work to avoid reading the footers in parquet-avro. We should implement the same idea in Hive and update the Parquet library to the latest stable upstream version.
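
      The idea can be pictured with a minimal, self-contained sketch. It is not Hive or Parquet code: every name in it (FooterReadSketch, Split, FooterSource, planSplit, initialize) is hypothetical, and the midpoint rule used to pick row groups is an assumption made for illustration. The split carries only a byte range, so planning never touches the footer, and the single footer read happens when the reader is initialized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Hive or parquet-mr.
final class FooterReadSketch {

    /** Hypothetical row-group descriptor: byte offset and size within the file. */
    static final class RowGroup {
        final long start;
        final long size;
        RowGroup(long start, long size) { this.start = start; this.size = size; }
    }

    /** Hypothetical split that carries only a byte range, so planning needs no footer. */
    static final class Split {
        final String path;
        final long start;
        final long length;
        Split(String path, long start, long length) {
            this.path = path; this.start = start; this.length = length;
        }
    }

    /** Stand-in for the expensive footer read; counts how often it happens. */
    static final class FooterSource {
        private final List<RowGroup> rowGroups;
        int reads = 0;
        FooterSource(List<RowGroup> rowGroups) { this.rowGroups = rowGroups; }
        List<RowGroup> readFooter(String path) { reads++; return rowGroups; }
    }

    /** Planning step: builds the split from file coordinates alone. */
    static Split planSplit(String path, long start, long length) {
        return new Split(path, start, length);
    }

    /** Reader-side step: the only footer read, followed by row-group selection. */
    static List<RowGroup> initialize(Split split, FooterSource source) {
        List<RowGroup> selected = new ArrayList<>();
        long end = split.start + split.length;
        for (RowGroup rg : source.readFooter(split.path)) {
            // Assumed rule: a row group belongs to the split containing its midpoint.
            long midpoint = rg.start + rg.size / 2;
            if (midpoint >= split.start && midpoint < end) {
                selected.add(rg);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        FooterSource source = new FooterSource(Arrays.asList(
                new RowGroup(4, 100), new RowGroup(104, 100), new RowGroup(204, 100)));
        Split split = planSplit("/tmp/example.parquet", 0, 128);
        List<RowGroup> selected = initialize(split, source);
        System.out.println(selected.size() + " row group(s), footer reads: " + source.reads);
    }
}
```

      In this sketch the footer read count stays at zero after planning and reaches one only after initialize(), which is the behaviour the description asks for.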

          People

            Sergio Peña (spena)
