HIVE-15131: Change Parquet reader to read metadata on the task side

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Reader
    • Labels: None

    Description

      Currently the ParquetRecordReaderWrapper still uses the readFooter API without filtering, which means it needs to read metadata about all row groups every time. This could cause issues when the input dataset is particularly big and has many columns.

      Parquet-84 introduced another API that allows row group filtering on the task side, so that each task reads only the row group metadata relevant to its own split. Hive should adopt this API.
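
      To make the idea concrete, here is a minimal sketch of range-based row group filtering in plain Java. It is a simplified model, not the Parquet API and not the attached patch: the names RowGroupFilterSketch, RowGroup and filterByRange are hypothetical, and the rule used (keep a row group when its midpoint falls inside the split's byte range) is an assumption about how a range filter behaves. In parquet-mr the corresponding call would presumably be readFooter with a range filter such as ParquetMetadataConverter.range(start, end) in place of the unfiltered read.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of task-side row group filtering.
// NOT the Parquet API: RowGroup and filterByRange are hypothetical names.
class RowGroupFilterSketch {

    // Minimal stand-in for one row group entry in a file footer:
    // the byte offset where it starts and the number of bytes it spans.
    static final class RowGroup {
        final long startOffset;
        final long totalByteSize;

        RowGroup(long startOffset, long totalByteSize) {
            this.startOffset = startOffset;
            this.totalByteSize = totalByteSize;
        }

        long midPoint() {
            return startOffset + totalByteSize / 2;
        }
    }

    // Keep only the row groups whose midpoint lies in [splitStart, splitEnd).
    // Assumption: each row group is claimed by exactly one split this way.
    static List<RowGroup> filterByRange(List<RowGroup> all, long splitStart, long splitEnd) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup rg : all) {
            long mid = rg.midPoint();
            if (mid >= splitStart && mid < splitEnd) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowGroup> footer = new ArrayList<>();
        footer.add(new RowGroup(4, 100));    // midpoint 54
        footer.add(new RowGroup(104, 100));  // midpoint 154
        footer.add(new RowGroup(204, 100));  // midpoint 254

        // A task assigned the byte range [100, 200) keeps only the second row group.
        List<RowGroup> mine = filterByRange(footer, 100, 200);
        System.out.println(mine.size() + " row group(s), first at offset " + mine.get(0).startOffset);
    }
}
```

      With filtering like this, a task works with metadata for its own row groups only, rather than for every row group in the file, which is where the cost described above comes from on wide, large datasets.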

      Attachments

        1. HIVE-15131.4.patch
          3 kB
          Adesh Kumar Rao
        2. HIVE-15131.3.patch
          3 kB
          Adesh Kumar Rao
        3. HIVE-15131.2.patch
          2 kB
          Adesh Kumar Rao
        4. HIVE-15131.1.patch
          2 kB
          Adesh Kumar Rao

          People

            Assignee: Adesh Kumar Rao (adeshrao)
            Reporter: Chao Sun (csun)
