Hive / HIVE-15131

Change Parquet reader to read metadata on the task side

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Reader
    • Labels: None

    Description

      Currently, ParquetRecordReaderWrapper still uses the readFooter API without filtering, which means it has to read the metadata for all row groups every time. This could cause issues when the input dataset is particularly large and has many columns.

      PARQUET-84 introduced another API that allows row group filtering on the task side. Hive should adopt this API.
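
      To illustrate what task-side row group filtering means, here is a minimal, self-contained sketch. It is not the patch attached to this issue and does not use the Parquet library: the RowGroup class and the filterByMidpoint method are hypothetical stand-ins. The assumption is that a task keeps only the row groups whose byte midpoint falls inside its own input split, which, as far as I understand, is how a range-based metadata filter such as ParquetMetadataConverter.range(start, end) behaves when passed to readFooter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupFilterSketch {

    /** Hypothetical stand-in for one row group entry in a Parquet footer. */
    static final class RowGroup {
        final long startOffset; // byte offset of the row group in the file
        final long totalSize;   // size of the row group in bytes

        RowGroup(long startOffset, long totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    /**
     * Keeps only the row groups whose midpoint lies in
     * [splitStart, splitStart + splitLength). Because each midpoint falls
     * in exactly one split, every row group is claimed by exactly one task.
     */
    static List<RowGroup> filterByMidpoint(List<RowGroup> all,
                                           long splitStart,
                                           long splitLength) {
        long splitEnd = splitStart + splitLength;
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup rg : all) {
            long midpoint = rg.startOffset + rg.totalSize / 2;
            if (midpoint >= splitStart && midpoint < splitEnd) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three illustrative row groups of 100 bytes each (made-up offsets).
        List<RowGroup> footer = Arrays.asList(
                new RowGroup(4, 100),
                new RowGroup(104, 100),
                new RowGroup(204, 100));

        // A task assigned the byte range [100, 200) of the file.
        List<RowGroup> mine = filterByMidpoint(footer, 100, 100);
        System.out.println(mine.size() + " row group(s) kept, first at offset "
                + mine.get(0).startOffset);
    }
}
```

      In the real reader, the benefit would come from applying such a filter while the footer is being read, so that each task materializes metadata only for the row groups in its split rather than for the whole file.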

      Attachments

        1. HIVE-15131.1.patch
          2 kB
          Adesh Kumar Rao
        2. HIVE-15131.2.patch
          2 kB
          Adesh Kumar Rao
        3. HIVE-15131.3.patch
          3 kB
          Adesh Kumar Rao
        4. HIVE-15131.4.patch
          3 kB
          Adesh Kumar Rao

          People

            Assignee: Adesh Kumar Rao (adeshrao)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 2
