Hive / HIVE-15055

Column pruning for nested fields in Parquet


Details

    Description

      Some columnar file formats, such as Parquet, also store the fields of struct types column by column, using the encoding described in the Google Dremel paper. In big data it is very common for data to be stored in structs while queries need only a subset of the fields in those structs. However, Hive currently still reads the whole struct, regardless of whether all of its fields are selected. Pruning unwanted sub-fields of structs (nested fields) at file-reading time would therefore give a big performance boost in such scenarios.
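      The pruning step can be sketched as computing a reduced read schema from the nested field paths a query references. This is a minimal illustration under stated assumptions, not Hive's or Parquet's API: a struct is modelled as a Python dict mapping field names to child types, a primitive as a type-name string, and the function name `prune_schema` and the dotted-path notation (e.g. "s.b.c") are hypothetical. Because Parquet stores each leaf field as its own column, a reader given such a pruned schema would only need to decode the leaf columns it still contains.

```python
from typing import Dict, Iterable, List, Union

# Toy schema model (an assumption for illustration): a primitive is a type
# name string, a struct is a dict of field name -> child schema node.
Schema = Union[str, Dict[str, "Schema"]]


def prune_schema(schema: Dict[str, Schema], paths: Iterable[str]) -> Dict[str, Schema]:
    """Return a copy of `schema` keeping only fields reachable by dotted paths.

    A path ending at a struct keeps that whole struct; paths naming unknown
    fields, or descending into a primitive, are ignored.
    """
    # Group the requested paths by their first component:
    # "s.b.c" -> wanted["s"] gets "b.c"; "s" -> wanted["s"] gets "".
    wanted: Dict[str, List[str]] = {}
    for path in paths:
        head, _, rest = path.partition(".")
        wanted.setdefault(head, []).append(rest)

    pruned: Dict[str, Schema] = {}
    # Iterate in schema order so the pruned schema keeps the file's field order.
    for name, child in schema.items():
        rests = wanted.get(name)
        if rests is None:
            continue  # field not referenced by the query: prune it
        if "" in rests:
            pruned[name] = child  # whole field requested (subtree shared, not copied)
        elif isinstance(child, dict):
            sub = prune_schema(child, rests)
            if sub:  # drop structs in which no requested sub-field exists
                pruned[name] = sub
    return pruned


example_schema: Dict[str, Schema] = {
    "id": "int64",
    "s": {"a": "int32", "b": {"c": "string", "d": "double"}, "e": "binary"},
}

if __name__ == "__main__":
    # A query like SELECT id, s.b.c only needs two leaf columns.
    print(prune_schema(example_schema, ["s.b.c", "id"]))
    # → {'id': 'int64', 's': {'b': {'c': 'string'}}}
```

      Grouping paths by their first component lets a request for a whole struct (such as "s.b") take precedence over a narrower request inside it (such as "s.b.c"), so the reader never drops a field the query still needs.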

      Attachments

        1. design-doc-nested-column-pruning.pdf
          109 kB
          Chao Sun
        2. benchmark-hos.pdf
          215 kB
          Chao Sun


          People

            csun Chao Sun
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated: