IMPALA / IMPALA-4734

HdfsParquetTableWriter should populate sorting_columns in row groups with any ordering columns

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      The sortby() query hint introduced in IMPALA-4163 allows rows to be ordered prior to insertion. The HdfsParquetTableWriter should populate sorting_columns in each row group with the ordering columns specified by the sortby() hint.
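A minimal sketch of the requested mapping, for illustration only. The field names `column_idx`, `descending`, and `nulls_first` follow the `SortingColumn` struct in the Parquet Thrift definition; the helper `build_sorting_columns`, the example column names, and the `nulls_first=False` default are assumptions and are not taken from the Impala code. It also ignores the fact that partition columns are not written to the data file, which would shift the indexes in a real writer.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SortingColumn:
    """Mirrors the fields of SortingColumn in the Parquet Thrift definition."""
    column_idx: int    # index of the column within the row group
    descending: bool   # False means ascending order
    nulls_first: bool  # whether NULLs sort before non-NULL values


def build_sorting_columns(table_columns: Sequence[str],
                          sortby_columns: Sequence[str],
                          nulls_first: bool = False) -> List[SortingColumn]:
    """Hypothetical helper: one entry per hint column, kept in hint order."""
    index_of = {name: idx for idx, name in enumerate(table_columns)}
    result = []
    for name in sortby_columns:
        if name not in index_of:
            raise ValueError("unknown sortby() column: %s" % name)
        # Ascending order (descending=False) is used for every hint column.
        result.append(SortingColumn(index_of[name], False, nulls_first))
    return result


# Example: a hint of sortby(string_col, id) on a three-column table.
cols = build_sorting_columns(["id", "int_col", "string_col"],
                             ["string_col", "id"])
print([c.column_idx for c in cols])  # [2, 0]
```

The list preserves the order of the hint rather than the order of the table schema, so a reader of the file can tell which column is the primary sort key.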

          Activity

          Lars Volker added a comment:

          IMPALA-4734: Set parquet::RowGroup::sorting_columns

          This changes the HdfsParquetTableWriter to populate the
          parquet::RowGroup::sorting_columns list with all columns mentioned in a
          'sortby()' hint within INSERT statements. The columns are added to the
          list in the order in which they appear inside the hint.

          The change also adds backports.tempfile to the Python requirements to
          provide 'tempfile.TemporaryDirectory' on Python 2.7.

          In addition, it switches the default ordering for columns mentioned in
          'sortby()' hints from descending to ascending.

          To test this change, we write a table with a 'sortby()' hint and verify
          that the sorting_columns list is populated correctly.

          Change-Id: Ib42aab585e9e627796e9510e783652d49d74b56c
          Reviewed-on: http://gerrit.cloudera.org:8080/6219
          Reviewed-by: Lars Volker <lv@cloudera.com>
          Tested-by: Impala Public Jenkins
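The backports.tempfile requirement mentioned above can be used with a fallback import, sketched below. This is an illustration under assumptions: the helper `write_and_check` and the file name are hypothetical, and how the actual test uses the temporary directory is not shown in this issue.

```python
import os

try:
    from tempfile import TemporaryDirectory  # available in Python 3.2+
except ImportError:
    # Python 2.7: provided by the backports.tempfile package.
    from backports.tempfile import TemporaryDirectory


def write_and_check(filename, payload):
    """Hypothetical helper: write payload to a scratch directory, read it
    back, and return (data, scratch_path). The directory is deleted when
    the 'with' block exits."""
    with TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            data = f.read()
    return data, tmp_dir


data, scratch = write_and_check("example.parq", b"PAR1")
print(data == b"PAR1", os.path.exists(scratch))
```

Using the context manager means a test does not need explicit cleanup code, even when an assertion fails inside the block.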


            People

            • Assignee: Lars Volker
            • Reporter: Lars Volker
            • Votes: 0
            • Watchers: 1
