IMPALA-4675

Mixed-case or uppercase columns are not resolved in Parquet when using PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
    • Attachment: impala-4675.patch (0.7 kB, Nathan Salmon)

        Activity

        alex.behm Alexander Behm added a comment -

        Nathan Salmon, are you actively working on this, or may I reassign it? We want to fix this soon.

        ngsalmon Nathan Salmon added a comment -

        Whoops, fell off my radar. Thanks for reminding me. I'll issue a patch this weekend.

        alex.behm Alexander Behm added a comment -

        Looking forward to it!

        lv Lars Volker added a comment -

        Nathan Salmon - We would like to suggest the name-based resolution as a workaround for IMPALA-4725. Do you have an ETA for your patch? Thanks for working on this!

        lv Lars Volker added a comment -

        I'm making this a P1, inheriting the priority of IMPALA-4725.

        ngsalmon Nathan Salmon added a comment -

        @lv, I can push up a patch this evening for review while I keep working on testing. It's pretty straightforward. I ran into a few issues with testing in my Linux environment, but I hope to get them ironed out this weekend. If I hit a roadblock, I might reach out with questions.

        lv Lars Volker added a comment -

        Nathan Salmon - Thanks for the update.

        ngsalmon Nathan Salmon added a comment -

        Take a look at https://gerrit.cloudera.org/#/c/5891/
        All current backend tests are passing. The test data for the IMPALA-2835 commit still needs to be updated with mixed and upper case. I should get to that Saturday.

        alex.behm Alexander Behm added a comment -

        commit 34353218cebbcd02630b07681624c1a1d5c9fd5b
        Author: Nathan Salmon <nathan.gsalmon@gmail.com>
        Date: Wed Mar 1 13:23:41 2017 -0800

        IMPALA-4675: Case-insensitive matching of Parquet fields.

        The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION
        allows matching of Parquet fields by name instead of by
        index (the default).

        Parquet column names are case sensitive, but Impala treats
        db/table/column/field names as case-insensitive. Today,
        there is no way to select Parquet columns with mixed
        casing via SQL using the name-based field resolution policy.

        This patch changes the matching of Parquet fields to be
        case-insensitive.

        Testing:

        • Modified the data files backing complextypestbl
          to contain fields with mixed casing.
        • Several existing tests run against this table,
          including the test for name-based resolution.
        • I confirmed that without this fix, the existing
          name-based resolution tests fail on the modified
          data files.
        • I locally ran test_scanners.py and test_nested_types.py
          on exhaustive with this fix.

        Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
        Reviewed-on: http://gerrit.cloudera.org:8080/5891
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins
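
The behavior change described in the commit message can be illustrated with a minimal Python sketch. This is not Impala's scanner code (the backend is C++); the function name `resolve_field` and the sample field names are hypothetical and chosen only for illustration. It assumes the table-side column name arrives in lowercase and shows why a case-sensitive comparison misses a mixed-case Parquet field while a case-insensitive one finds it.

```python
from typing import Optional, Sequence


def resolve_field(table_column: str,
                  file_fields: Sequence[str],
                  case_insensitive: bool = True) -> Optional[int]:
    """Return the index of the file field matching table_column, or None.

    Hypothetical helper: a sketch of name-based resolution, not Impala's API.
    """
    # Normalize the requested name only when case-insensitive matching is on.
    wanted = table_column.lower() if case_insensitive else table_column
    for idx, name in enumerate(file_fields):
        candidate = name.lower() if case_insensitive else name
        if candidate == wanted:
            return idx
    return None


# Hypothetical Parquet file schema containing upper and mixed casing.
parquet_fields = ["ID", "userName", "created_at"]

# Case-sensitive comparison, as in the reported behavior: no field is found.
before = resolve_field("username", parquet_fields, case_insensitive=False)

# Case-insensitive comparison, as the patch describes: the field is found.
after = resolve_field("username", parquet_fields)
```

With the case-sensitive comparison the lookup returns `None`, which corresponds to the unresolved column in this report; with the case-insensitive comparison it returns the index of `userName`.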


          People

          • Assignee: Nathan Salmon (ngsalmon)
          • Reporter: Nathan Salmon (ngsalmon)
          • Votes: 0
          • Watchers: 6
