IMPALA-3687

Unable to rename string column in schema file in AVRO

    Details

      Description

      Reproduction:

      1. Create an Avro table with a STRING column and with its schema defined in TBLPROPERTIES or SERDEPROPERTIES.
      2. Change the column name in the schema file.
      3. Run INVALIDATE METADATA or restart Impala.
      4. Select that column in Impala by its new name.

      ERROR: AnalysisException: Could not resolve column/field reference: 'xx'
      

        Activity

        HuaisiXu Huaisi Xu added a comment -

        https://github.com/cloudera/Impala/commit/1c59ff01860df0d0dced15f54a6835611b6c09ad

        IMPALA-3687: Prefer Avro field name during schema reconciliation
        Since it is possible to create an Avro table with both column
        definitions and an Avro schema, Impala attempts to reconcile
        inconsistencies in the two schema definitions, generally preferring the
        Avro schema. The only exception to this rule was with
        CHAR/VARCHAR/STRING columns, where the column definition was preferred
        in order to support tables with CHAR/VARCHAR columns although Avro only
        supports STRING. This exception is confusing because the name for such a
        column will be taken from the column definition (and not from the Avro
        schema).

        This patch prefers name, comment from Avro schema definition and
        uses column type from column definition for CHAR/VARCHAR/STRING
        columns.

        Change-Id: Ia3e43b2885853c2b4f207a45a873c9d7f31379cd
        Reviewed-on: http://gerrit.cloudera.org:8080/3331
        Reviewed-by: Huaisi Xu <hxu@cloudera.com>
        Tested-by: Internal Jenkins
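
        The reconciliation rule described in the commit message above can be sketched roughly as follows. This is an illustrative Python model, not Impala's actual frontend code: the `Column` class and the `is_string_like` and `reconcile` helpers are hypothetical names, and matching columns by position is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one column of a table schema."""
    name: str
    type: str          # e.g. "STRING", "CHAR(5)", "VARCHAR(10)", "INT"
    comment: str = ""


def is_string_like(col_type: str) -> bool:
    """True for CHAR, VARCHAR and STRING, ignoring any length such as (10)."""
    base = col_type.split("(", 1)[0].strip().upper()
    return base in {"CHAR", "VARCHAR", "STRING"}


def reconcile(col_defs: List[Column], avro_cols: List[Column]) -> List[Column]:
    """Sketch of the post-patch rule.

    The Avro schema is generally preferred. For CHAR/VARCHAR/STRING columns
    the name and comment still come from the Avro schema, and only the type
    is taken from the column definition.
    Assumption: columns are matched by position.
    """
    result = []
    for i, avro_col in enumerate(avro_cols):
        has_def = i < len(col_defs)
        if has_def and is_string_like(col_defs[i].type) and is_string_like(avro_col.type):
            # Keep the declared CHAR/VARCHAR/STRING type, take the rest from Avro.
            result.append(Column(avro_col.name, col_defs[i].type, avro_col.comment))
        else:
            # Everything else follows the Avro schema as-is.
            result.append(avro_col)
    return result


# Usage: the string column was renamed to 'xx' in the Avro schema only.
col_defs = [Column("old_name", "VARCHAR(10)", "old comment"), Column("id", "INT")]
avro_cols = [Column("xx", "STRING", "new comment"), Column("id", "BIGINT")]
reconciled = reconcile(col_defs, avro_cols)
assert reconciled[0] == Column("xx", "VARCHAR(10)", "new comment")
```

        Under the pre-patch behavior described above, the first column would have kept the name from the column definition, which is why a query on 'xx' could not be resolved.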

        HuaisiXu Huaisi Xu added a comment -

        Not sure if this is a regression, but I think the commit is https://github.com/cloudera/Impala/commit/af46f26e130f4bb73214d1b40d5779c661535fec.

        dhecht Dan Hecht added a comment -

        Was this a regression? If so, can you post a pointer to the commit that caused this?


          People

          • Assignee:
            HuaisiXu Huaisi Xu
            Reporter:
            HuaisiXu Huaisi Xu
          • Votes:
            0
            Watchers:
            3
