  1. IMPALA
  2. IMPALA-6307

A CTAS query fails with error: AnalysisException: Duplicate column name: <columnName>

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Frontend

      Description

      The following query triggers the exception:

      CREATE TABLE foo partitioned by (year) AS

      WITH TMP AS (
      SELECT a.timestamp_col, a.year FROM functional.alltypes a
      LEFT JOIN functional.alltypes b
      ON b.timestamp_col BETWEEN a.timestamp_col AND a.timestamp_col
      )

      SELECT a.timestamp_col, a.year FROM TMP a;

      The exception is thrown from TableDef::analyzeColumnDefs():

      private void analyzeColumnDefs(Analyzer analyzer) throws AnalysisException {
          Set<String> colNames = Sets.newHashSet();
          for (ColumnDef colDef: columnDefs_) {
            colDef.analyze(analyzer);
            if (!colNames.add(colDef.getColName().toLowerCase())) {
              throw new AnalysisException("Duplicate column name: " + colDef.getColName());
            }
            if (!isKuduTable() && colDef.hasKuduOptions()) {
              throw new AnalysisException(String.format("Unsupported column options for " +
                  "file format '%s': '%s'", getFileFormat().name(), colDef.toString()));
            }
          }
          for (ColumnDef colDef: getPartitionColumnDefs()) {
            colDef.analyze(analyzer);
            if (!colDef.getType().supportsTablePartitioning()) {
              throw new AnalysisException(
                  String.format("Type '%s' is not supported as partition-column type " +
                      "in column: %s", colDef.getType().toSql(), colDef.getColName()));
            }
            if (!colNames.add(colDef.getColName().toLowerCase())) {
              throw new AnalysisException("Duplicate column name: " + colDef.getColName()); // THROWS HERE
            }
          }
      }

      The column duplication happens for "year" because it's in both columnDefs_ and dataLayout_::partitionColDefs_.
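
      The duplicate check above can be illustrated in isolation. The following is a minimal sketch, not Impala code: the class DuplicateColumnCheck and the method firstDuplicate are hypothetical names, and plain strings stand in for ColumnDef objects. It only shows that a name present in both lists trips the shared, case-insensitive set on the second loop.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the duplicate-name check in analyzeColumnDefs().
final class DuplicateColumnCheck {

  // Returns the first name that is already present in the shared set,
  // checking regular columns first and partition columns second.
  static Optional<String> firstDuplicate(
      List<String> columnNames, List<String> partitionColumnNames) {
    Set<String> seen = new HashSet<>();
    for (String name : columnNames) {
      if (!seen.add(name.toLowerCase(Locale.ROOT))) return Optional.of(name);
    }
    for (String name : partitionColumnNames) {
      // A partition column that also sits in columnNames is rejected here.
      if (!seen.add(name.toLowerCase(Locale.ROOT))) return Optional.of(name);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Assumed state from the report: "year" appears in both lists.
    System.out.println(
        firstDuplicate(List.of("timestamp_col", "year"), List.of("year")));
    // Assumed healthy state: "year" appears only as a partition column.
    System.out.println(
        firstDuplicate(List.of("timestamp_col"), List.of("year")));
  }
}
```

      With the first input the sketch reports "year" as the duplicate; with the second it reports none. This suggests the check itself behaves as designed and that the problem lies in how "year" ends up in columnDefs_ as well.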

      The issue does not reproduce if we replace BETWEEN in the JOIN clause with the explicit comparison "b.timestamp_col > a.timestamp_col AND b.timestamp_col < a.timestamp_col".

              People

              • Assignee: Zoram Thanga (zoram)
              • Reporter: Zoram Thanga (zoram)