IMPALA-6307

A CTAS query fails with error: AnalysisException: Duplicate column name: <columnName>


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Frontend

    Description

      The following query triggers the exception:

      CREATE TABLE foo partitioned by (year) AS

      WITH TMP AS (
      SELECT a.timestamp_col, a.year FROM functional.alltypes a
      LEFT JOIN functional.alltypes b
      ON b.timestamp_col BETWEEN a.timestamp_col AND a.timestamp_col
      )

      SELECT a.timestamp_col, a.year FROM TMP a;

      The exception is thrown from TableDef::analyzeColumnDefs():

      private void analyzeColumnDefs(Analyzer analyzer) throws AnalysisException {
          Set<String> colNames = Sets.newHashSet();
          for (ColumnDef colDef: columnDefs_) {
            colDef.analyze(analyzer);
            if (!colNames.add(colDef.getColName().toLowerCase())) {
              throw new AnalysisException("Duplicate column name: " + colDef.getColName());
            }
            if (!isKuduTable() && colDef.hasKuduOptions()) {
              throw new AnalysisException(String.format("Unsupported column options for " +
                  "file format '%s': '%s'", getFileFormat().name(), colDef.toString()));
            }
          }
          for (ColumnDef colDef: getPartitionColumnDefs()) {
            colDef.analyze(analyzer);
            if (!colDef.getType().supportsTablePartitioning()) {
              throw new AnalysisException(
                  String.format("Type '%s' is not supported as partition-column type " +
                      "in column: %s", colDef.getType().toSql(), colDef.getColName()));
            }
            if (!colNames.add(colDef.getColName().toLowerCase())) {
              throw new AnalysisException("Duplicate column name: " + colDef.getColName()); // THROWS HERE
            }
          }
      }

      The duplicate is reported for "year" because that column appears in both columnDefs_ and dataLayout_::partitionColDefs_.
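      To make the failure mode concrete, here is a simplified, self-contained model of the two loops above. It is a sketch, not Impala code: column definitions are reduced to plain name strings, AnalysisException is a local stand-in class, and the check(...) helper is hypothetical, added only for the demonstration. It assumes Java 9 or later for List.of.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical, simplified model of the duplicate check in
 * TableDef::analyzeColumnDefs(). Names stand in for ColumnDef objects.
 */
public class DuplicateColumnCheck {

  /** Local stand-in for Impala's AnalysisException. */
  static class AnalysisException extends Exception {
    AnalysisException(String msg) { super(msg); }
  }

  /**
   * Regular columns are checked first, then partition columns, and both
   * loops share one case-insensitive set of names, as in the original.
   */
  static void analyzeColumnNames(List<String> columnDefs, List<String> partitionColDefs)
      throws AnalysisException {
    Set<String> colNames = new HashSet<>();
    for (String name : columnDefs) {
      // Set.add returns false if the lower-cased name was already present.
      if (!colNames.add(name.toLowerCase(Locale.ROOT))) {
        throw new AnalysisException("Duplicate column name: " + name);
      }
    }
    for (String name : partitionColDefs) {
      if (!colNames.add(name.toLowerCase(Locale.ROOT))) {
        throw new AnalysisException("Duplicate column name: " + name);
      }
    }
  }

  /** Hypothetical helper: returns the error message, or null if accepted. */
  static String check(List<String> columnDefs, List<String> partitionColDefs) {
    try {
      analyzeColumnNames(columnDefs, partitionColDefs);
      return null;
    } catch (AnalysisException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Expected CTAS layout: "year" appears only as a partition column.
    System.out.println(check(List.of("timestamp_col"), List.of("year")));
    // Layout described in the report: "year" is in both lists.
    System.out.println(check(List.of("timestamp_col", "year"), List.of("year")));
  }
}
```

      Running main prints null for the first call and "Duplicate column name: year" for the second. The check itself is sound; it fires only because "year" ends up in both lists, which matches the diagnosis above.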

      The issue does not reproduce if BETWEEN in the JOIN clause is replaced with "b.timestamp_col > a.timestamp_col AND b.timestamp_col < a.timestamp_col". Note that BETWEEN is inclusive, so a strictly equivalent rewrite would use >= and <=.
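      BETWEEN is inclusive on both ends: v BETWEEN lo AND hi means v >= lo AND v <= hi. In the query above both bounds are a.timestamp_col, so the predicate amounts to an equality test, whereas the strict comparison rewrite can never be true. The sketch below uses plain long values as stand-ins for timestamps and ignores NULL handling; it concerns query results only and does not explain the analysis error itself.

```java
/**
 * Illustrates SQL BETWEEN semantics on plain longs (NULLs ignored).
 * "v BETWEEN lo AND hi" is equivalent to "v >= lo AND v <= hi".
 */
public class BetweenSemantics {

  /** Inclusive range test, matching SQL BETWEEN. */
  static boolean between(long v, long lo, long hi) {
    return v >= lo && v <= hi;
  }

  /** Strict range test, matching the comparison rewrite used as a workaround. */
  static boolean strictRange(long v, long lo, long hi) {
    return v > lo && v < hi;
  }

  public static void main(String[] args) {
    long a = 1_000L; // stands in for a.timestamp_col (used as both bounds)
    for (long b : new long[] {999L, 1_000L, 1_001L}) { // candidate b.timestamp_col values
      System.out.printf("b=%d between=%b strict=%b%n",
          b, between(b, a, a), strictRange(b, a, a));
    }
  }
}
```

      Only b=1000 satisfies the BETWEEN form, and no value satisfies the strict form, so the workaround sidesteps the analysis error but changes which rows the join matches.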


            People

              Assignee: Zoram Thanga
              Reporter: Zoram Thanga
              Votes: 0
              Watchers: 3
