Apache Drill / DRILL-4707

Conflicting column names under case-insensitive policy lead to either a memory leak or an incorrect result


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      On the latest master branch:

      select version, commit_id, commit_message from sys.version;
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
      |     version     |                 commit_id                 |                                 commit_message                                  |
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
      | 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
      +-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
      

      If a query has two column names that conflict under the case-insensitive policy, Drill either hits a memory leak or returns an incorrect result.

      Q1.

      select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
      Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (131072)
      Allocator(op:0:0:1:Project) 1000000/131072/2490368/10000000000 (res/actual/peak/limit)
      
      
      Fragment 0:0
      

      Q2. Only one column is returned in the result.

      select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
      +------+
      | XYZ  |
      +------+
      | 0    |
      | 1    |
      | 1    |
      | 1    |
      | 4    |
      | 0    |
      | 3    |
      

      The cause of the problem seems to be that the Project operator treats the two incoming columns as identical, since Drill compares column names case-insensitively during execution.
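
The name collision described above can be illustrated outside Drill with a minimal Python sketch. It assumes a hypothetical name-keyed container (`CaseInsensitiveColumns`, not a Drill class) that normalizes keys to lower case, which is one way a case-insensitive, name-based lookup can behave. The column values are made up, and which of the two columns survives is an arbitrary choice of this sketch, not a statement about the Project operator.

```python
class CaseInsensitiveColumns:
    """Hypothetical name-keyed column container; not Drill's actual code."""

    def __init__(self):
        # Maps lower-cased name -> (original name, values).
        self._columns = {}

    def add(self, name, values):
        # Normalizing the key makes "XYZ" and "xyz" land in the same slot.
        # This sketch keeps the first registration and drops later ones.
        self._columns.setdefault(name.lower(), (name, values))

    def names(self):
        return [original for original, _ in self._columns.values()]


batch = CaseInsensitiveColumns()
batch.add("XYZ", [10, 20])      # made-up values for the first alias
batch.add("xyz", ["a", "b"])    # second alias differs only by case
print(batch.names())            # ['XYZ']
```

Two columns go in and only one name comes out, which matches the shape of the Q2 symptom: a single `XYZ` column in the result.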

      The planner should make sure that conflicting column names are resolved before execution, since execution is name-based.
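
One way such a resolution could look is sketched below in Python. The function name `uniquify_case_insensitive` and the numeric-suffix renaming scheme are assumptions made for illustration; this report does not say how the actual fix renames conflicting columns.

```python
def uniquify_case_insensitive(names):
    """Return names made unique under case-insensitive comparison.

    Hypothetical helper: a conflicting name gets a numeric suffix
    appended until it no longer collides with any earlier name.
    """
    seen = set()      # lower-cased names already handed out
    result = []
    for name in names:
        candidate = name
        suffix = 0
        while candidate.lower() in seen:
            candidate = f"{name}{suffix}"
            suffix += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result


print(uniquify_case_insensitive(["XYZ", "xyz"]))  # ['XYZ', 'xyz0']
```

With names resolved this way before execution, a name-based operator would receive two distinct output columns rather than two names that compare as equal.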


              People

              • Assignee:
                jni Jinfeng Ni
                Reporter:
                jni Jinfeng Ni
                Reviewer:
                Robert Hou
              • Votes:
                0
                Watchers:
                4
