Uploaded image for project: 'Tajo (Retired)'
  1. Tajo (Retired)
  2. TAJO-2093

Partition Pruning doesn't handle Constant folding occasionally for BETWEEN clause.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.12.0, 0.11.1
    • 0.11.2
    • Planner/Optimizer
    • None

    Description

      In some cases, Partition Pruning doesn't handle Constant folding for {{BETWEEN|| clause as shown below.

      default> CREATE EXTERNAL TABLE student (
        id        INT,
        name      TEXT,
        grade     TEXT,
        phone     TEXT
      ) USING TEXT
      PARTITION BY COLUMN (entrance_date TEXT)
      LOCATION '/data/student';
      
      
      default> \dfs -ls /data/student;
      Found 6 items
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160308
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160309
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160310
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160311
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160312
      drwx------   - tajo supergroup          0 2016-03-14 11:13 /data/student/entrance_date=20160313
      
      default> \! date
      2016. 03. 14. (MON) 15:19:27 KST
      
      default> SELECT * FROM student WHERE entrance_date BETWEEN TO_CHAR(ADD_DAYS(CURRENT_DATE(), -2) , 'YYYYMMDD') AND TO_CHAR( ADD_DAYS(CURRENT_DATE(), -1) , 'YYYYMMDD');
      

      Above select query returns two rows as expected. But when I found logical plan on Tajo web UI, tajo retrieves all sub-directories of the example table as following:

      PARTITIONS_SCAN(4) on default.student
        => filter: default.student.entrance_date (TEXT) BETWEEN ASYMMETRIC to_char(add_days(current_date(),-2),YYYYMMDD) AND to_char(add_days(current_date(),-1),YYYYMMDD)
        => target list: default.student.id (INT4), default.student.name (TEXT), default.student.grade (TEXT), default.student.phone (TEXT), default.student.entrance_date (TEXT)
        => num of filtered paths: 6
        => out schema: {(5) default.student.id (INT4), default.student.name (TEXT), default.student.grade (TEXT), default.student.phone (TEXT), default.student.entrance_date (TEXT)}
        => in schema: {(4) default.student.id (INT4), default.student.name (TEXT), default.student.grade (TEXT), default.student.phone (TEXT)}
        => 0: hdfs://localhost:9010/data/student/entrance_date=20160308
        => 1: hdfs://localhost:9010/data/student/entrance_date=20160309
        => 2: hdfs://localhost:9010/data/student/entrance_date=20160310
        => 3: hdfs://localhost:9010/data/student/entrance_date=20160311
        => 4: hdfs://localhost:9010/data/student/entrance_date=20160312
        => 5: hdfs://localhost:9010/data/student/entrance_date=20160313
      

      Attachments

        Activity

          People

            blrunner JaeHwa Jung
            blrunner JaeHwa Jung
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: