Tajo / TAJO-1600

Invalid query planning for distinct group-by

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Planner/Optimizer
    • Labels: None

      Description

      For a query that uses the DISTINCT operator, the group-by is always executed as the last step of the plan. Consider the following example query.

      default> select distinct a.col3 from test as a left outer join lineitem b on a.col1 = b.l_orderkey order by a.col3;
      

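      The plan for this query is shown below. Note that GROUP_BY(5) is the root of the plan and sits above SORT(3), so the grouping runs after the sort.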

      GROUP_BY(5)(col3)
        => target list: default.a.col3 (TEXT)
        => out schema:{(1) default.a.col3 (TEXT)}
        => in schema:{(1) default.a.col3 (TEXT)}
         SORT(3)
           => Sort Keys: default.a.col3 (TEXT) (asc)
            JOIN(7)(LEFT_OUTER)
              => Join Cond: default.a.col1 (INT4) = default.b.l_orderkey (INT4)
              => target list: default.a.col3 (TEXT)
              => out schema: {(1) default.a.col3 (TEXT)}
              => in schema: {(3) default.a.col3 (TEXT), default.a.col1 (INT4), default.b.l_orderkey (INT4)}
               SCAN(1) on default.lineitem_large as b
                 => target list: default.b.l_orderkey (INT4)
                 => out schema: {(1) default.b.l_orderkey (INT4)}
                 => in schema: {(16) default.b.l_orderkey (INT4), default.b.l_partkey (INT4), default.b.l_suppkey (INT4), default.b.l_linenumber (INT4), default.b.l_quantity (FLOAT8), default.b.l_extendedprice (FLOAT8), default.b.l_discount (FLOAT8), default.b.l_tax (FLOAT8), default.b.l_returnflag (TEXT), default.b.l_linestatus (TEXT), default.b.l_shipdate (TEXT), default.b.l_commitdate (TEXT), default.b.l_receiptdate (TEXT), default.b.l_shipinstruct (TEXT), default.b.l_shipmode (TEXT), default.b.l_comment (TEXT)}
               PARTITIONS_SCAN(8) on default.testbroadcastmulticolumnpartitiontable as a
                 => target list: default.a.col3 (TEXT), default.a.col1 (INT4)
                 => num of filtered paths: 3
                 => out schema: {(2) default.a.col3 (TEXT), default.a.col1 (INT4)}
                 => in schema: {(2) default.a.col1 (INT4), default.a.col2 (FLOAT4)}
                 => 0: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=01/col4=1996
                 => 1: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=10/col4=1993
                 => 2: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=12/col4=1996
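
      The following is a minimal sketch of why the operator order in the plan above matters. It is not Tajo code: the operator functions, the bucket function, and the sample rows are all hypothetical, and it assumes a hash-partitioned group-by that emits its buckets one after another. Under that assumption, grouping after the sort can discard the order that SORT produced, whereas sorting after the grouping keeps it.

```python
# Toy model of the two operator orders. All names here are hypothetical
# and only illustrate the ordering concern; this is not how Tajo is implemented.

def bucket_of(value, num_buckets):
    """Deterministic stand-in for a hash function."""
    return sum(ord(ch) for ch in value) % num_buckets


def sort_op(rows):
    """SORT: ascending order on the single column."""
    return sorted(rows)


def hash_distinct_op(rows, num_buckets=3):
    """GROUP_BY used for DISTINCT, assumed to be hash-partitioned.

    Each distinct value goes to one bucket; buckets are emitted in turn,
    so the input order is not preserved across buckets.
    """
    buckets = [[] for _ in range(num_buckets)]
    seen = set()
    for value in rows:
        if value not in seen:
            seen.add(value)
            buckets[bucket_of(value, num_buckets)].append(value)
    return [value for bucket in buckets for value in bucket]


def reported_plan(rows):
    """Order in the plan above: GROUP_BY on top of SORT."""
    return hash_distinct_op(sort_op(rows))


def expected_plan(rows):
    """Order that satisfies ORDER BY: SORT on top of GROUP_BY."""
    return sort_op(hash_distinct_op(rows))


if __name__ == "__main__":
    col3_values = ["10", "01", "12", "01", "10"]  # sample col3 values
    print("GROUP_BY over SORT:", reported_plan(col3_values))
    print("SORT over GROUP_BY:", expected_plan(col3_values))
```

      With these sample values, the first ordering returns the distinct values out of order, while the second returns them sorted. A sort-based group-by would behave differently, so this only shows that the final order is not guaranteed when the group-by is the last step.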
      

            People

            • Assignee: Hyunsik Choi (hyunsik)
            • Reporter: Jihoon Son (jihoonson)
            • Votes: 0
            • Watchers: 4