Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-4211

Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.

    XMLWordPrintableJSON

Details

    Description

      When making an sql statement that incorporates a join to a table and then a self join to that table to get a parent value , Drill brings back inconsistent results.

      Here is the sql in postgres with correct output:

      select trx.categoryguid,
      cat.categoryname, w1.categoryname as parentcat
      from transactions trx
      join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
      join categories w1 on (cat.categoryparentguid = w1.categoryguid)
      where cat.categoryparentguid IS NOT NULL;
      

      Output:

      categoryid categoryname parentcategory
      id1 restaurants food&Dining
      id1 restaurants food&Dining
      id2 Coffee Shops food&Dining
      id2 Coffee Shops food&Dining

      When run in Drill with correct storage prefix:

      select trx.categoryguid,
      cat.categoryname, w1.categoryname as parentcat
      from db.schema.transactions trx
      join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
      join db.schema.wpfm_categories w1 on (cat.categoryparentguid = w1.categoryguid)
      where cat.categoryparentguid IS NOT NULL
      

      Results are:

      categoryid categoryname parentcategory
      id1 restaurants null
      id1 restaurants null
      id2 Coffee Shops null
      id2 Coffee Shops null

      Physical plan is:

      00-00    Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
      00-01      Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
      00-02        Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
      00-03          Jdbc(sql=[SELECT *
      FROM "public"."transactions"
      INNER JOIN (SELECT *
      FROM "public"."categories"
      WHERE "categoryparentguid" IS NOT NULL) AS "t" ON "transactions"."categoryguid" = "t"."categoryguid"
      INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, VARCHAR(50) transactionorigpartyguid, VARCHAR(255) transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) categorydescription0, VARCHAR(50) partyguid0): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64259
      

      I worked around it by creating a view on postgres but not ideal. Thanks in advance.
      First Drill Jira Bug.

      Attachments

        Issue Links

          Activity

            People

              volodymyr Vova Vysotskyi
              robert.hamiltonsmith@gmail.com Robert Hamilton-Smith
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: