Calcite / CALCITE-1428

Inefficient execution plan of SELECT and LIMIT for Druid

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: core, druid
    • Labels: None

      Description

      For SQLs like:

      1. SELECT * FROM <table> LIMIT <row_count>
      2. SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>

      DruidSortRule in the Druid adapter does take effect and the LIMIT is pushed into the DruidQuery. However, the corresponding execution plan is not chosen as the best one, so Calcite retrieves all data from Druid and applies the limit (and column pruning) on its own side instead.
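
      The plan Calcite actually chooses can be inspected with EXPLAIN PLAN FOR through Calcite's JDBC driver. Below is a minimal sketch in Java; it assumes a model file (here called "druid-model.json") that exposes the Druid wikiticker datasource, for example the sample model from the Calcite Druid adapter documentation, so the file name and class name are only illustrative:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class ExplainWikitickerLimit {
        public static void main(String[] args) throws Exception {
          // Assumption: druid-model.json maps the "wikiticker" datasource
          // via the Calcite Druid adapter.
          try (Connection c =
                   DriverManager.getConnection("jdbc:calcite:model=druid-model.json");
               Statement s = c.createStatement();
               // EXPLAIN PLAN FOR prints the plan Calcite actually chose.
               ResultSet r = s.executeQuery(
                   "EXPLAIN PLAN FOR SELECT * FROM \"wikiticker\" LIMIT 5")) {
            while (r.next()) {
              System.out.println(r.getString(1));
            }
          }
        }
      }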

      Below are three SQL statements and their corresponding execution plans, run against the wikiticker dataset from the Druid quickstart:

      1. SELECT "cityName" FROM "wikiticker" LIMIT 5

      rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
        rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)

      2. SELECT * FROM "wikiticker" LIMIT 5

      rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
        rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
          rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])

      3. SELECT "__time", "added", "channel", "cityName", "comment", "commentLength", "count", "countryIsoCode", "countryName", "deleted", "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5

      rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
        rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
          rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
            rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])

      Notice that queries 2 and 3 should have the LIMIT pushed into the DruidQuery, as in query 1, and should not need an EnumerableLimit.
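
      For query 2, after a fix the chosen plan would be expected to have roughly the same shape as the plan for query 1 (rel ids omitted; this is only a sketch of the intended shape, not actual optimizer output):

      EnumerableInterpreter.ENUMERABLE.[](input=Subset.BINDABLE.[])
        DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],fetch=5)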

    People

    • Assignee: julianhyde Julian Hyde
    • Reporter: VcamX Jiarong Wei
    • Votes: 0
    • Watchers: 3
