Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1428

Inefficient execution plan of SELECT and LIMIT for Druid

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • None
    • 1.11.0
    • core, druid-adapter
    • None

    Description

      For SQLs like:

      1. SELECT * FROM <table> LIMIT <row_count>
      2. SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>

      DruidSortRule in Druid adapter does take effect and LIMIT is pushed into DruidQuery. However the corresponding execution plan isn't chosen as the best one. Thus Calcite will retrieve all data from Druid and purge all unnecessary columns.

      These are three SQLs and their corresponding execution plans below for dataset wikiticker in Druid quickstart:

      1. SELECT "cityName" FROM "wikiticker" LIMIT 5

      rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
      
      rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
      

      2. SELECT * FROM "wikiticker" LIMIT 5

      rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
      
      rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
      
      rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
      

      3. SELECT "__time", "added", "channel", "cityName", "comment", "commentLength", "count", "countryIsoCode", "countryName", "deleted", "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5

      rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
      
      rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
      
      rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
      
      rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
      

      Notice that 2 and 3 should have LIMIT pushed to DruidQuery like 1 (and should not have EnumerableLimit)

      Attachments

        Issue Links

          Activity

            People

              julianhyde Julian Hyde
              VcamX Jiarong Wei
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: