Calcite / CALCITE-1428

Inefficient execution plan of SELECT and LIMIT for Druid

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: core, druid
    • Labels:
      None

      Description

      For SQLs like:

      1. SELECT * FROM <table> LIMIT <row_count>
      2. SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>

      DruidSortRule in the Druid adapter does fire, and the LIMIT is pushed into the DruidQuery. However, the corresponding plan is not chosen as the best one, so Calcite retrieves all rows from Druid and then discards the unneeded columns and rows itself.

      Below are three queries and their corresponding execution plans, for the wikiticker dataset from the Druid quickstart:

      1. SELECT "cityName" FROM "wikiticker" LIMIT 5

      rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
      
      rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
      

      2. SELECT * FROM "wikiticker" LIMIT 5

      rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
      
      rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
      
      rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
      

      3. SELECT "__time", "added", "channel", "cityName", "comment", "commentLength", "count", "countryIsoCode", "countryName", "deleted", "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5

      rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)
      
      rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])
      
      rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)
      
      rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
      

      Notice that queries 2 and 3 should have the LIMIT pushed into the DruidQuery as in query 1 (and should not have an EnumerableLimit); see the sketch below.
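
      For comparison, the plan that should win for query 2 is presumably query 1's plan minus the projection: a bare DruidQuery that has absorbed the fetch. The rel ids here are hypothetical placeholders, not an actual planner dump:

      rel#N:EnumerableInterpreter.ENUMERABLE.[](input=rel#M:Subset#0.BINDABLE.[])

      rel#M:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],fetch=5)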

        Issue Links

          duplicates CALCITE-1206

          Activity

          julianhyde Julian Hyde added a comment -

          Closing now that we have made release 1.11.0 (2017-01-11).

          VcamX Jiarong Wei added a comment - edited

          Thanks for your suggestion!
          I looked into the code and found a naive solution for pushing LIMIT.
          In DruidQuery, the cost depends on the last RelNode it "ate", and the cost of a Sort is much higher when the rows are wide. The comment says this is to prevent pushing a project through a sort in the case of wide rows. That is a reasonable heuristic, but I think creating an actual RelOptRule for it might be better. It also explains why the LIMIT is pushed when only a few columns are selected, but not for SELECT *.
          In my naive solution, assuming it is cheaper to let Druid do the work for us, I simply gave DruidQuery zero cost, which makes the LIMIT-pushed DruidQuery the chosen plan (sketched below). What do you think of it?
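
          A minimal sketch of what that zero-cost hack might look like, assuming it replaces DruidQuery's computeSelfCost override (illustrative only, not an actual patch):

            import org.apache.calcite.plan.RelOptCost;
            import org.apache.calcite.plan.RelOptPlanner;
            import org.apache.calcite.rel.metadata.RelMetadataQuery;

            // Inside DruidQuery: a zero self-cost makes every DruidQuery the
            // cheapest alternative in its equivalence set, so the planner keeps
            // the variant that absorbed the Sort (LIMIT) instead of planning an
            // EnumerableLimit on top of a full scan.
            @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
                RelMetadataQuery mq) {
              return planner.getCostFactory().makeZeroCost();
            }

          The obvious trade-off is that a zero cost also hides real differences between alternative Druid plans, which is why tuning the cost model (or adding a dedicated rule) would be preferable to this hack.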

          julianhyde Julian Hyde added a comment -

          I recommend that you enable RelOptPlanner tracing (see https://calcite.apache.org/docs/howto.html#tracing). After running the query, the trace file will contain a dump of the volcano planner state, and you should be able to see which relational expression turned out to be cheaper than the DruidSort. Then you can judge for yourself whether the cost model needs to be changed.
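
          For example (assuming a log4j binding, as described on that howto page; the exact mechanism depends on your logging setup), a line like this in log4j.properties enables the planner trace:

            log4j.logger.org.apache.calcite.plan.RelOptPlanner=TRACE

          At TRACE level the volcano planner dumps its state, so you can compare the costs of the registered relational expressions.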

          VcamX Jiarong Wei added a comment -

          Hi Julian, I'm trying to implement the LIMIT push-down. I printed logs and noticed that, during findBestExp of VolcanoPlanner, the DruidSort rule was applied, but its result was not chosen as part of the cheapest plan in the end. I think the cause may be related to the cost model. Could you give some hints or information about this, or about how to implement this push-down? Thanks!

          julianhyde Julian Hyde added a comment -

          Yes, we know we need to do this. Duplicate of CALCITE-1206.


            People

            • Assignee: julianhyde Julian Hyde
            • Reporter: VcamX Jiarong Wei
            • Votes: 0
            • Watchers: 3
