Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: None
Labels: None
Description
Following the example section, running the query from testFilterPushDownCompositeBigIntRowKey1() produces the following execution plan:
EXPLAIN PLAN FOR
SELECT
  CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
  ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
  ,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
;

+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
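The plan shows that the filter is not pushed down: the HBase scan is unbounded (startRow=null, stopRow=null, filter=null), so Drill performs a full table scan and only afterwards filters rows on the row key substring starting at position 1.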
The query runs quickly against the small test dataset provided in the repo, but performance degrades dramatically on real-world tables, where a full scan reads every row.
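For comparison, here is a minimal sketch of the row key range that a pushed-down prefix filter could use for this query. It assumes that 'bigint_be' corresponds to an 8-byte big-endian long and that the stop row is the smallest key greater than every key sharing the prefix. The class and method names (RowKeyRange, startRow, stopRowForPrefix) are hypothetical and are not part of Drill or HBase.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not Drill code: derives a scan range from a row key prefix.
final class RowKeyRange {

    // Encode the value as 8 big-endian bytes (ByteBuffer is big-endian by default).
    // Assumption: this matches the 'bigint_be' encoding used in the query.
    static byte[] startRow(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Smallest key that sorts after every key starting with the prefix:
    // increment the last byte that is not 0xFF and drop the bytes after it.
    // An empty array means "no upper bound" (the prefix is all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = prefix.clone();
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }

    // Render bytes as hex for display only.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] start = startRow(1409040000000L);
        byte[] stop = stopRowForPrefix(start);
        System.out.println("startRow=" + toHex(start));
        System.out.println("stopRow=" + toHex(stop));
    }
}
```

With a range like this in the HBaseScanSpec, the scan would touch only rows whose first 8 key bytes equal the requested value, instead of the whole table.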
I used contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java to populate the test table.
Moreover, the tests in TestHBaseFilterPushDown pass because they rely on runHBaseSQLVerifyCount, which checks only the result set row count and not the execution plan, so a missing push-down goes undetected.
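One way to close this gap is to assert on the plan text as well as the row count. The sketch below is a standalone illustration that inspects the EXPLAIN output as a plain string. The class ScanSpecCheck and its methods are hypothetical and not part of the Drill test framework; the pattern is taken from the HBaseScanSpec text shown in the plan above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Drill code: detects an unbounded HBase scan in plan text.
final class ScanSpecCheck {

    // Matches an HBaseScanSpec with no row range and no filter, i.e. a full table scan.
    private static final Pattern FULL_SCAN = Pattern.compile(
        "HBaseScanSpec \\[tableName=[^,]+, startRow=null, stopRow=null, filter=null\\]");

    // True when the plan text contains an unbounded, unfiltered HBase scan.
    static boolean isFullScan(String planText) {
        return FULL_SCAN.matcher(planText).find();
    }

    // Fails when the plan still contains a full scan, meaning nothing was pushed down.
    static void assertPushedDown(String planText) {
        if (isFullScan(planText)) {
            throw new AssertionError("Filter was not pushed down; plan contains a full HBase scan");
        }
    }
}
```

Applied to the plan in this report, assertPushedDown would fail, which is the behavior a push-down test should have here.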