Description
Datasource allows the creation of bucketed and sorted tables, but joins on such tables still do not use this metadata to produce an optimal query plan.
In the plan below, the `Exchange` and `Sort` operators could be avoided if the tables are known to be hash-partitioned and sorted on the join columns.
== Physical Plan ==
WholeStageCodegen
:  +- SortMergeJoin [j#20,k#21,i#22], [j#23,k#24,i#25], Inner, None
:     :- INPUT
:     +- INPUT
:- WholeStageCodegen
:  :  +- Sort [j#20 ASC,k#21 ASC,i#22 ASC], false, 0
:  :     +- INPUT
:  +- Exchange hashpartitioning(j#20, k#21, i#22, 200), None
:     +- WholeStageCodegen
:        :  +- Project [j#20,k#21,i#22]
:        :     +- Filter (isnotnull(k#21) && isnotnull(j#20))
:        :        +- Scan orc default.table7[j#20,k#21,i#22] Format: ORC, InputPaths: file:/XXXXXXX/table7, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
+- WholeStageCodegen
   :  +- Sort [j#23 ASC,k#24 ASC,i#25 ASC], false, 0
   :     +- INPUT
   +- Exchange hashpartitioning(j#23, k#24, i#25, 200), None
      +- WholeStageCodegen
         :  +- Project [j#23,k#24,i#25]
         :     +- Filter (isnotnull(k#24) && isnotnull(j#23))
         :        +- Scan orc default.table8[j#23,k#24,i#25] Format: ORC, InputPaths: file:/XXXXXXX/table8, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
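To illustrate why the `Exchange` and `Sort` steps are redundant for such tables, here is a minimal sketch in plain Python. It is not Spark's implementation: the function names (`write_bucketed_sorted`, `merge_join`, `bucketed_join`), the bucket count, and the sample rows are all hypothetical. It assumes both tables were written with the same bucket count and hashed and sorted on the same key, so matching rows always land in the same bucket and each pair of buckets can be merged directly.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, assumed identical for both tables


def join_key(row):
    # Rows are (j, k, i) tuples; the join uses all three columns, as in the plan.
    return (row[0], row[1], row[2])


def write_bucketed_sorted(rows, key=join_key, num_buckets=NUM_BUCKETS):
    """Simulate writing a bucketed, sorted table: hash rows into buckets,
    then sort each bucket by the key. This work happens once, at write time."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(key(row)) % num_buckets].append(row)
    return {b: sorted(rs, key=key) for b, rs in buckets.items()}


def merge_join(left, right, key=join_key):
    """Inner-join two lists that are already sorted by key, in one merge pass."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key(left[i]) == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def bucketed_join(left_table, right_table, key=join_key):
    """Join bucket b of the left table with bucket b of the right table.
    No re-hashing (Exchange) or re-sorting (Sort) is needed at query time."""
    out = []
    for b in left_table.keys() & right_table.keys():
        out.extend(merge_join(left_table[b], right_table[b], key))
    return out


# Hypothetical sample data standing in for table7 and table8.
table7 = write_bucketed_sorted([(1, "a", 10), (2, "b", 20), (2, "b", 20), (3, "c", 30)])
table8 = write_bucketed_sorted([(2, "b", 20), (3, "c", 30), (4, "d", 40)])
joined = bucketed_join(table7, table8)
```

The point of the sketch is that hashing and sorting are paid once when the table is written, which is the same metadata the planner would need to consult to skip `Exchange` and `Sort` at join time.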