Description
Whole-stage codegen is used by some modern MPP databases to achieve high performance. See http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
For Spark SQL, we can compile multiple operators into a single Java function to avoid the overhead of materializing rows and of going through Scala iterators between operators.
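To illustrate the idea, here is a minimal sketch in plain Java. It is not Spark's generated code. The class `FusedPipelineSketch` and all of its method names are hypothetical, and the toy plan (range, then filter on even values, then multiply by 3, then sum) is an assumption chosen for illustration. The first variant chains one iterator per operator, so every row is boxed and passes through several virtual calls; the second variant is roughly what fusing those operators into one function could look like.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

// Hypothetical illustration only; none of these names come from Spark.
public class FusedPipelineSketch {

    // Operator-at-a-time style: each operator is its own iterator.
    public static Iterator<Long> range(long n) {
        return new Iterator<Long>() {
            private long cursor = 0;
            public boolean hasNext() { return cursor < n; }
            public Long next() {
                if (cursor >= n) throw new NoSuchElementException();
                return cursor++; // boxes every value into a Long
            }
        };
    }

    public static Iterator<Long> filter(Iterator<Long> child, LongPredicate p) {
        return new Iterator<Long>() {
            private Long buffered = null;
            public boolean hasNext() {
                // Pull from the child until a row passes the predicate.
                while (buffered == null && child.hasNext()) {
                    Long candidate = child.next();
                    if (p.test(candidate)) buffered = candidate;
                }
                return buffered != null;
            }
            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                Long out = buffered;
                buffered = null;
                return out;
            }
        };
    }

    public static Iterator<Long> project(Iterator<Long> child, LongUnaryOperator f) {
        return new Iterator<Long>() {
            public boolean hasNext() { return child.hasNext(); }
            public Long next() { return f.applyAsLong(child.next()); }
        };
    }

    // range -> filter(even) -> project(x * 3) -> sum, one iterator per operator.
    public static long iteratorSum(long n) {
        Iterator<Long> plan = project(filter(range(n), x -> x % 2 == 0), x -> x * 3);
        long sum = 0;
        while (plan.hasNext()) sum += plan.next();
        return sum;
    }

    // The same plan fused by hand into a single loop: no intermediate
    // row objects and no per-row iterator calls between operators.
    public static long fusedSum(long n) {
        long sum = 0;
        for (long x = 0; x < n; x++) {
            if (x % 2 != 0) continue; // filter
            sum += x * 3;             // project + aggregate
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(iteratorSum(10)); // prints 60
        System.out.println(fusedSum(10));    // prints 60
    }
}
```

Both variants compute the same result; the fused one keeps values in local variables, which is the kind of overhead reduction this epic is after.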
Issues in epic
Key | Summary | Status | Assignee
SPARK-12796 | initial prototype: projection/filter/range | Resolved | Davies Liu
SPARK-12797 | Aggregation without grouping keys | Resolved | Davies Liu
SPARK-12798 | Broadcast hash join | Resolved | Davies Liu
SPARK-12902 | Visualization and metrics for generated operators | Resolved | Davies Liu
SPARK-12913 | Reimplement stat functions as declarative function | Resolved | Davies Liu
SPARK-12914 | Generate TungstenAggregate with grouping keys | Resolved | Davies Liu
SPARK-12915 | SQL metrics for generated operators | Resolved | Davies Liu
SPARK-12949 | Support common expression elimination | Resolved | L. C. Hsieh
SPARK-12950 | Improve performance of BytesToBytesMap | Resolved | Davies Liu
SPARK-12951 | Support spilling in generate aggregate | Resolved | Davies Liu
SPARK-13031 | Improve test coverage for whole stage codegen | Resolved | Davies Liu
SPARK-13095 | improve performance of hash join with dimension table | Resolved | Davies Liu
SPARK-13123 | Add wholestage codegen for sort | Resolved | Sameer Agarwal
SPARK-13130 | Make whole stage codegen variable names slightly easier to read | Resolved | Reynold Xin
SPARK-13135 | Don't print expressions recursively in generated code | Resolved | Dongjoon Hyun
SPARK-13147 | improve readability of generated code | Resolved | Davies Liu
SPARK-13237 | Generate broadcast outer join | Resolved | Davies Liu
SPARK-13293 | Generate code for Expand | Resolved | Davies Liu
SPARK-13304 | Broadcast join with two ints could be very slow | Resolved | Davies Liu
SPARK-13373 | Generate code for sort merge join | Resolved | Davies Liu
SPARK-13404 | Create the variables for input when it's used | Resolved | Davies Liu
SPARK-13873 | Avoid the copy in whole stage codegen when there is no joins | Resolved | Davies Liu
SPARK-13917 | Generate code for broadcast left semi join | Resolved | Davies Liu
SPARK-13950 | Generate code for sort merge left/right outer join | Closed | Davies Liu
SPARK-14710 | Rename gen/genCode to genCode/doGenCode to better reflect the semantics | Resolved | Sameer Agarwal
SPARK-14718 | Avoid mutating ExprCode in doGenCode | Resolved | Sameer Agarwal
SPARK-14722 | Rename upstreams() -> inputRDDs() in WholeStageCodegen | Resolved | Sameer Agarwal
SPARK-14748 | BoundReference should not set ExprCode.code to empty string | Closed | Unassigned
SPARK-16844 | Generate code for sort based aggregation | Resolved | Unassigned
SPARK-12795
Whole stage codegen