Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
2.0-incubating
None
Any
Description
When an input table with a PARTITION BY clause is specified for a TMUDF, the Trafodion optimizer ensures that the input rows are sorted on (a permutation of) the PARTITION BY columns, so that each parallel TMUDF instance sees the rows of each logical partition contiguously. This way the TMUDF can process each group separately.
This is usually a good way to process the data, except when we are dealing with a large input table and a TMUDF that greatly reduces the input data. In that case it may be better to maintain a hash table of groups inside the TMUDF and avoid the costly sort of the input table.
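To illustrate the trade-off, here is a minimal standalone sketch in Python. It does not use the Trafodion TMUDF API; the function names and the sample rows are made up for illustration. The first function relies on sorted, contiguous groups (as a REDUCER sees them today), while the second keeps one hash-table entry per group and accepts rows in any order.

```python
from collections import defaultdict
from itertools import groupby


def reduce_contiguous(rows):
    """Sum values per key, assuming rows with equal keys are adjacent (sorted input)."""
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, sum(value for _, value in group)


def reduce_non_contiguous(rows):
    """Sum values per key using a hash table; rows may arrive in any order."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


# Hypothetical input: (partition key (a, b), value), deliberately unsorted.
rows = [(("x", 1), 10), (("y", 2), 5), (("x", 1), 7), (("y", 2), 1)]

# Contiguous processing needs the sort first; the hash-based version does not.
sorted_result = dict(reduce_contiguous(sorted(rows)))
hashed_result = reduce_non_contiguous(rows)
```

Both approaches give the same per-group result. The hash-based version only needs memory proportional to the number of groups, which is why it pays off when the TMUDF reduces the data heavily.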
My proposal is to add a new function type to UDRInvocationInfo.FunctionType, called REDUCER_NC (for Non-Contiguous). Setting the function type to this new type would indicate to the optimizer not to request a sort order on the partitioning columns.
The table below shows how the function type and PARTITION BY and ORDER BY clauses would determine the effective sort order produced by the optimizer:
| Function type | PARTITION BY | ORDER BY | Data is sorted by |
|---|---|---|---|
| REDUCER (existing) | a,b | c,d | a,b,c,d |
| REDUCER (existing) | a,b | <empty> | a,b |
| REDUCER_NC (proposed) | a,b | c,d | c,d |
| REDUCER_NC (proposed) | a,b | <empty> | <no sort> |
In all other aspects, REDUCER and REDUCER_NC function types would behave the same.
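The rule in the table can be written as a small function. This is only a sketch of the proposed behavior in Python, not optimizer code; `effective_sort_order` and its string-valued function types are hypothetical. It also lists the PARTITION BY columns in the given order, whereas the optimizer may choose any permutation of them.

```python
def effective_sort_order(function_type, partition_by, order_by):
    """Return the columns the input would be sorted on (empty list means no sort)."""
    if function_type == "REDUCER":
        # Existing behavior: partitioning columns first, then ORDER BY columns.
        return list(partition_by) + list(order_by)
    if function_type == "REDUCER_NC":
        # Proposed behavior: no sort requested on the partitioning columns.
        return list(order_by)
    raise ValueError(f"unsupported function type: {function_type}")


# The four rows of the table:
print(effective_sort_order("REDUCER", ["a", "b"], ["c", "d"]))
print(effective_sort_order("REDUCER", ["a", "b"], []))
print(effective_sort_order("REDUCER_NC", ["a", "b"], ["c", "d"]))
print(effective_sort_order("REDUCER_NC", ["a", "b"], []))
```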