Apache Trafodion (Retired) / TRAFODION-2392

Avoid a costly sort for highly reducing TMUDFs


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-incubating
    • Fix Version/s: 2.1-incubating
    • Component/s: sql-cmp
    • Labels: None
    • Environment: Any

    Description

      When a TMUDF (table-mapping UDF) is invoked with an input table that has a PARTITION BY clause, the Trafodion optimizer ensures that the input rows are sorted on (a permutation of) the PARTITION BY columns, so that each parallel TMUDF instance sees the rows of each logical partition contiguously. This way, the TMUDF can process each group separately.

      This is usually a good way to process the data, except when the input table is large and the TMUDF greatly reduces the input data. In that case, it may be better to maintain a hash table of groups inside the TMUDF and avoid the costly sort of the input table.
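The two strategies can be compared in a minimal sketch. This is plain Python, not the Trafodion TMUDF API: the row layout `(key, value)`, the sum aggregate, and the function names `reduce_sorted` and `reduce_hashed` are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def reduce_sorted(rows, key=itemgetter(0)):
    """Contiguous-group strategy: assumes rows are already sorted on the key."""
    # groupby merges only adjacent rows with equal keys, so it depends on the sort.
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(rows, key=key)]


def reduce_hashed(rows, key=itemgetter(0)):
    """Non-contiguous strategy: one hash-table entry per group, no sort needed."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[1]
    return dict(totals)


# Hypothetical input in which the groups "a" and "b" are interleaved.
rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# Strategy 1 pays for a sort of the whole input first.
sorted_result = reduce_sorted(sorted(rows, key=itemgetter(0)))

# Strategy 2 reads the input in arrival order.
hashed_result = reduce_hashed(rows)
```

Both calls produce the same totals per group. The hashed variant keeps one entry per group in memory, so it is a good fit when the number of groups is small relative to the number of input rows. Calling `reduce_sorted` on the unsorted input would instead emit one partial result per run of adjacent rows, which is why the contiguous strategy needs the sort.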

      I propose adding a new function type to UDRInvocationInfo.FunctionType, called REDUCER_NC (NC for Non-Contiguous). Setting the function type to REDUCER_NC would tell the optimizer not to request a sort order on the partitioning columns.

      The table below shows how the function type and PARTITION BY and ORDER BY clauses would determine the effective sort order produced by the optimizer:

      Function type          PARTITION BY   ORDER BY   Data is sorted by
      REDUCER (existing)     a,b            c,d        a,b,c,d
      REDUCER (existing)     a,b            <empty>    a,b
      REDUCER_NC (proposed)  a,b            c,d        c,d
      REDUCER_NC (proposed)  a,b            <empty>    <no sort>

      In all other respects, the REDUCER and REDUCER_NC function types would behave the same.
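The rule in the table can be written down as a small function. This is a sketch of the proposed behavior, not optimizer code: the name `effective_sort_order` and the use of strings for function types are assumptions, and the sketch keeps the PARTITION BY columns in their listed order even though the optimizer is free to pick any permutation of them.

```python
def effective_sort_order(function_type, partition_by, order_by=()):
    """Return the columns the input would be sorted on; an empty list means no sort."""
    if function_type == "REDUCER":
        # Existing behavior: partitioning columns first, then ORDER BY columns.
        # (The optimizer may choose any permutation of the partitioning columns;
        # this sketch uses the listed order.)
        return list(partition_by) + list(order_by)
    if function_type == "REDUCER_NC":
        # Proposed behavior: no sort on the partitioning columns,
        # only the explicit ORDER BY is honored.
        return list(order_by)
    raise ValueError(f"unsupported function type: {function_type}")


# The four rows of the table above.
table = [
    effective_sort_order("REDUCER", ["a", "b"], ["c", "d"]),
    effective_sort_order("REDUCER", ["a", "b"]),
    effective_sort_order("REDUCER_NC", ["a", "b"], ["c", "d"]),
    effective_sort_order("REDUCER_NC", ["a", "b"]),
]
```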

          People

            hzeller Hans Zeller