IMPALA-9633

Implement ds_hll_union() builtin function

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend, Frontend
    • Labels: None

    Description

      ds_hll_union() is an aggregate function that accepts sketches and produces a single sketch that is the union of the received sketches.

      Example from Hive:

      create temporary table sketch_intermediate (category char(1), sketch binary);
      insert into sketch_intermediate select category, ds_hll_sketch(id) from sketch_input group by category;
      select ds_hll_estimate(ds_hll_union(sketch)) from sketch_intermediate;
      

      Some test data for the example:

      create temporary table sketch_input (id int, category char(1));
      insert into table sketch_input values
        (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'), (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
        (6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'), (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');
      

      Approximate result:

      15.000000521540663
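
The result above is close to 15 rather than 20 because a union of sketches estimates the number of distinct ids across both categories, and ids 6 through 10 appear in both. The following Python snippet is a minimal sketch of that idea, assuming a toy HyperLogLog with 4096 registers; it is not the DataSketches implementation behind the `ds_hll_*` functions, and the names `make_sketch`, `union`, and `estimate` are hypothetical helpers introduced only for illustration.

```python
import hashlib
import math

P = 12          # index bits; an assumption for this toy sketch
M = 1 << P      # number of registers (4096)


def _hash64(value):
    """Deterministic 64-bit hash of the value's string form."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def make_sketch(values):
    """Build a toy HLL sketch: one small integer register per bucket."""
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        bucket = h >> (64 - P)                  # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)        # remaining 52 bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        registers[bucket] = max(registers[bucket], rank)
    return registers


def union(sketches):
    """Merge sketches by taking the register-wise maximum."""
    merged = [0] * M
    for regs in sketches:
        merged = [max(a, b) for a, b in zip(merged, regs)]
    return merged


def estimate(registers):
    """Classic HLL estimate with the small-range (linear counting) correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


sketch_a = make_sketch(range(1, 11))   # ids of category 'a' in the test data
sketch_b = make_sketch(range(6, 16))   # ids of category 'b' in the test data
merged = union([sketch_a, sketch_b])
print(estimate(merged))                # an approximation of the 15 distinct ids
```

In this toy model, merging the two per-category sketches yields exactly the registers that a single sketch over all ids would have, which is why the union can be computed from intermediate sketches without rescanning the input rows.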
      

      Hive change that introduced the same: https://issues.apache.org/jira/browse/HIVE-22940

          People

            Assignee: Gabor Kaszab
            Reporter: Gabor Kaszab