[SPARK-4366] Aggregation Improvement

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL

    Description

      This improvement includes a number of sub-tasks.

      Attachments

        1. aggregatefunction_v1.pdf
          523 kB
          Cheng Hao

        Issue Links

          1. Simplify the Aggregation Function implementation (Sub-task, Resolved, Cheng Hao)
          2. Partial aggregation support the DISTINCT aggregation (Sub-task, Resolved, Yin Huai)
          3. Sort-based Aggregation (Sub-task, Resolved, Yin Huai)
          4. Support Scala/Java UDAF (Sub-task, Resolved, Yin Huai)
          5. HiveUDAF support for AggregateFunction2 (Sub-task, Resolved, Wenchen Fan)
          6. Hybrid aggregate operator using unsafe row (Sub-task, Resolved, Yin Huai)
          7. Supporting multiple DISTINCT columns (Sub-task, Resolved, Herman van Hövell)
          8. Audit both built-in aggregate function and UDAF interface before 1.5.0 release (Technical task, Resolved, Reynold Xin)
          9. Fix the false negative of Aggregate2Sort and FinalAndCompleteAggregate2Sort's missingInput (Sub-task, Resolved, Yin Huai)
          10. cleanup comments, code style, naming typo for the new aggregation (Sub-task, Resolved, Wenchen Fan)
          11. stddev_pop and stddev_samp aggregate functions (Sub-task, Resolved, Jihong Ma)
          12. variance, var_pop, and var_samp aggregate functions (Sub-task, Resolved, Seth Hendrickson)
          13. covar_pop and covar_samp aggregate functions (Sub-task, Resolved, L. C. Hsieh)
          14. corr aggregate functions (Sub-task, Resolved, L. C. Hsieh)
          15. percentile and percentile_approx aggregate functions (Sub-task, Resolved, Unassigned)
          16. histogram_numeric aggregate function (Sub-task, Resolved, Unassigned)
          17. collect_set and collect_list aggregate functions (Sub-task, Resolved, Nick Buroojy)
          18. UDAF cleanup for 1.5 (Sub-task, Resolved, Yin Huai)
          19. Remove the placeholder attributes used in the aggregation buffers (Sub-task, Resolved, Yin Huai)
          20. Refactor new aggregation code to reduce the times of checking compatibility (Sub-task, Resolved, L. C. Hsieh)
          21. Cleanup Hybrid Aggregate Operator. (Sub-task, Resolved, Yin Huai)
          22. Use sqlContext.udf to register UDAFs. (Sub-task, Resolved, Yin Huai)
          23. first/last aggregate NULL behavior (Sub-task, Resolved, Yin Huai)
          24. approx count distinct function (Sub-task, Resolved, Herman van Hövell)
          25. TungstenAggregate should also accept InternalRow instead of just UnsafeRow (Sub-task, Resolved, Yin Huai)
          26. Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s (Sub-task, Resolved, Yin Huai)
          27. The simpleString of TungstenAggregate does not show its output (Sub-task, Resolved, Yin Huai)
          28. Eliminate hash table lookup if there is no grouping key in aggregation. (Sub-task, Resolved, Reynold Xin)
          29. We need to explicitly use transformDown when rewrite aggregation results (Sub-task, Resolved, Josh Rosen)
          30. MutableProjection should evaluate all expressions first and then update the mutable row (Sub-task, Resolved, Davies Liu)
          31. use new aggregate interface for hive UDAF (Sub-task, Resolved, Wenchen Fan)
          32. Partial Aggregation Support for Hive UDAF (Sub-task, Resolved, Cheng Hao)
          33. Better group distinct columns in query compilation (Sub-task, Resolved, Unassigned)
          34. Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity (Sub-task, Resolved, Josh Rosen)
          35. Reduce duplication in Aggregate2's expression rewriting logic (Sub-task, Resolved, Josh Rosen)
          36. Support ImperativeAggregates in TungstenAggregate (Sub-task, Resolved, Josh Rosen)
          37. When planning queries without partial aggregation support, we should try to use TungstenAggregate. (Sub-task, Resolved, Unassigned)
          38. Remove use of KVIterator in SortBasedAggregationIterator (Sub-task, Resolved, Josh Rosen)
          39. Support single distinct count on multiple columns (Sub-task, Resolved, Herman van Hövell)
          40. variance should alias var_samp instead of var_pop (Sub-task, Resolved, Reynold Xin)
          41. Spark SQL SELECT COUNT DISTINCT optimization (Sub-task, Resolved, Yin Huai)
          42. Restore the 1.5's behavior of planning a single distinct aggregation. (Sub-task, Resolved, Yin Huai)
          43. Spark StdDev/Variance defaults are incompatible with Hive (Sub-task, Closed, Unassigned)
          44. Improved multi-column counting (Sub-task, Resolved, Herman van Hövell)
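
Several of the sub-tasks listed here concern partial aggregation and the var_pop / var_samp functions, including the decision that variance should alias var_samp. The following is only an illustrative sketch of those ideas in plain Python, not Spark code: the `VarianceBuffer` class, its method names, and the `aggregate` helper are hypothetical, and returning `None` for undefined results is an assumption rather than a statement about Spark's behavior. It shows a buffer that is updated per row, merged across partitions, and evaluated as either the population or the sample variance.

```python
from dataclasses import dataclass


@dataclass
class VarianceBuffer:
    """Hypothetical aggregation buffer (not a Spark class)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's online update for one input value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "VarianceBuffer") -> None:
        # Combine two partial results (pairwise formula of Chan et al.).
        if other.count == 0:
            return
        total = self.count + other.count
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total

    def var_pop(self):
        # Population variance: divide by n. None for empty input (assumption).
        return self.m2 / self.count if self.count > 0 else None

    def var_samp(self):
        # Sample variance: divide by n - 1. None for n < 2 (assumption).
        return self.m2 / (self.count - 1) if self.count > 1 else None


def aggregate(partitions):
    """Build one partial buffer per partition, then merge them all."""
    final = VarianceBuffer()
    for part in partitions:
        partial = VarianceBuffer()
        for x in part:
            partial.update(x)
        final.merge(partial)
    return final
```

For the values 1, 2, 3, 4 split across two partitions, `aggregate([[1.0, 2.0], [3.0, 4.0]])` gives a population variance of 1.25 and a sample variance of 5/3, so which of the two a bare `variance` refers to changes the result.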

            People

              Assignee: Unassigned
              Reporter: Cheng Hao
              Votes: 2
              Watchers: 23
