Spark / SPARK-4366

Aggregation Improvement


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL

    Description

      This improvement actually includes a couple of sub-tasks.

      Sub-Tasks

        1. Simplify the Aggregation Function implementation (Sub-task, Resolved, Cheng Hao)
        2. Partial aggregation support the DISTINCT aggregation (Sub-task, Resolved, Yin Huai)
        3. Sort-based Aggregation (Sub-task, Resolved, Yin Huai)
        4. Support Scala/Java UDAF (Sub-task, Resolved, Yin Huai)
        5. HiveUDAF support for AggregateFunction2 (Sub-task, Resolved, Wenchen Fan)
        6. Hybrid aggregate operator using unsafe row (Sub-task, Resolved, Yin Huai)
        7. Supporting multiple DISTINCT columns (Sub-task, Resolved, Herman van Hövell)
        8. Audit both built-in aggregate function and UDAF interface before 1.5.0 release (Technical task, Resolved, Reynold Xin)
        9. Fix the false negative of Aggregate2Sort and FinalAndCompleteAggregate2Sort's missingInput (Sub-task, Resolved, Yin Huai)
        10. cleanup comments, code style, naming typo for the new aggregation (Sub-task, Resolved, Wenchen Fan)
        11. stddev_pop and stddev_samp aggregate functions (Sub-task, Resolved, Jihong Ma)
        12. variance, var_pop, and var_samp aggregate functions (Sub-task, Resolved, Seth Hendrickson)
        13. covar_pop and covar_samp aggregate functions (Sub-task, Resolved, L. C. Hsieh)
        14. corr aggregate functions (Sub-task, Resolved, L. C. Hsieh)
        15. percentile and percentile_approx aggregate functions (Sub-task, Resolved, Unassigned)
        16. histogram_numeric aggregate function (Sub-task, Resolved, Unassigned)
        17. collect_set and collect_list aggregate functions (Sub-task, Resolved, Nick Buroojy)
        18. UDAF cleanup for 1.5 (Sub-task, Resolved, Yin Huai)
        19. Remove the placeholder attributes used in the aggregation buffers (Sub-task, Resolved, Yin Huai)
        20. Refactor new aggregation code to reduce the times of checking compatibility (Sub-task, Resolved, L. C. Hsieh)
        21. Cleanup Hybrid Aggregate Operator. (Sub-task, Resolved, Yin Huai)
        22. Use sqlContext.udf to register UDAFs. (Sub-task, Resolved, Yin Huai)
        23. first/last aggregate NULL behavior (Sub-task, Resolved, Yin Huai)
        24. approx count distinct function (Sub-task, Resolved, Herman van Hövell)
        25. TungstenAggregate should also accept InternalRow instead of just UnsafeRow (Sub-task, Resolved, Yin Huai)
        26. Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s (Sub-task, Resolved, Yin Huai)
        27. The simpleString of TungstenAggregate does not show its output (Sub-task, Resolved, Yin Huai)
        28. Eliminate hash table lookup if there is no grouping key in aggregation. (Sub-task, Resolved, Reynold Xin)
        29. We need to explicitly use transformDown when rewrite aggregation results (Sub-task, Resolved, Josh Rosen)
        30. MutableProjection should evaluate all expressions first and then update the mutable row (Sub-task, Resolved, Davies Liu)
        31. use new aggregate interface for hive UDAF (Sub-task, Resolved, Wenchen Fan)
        32. Partial Aggregation Support for Hive UDAF (Sub-task, Resolved, Cheng Hao)
        33. Better group distinct columns in query compilation (Sub-task, Resolved, Unassigned)
        34. Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity (Sub-task, Resolved, Josh Rosen)
        35. Reduce duplication in Aggregate2's expression rewriting logic (Sub-task, Resolved, Josh Rosen)
        36. Support ImperativeAggregates in TungstenAggregate (Sub-task, Resolved, Josh Rosen)
        37. When planning queries without partial aggregation support, we should try to use TungstenAggregate. (Sub-task, Resolved, Unassigned)
        38. Remove use of KVIterator in SortBasedAggregationIterator (Sub-task, Resolved, Josh Rosen)
        39. Support single distinct count on multiple columns (Sub-task, Resolved, Herman van Hövell)
        40. variance should alias var_samp instead of var_pop (Sub-task, Resolved, Reynold Xin)
        41. Spark SQL SELECT COUNT DISTINCT optimization (Sub-task, Resolved, Yin Huai)
        42. Restore the 1.5's behavior of planning a single distinct aggregation. (Sub-task, Resolved, Yin Huai)
        43. Spark StdDev/Variance defaults are incompatible with Hive (Sub-task, Closed, Unassigned)
        44. Improved multi-column counting (Sub-task, Resolved, Herman van Hövell)


          People

            Assignee: Unassigned
            Reporter: Cheng Hao
            Votes: 2
            Watchers: 23
