Hive / HIVE-3048

Collect_set aggregate does an unnecessary check for value.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      Sets already de-duplicate for free, so there is no need for an existence check before adding.

           // Existing code: the contains() check repeats work that add() already does.
           private void putIntoSet(Object p, MkArrayAggregationBuffer myagg) {
             if (myagg.container.contains(p))
               return;
             Object pCopy = ObjectInspectorUtils.copyToStandardObject(p,
                 this.inputOI);
             myagg.container.add(pCopy);
           }
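
      For illustration only, here is a minimal, self-contained sketch of what the method could look like without the existence check. It is not the committed patch: Buffer and copyToStandard are hypothetical stand-ins for MkArrayAggregationBuffer and ObjectInspectorUtils.copyToStandardObject, and the values are assumed to be plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CollectSetSketch {
    /** Hypothetical stand-in for MkArrayAggregationBuffer. */
    static class Buffer {
        final Set<Object> container = new LinkedHashSet<>();
    }

    /** Hypothetical stand-in for ObjectInspectorUtils.copyToStandardObject. */
    static Object copyToStandard(Object p) {
        return new String((String) p); // a distinct object with the same value
    }

    static void putIntoSet(Object p, Buffer myagg) {
        // No contains() check: Set.add leaves the set unchanged and
        // returns false when an equal element is already present.
        myagg.container.add(copyToStandard(p));
    }

    public static void main(String[] args) {
        Buffer buf = new Buffer();
        for (String s : new String[] {"a", "b", "a"}) {
            putIntoSet(s, buf);
        }
        System.out.println(buf.container); // prints [a, b]
    }
}
```

      Dropping the check saves one hash lookup per input value. The copy is still made for every value, including duplicates, so the sketch trades an extra copy on duplicates for a simpler code path.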
      
      1. HIVE-3048.patch.1.txt (0.6 kB, Edward Capriolo)

        Activity

        Ashutosh Chauhan added a comment -

        +1
        The existing implementation actually looks buggy to me. It checks for the existence of one object and then adds a different object (the copy). In the general case, the two objects may have different hash codes, and then you are screwed. It will work, however, as long as the underlying object's hash code is based on its value, which is the case for primitive types and for containers of primitive types, and therefore for Hive datatypes. It's always good practice to just add your objects to the set and let the set take care of duplicate elimination.
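
        To illustrate the hash code point in this comment, here is a small sketch with hypothetical types (IdentityBox and ValueBox are invented for the example and are not Hive classes). With identity-based hashing, checking one object and adding its copy never finds a match, so duplicates accumulate. With value-based equals and hashCode, a plain add is enough.

```java
import java.util.HashSet;
import java.util.Set;

public class CheckThenAddCopy {
    /** Hypothetical holder that inherits identity-based equals/hashCode. */
    static final class IdentityBox {
        final int value;
        IdentityBox(int value) { this.value = value; }
        IdentityBox copy() { return new IdentityBox(value); }
    }

    /** Hypothetical holder whose equals/hashCode depend only on the value. */
    static final class ValueBox {
        final int value;
        ValueBox(int value) { this.value = value; }
        ValueBox copy() { return new ValueBox(value); }
        @Override public boolean equals(Object o) {
            return o instanceof ValueBox && ((ValueBox) o).value == value;
        }
        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    /** Check one object, add another: the check never matches here. */
    static int collectIdentity(int... values) {
        Set<IdentityBox> set = new HashSet<>();
        for (int v : values) {
            IdentityBox p = new IdentityBox(v);
            if (set.contains(p)) {
                continue;
            }
            set.add(p.copy());
        }
        return set.size();
    }

    /** Just add the copy and let the set eliminate duplicates. */
    static int collectValue(int... values) {
        Set<ValueBox> set = new HashSet<>();
        for (int v : values) {
            set.add(new ValueBox(v).copy());
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(collectIdentity(1, 1, 2)); // prints 3
        System.out.println(collectValue(1, 1, 2));    // prints 2
    }
}
```

        Either way, the existence check contributes nothing: correctness depends entirely on whether the stored objects hash and compare by value.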

        Ashutosh Chauhan added a comment -

        Committed to trunk.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1515 (See https://builds.apache.org/job/Hive-trunk-h0.21/1515/)
        HIVE-3048 : Collect_set Aggregate does uneccesary check for value. (Ed Capriolo via Ashutosh Chauhan) (Revision 1354079)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1354079
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-3048 : Collect_set Aggregate does uneccesary check for value. (Ed Capriolo via Ashutosh Chauhan) (Revision 1354079)

        Result = ABORTED
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1354079
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
        Ashutosh Chauhan added a comment -

        This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


          People

          • Assignee: Edward Capriolo
          • Reporter: Edward Capriolo
          • Votes: 0
          • Watchers: 3
