Hive / HIVE-3048

Collect_set Aggregate does unnecessary check for value.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None

      Description

      Sets already de-duplicate for free, so there is no need for an existence check before adding.

           private void putIntoSet(Object p, MkArrayAggregationBuffer myagg) {
             // Existing code: this lookup is redundant, because adding to a set
             // already ignores duplicates.
             if (myagg.container.contains(p)) {
               return;
             }
             Object pCopy = ObjectInspectorUtils.copyToStandardObject(p,
                 this.inputOI);
             myagg.container.add(pCopy);
           }
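
      A minimal sketch of the simplified method, assuming the container is a java.util.Set. Buffer and copy() are hypothetical stand-ins for MkArrayAggregationBuffer and ObjectInspectorUtils.copyToStandardObject, so the example runs without Hive on the classpath. It illustrates the idea and is not the attached patch itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollectSetSketch {
    /** Hypothetical stand-in for MkArrayAggregationBuffer. */
    static final class Buffer {
        final Set<Object> container = new HashSet<>();
    }

    /** Hypothetical stand-in for ObjectInspectorUtils.copyToStandardObject. */
    static Object copy(Object p) {
        // Lists are copied; other values (e.g. immutable Strings) are returned as-is.
        return (p instanceof List) ? new ArrayList<Object>((List<?>) p) : p;
    }

    /** No contains() check: Set.add() leaves the set unchanged for duplicates. */
    static void putIntoSet(Object p, Buffer myagg) {
        myagg.container.add(copy(p));
    }

    public static void main(String[] args) {
        Buffer buf = new Buffer();
        for (String s : new String[] {"a", "b", "a", "b", "c"}) {
            putIntoSet(s, buf);
        }
        System.out.println(buf.container.size()); // prints 3
    }
}
```

      The trade-off is that a copy is now made even for values that turn out to be duplicates, in exchange for dropping one lookup per row.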
      
      1. HIVE-3048.patch.1.txt
        0.6 kB
        Edward Capriolo

        Activity

        Edward Capriolo created issue -
        Edward Capriolo made changes -
        Field Original Value New Value
        Attachment HIVE-3048.patch.1.txt [ 12528777 ]
        Edward Capriolo made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.8.1 [ 12319268 ]
        Ashutosh Chauhan added a comment -

        +1
        The existing implementation actually looks buggy to me. It checks for the existence of one object and then adds another object (the copy). In the general case, the two objects may have different hashcodes, and then you are screwed. It will work, however, as long as the underlying object's hashcode is based on its value, which is the case for primitive types and for containers of primitive types, and therefore for Hive datatypes. It's always good practice to just add your objects to the set and let the set take care of duplicate elimination.
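
        To illustrate the hashcode point in the comment above, here is a small sketch in plain Java. IdentityBox and ValueBox are hypothetical types invented for this example; they are not Hive classes. A HashSet only eliminates duplicates when equals() and hashCode() are derived from the value.

```java
import java.util.HashSet;
import java.util.Set;

final class HashCodeSketch {
    /** Hypothetical type that inherits identity-based equals/hashCode from Object. */
    static final class IdentityBox {
        final int value;
        IdentityBox(int value) { this.value = value; }
    }

    /** Hypothetical type whose equals/hashCode are derived from its value. */
    static final class ValueBox {
        final int value;
        ValueBox(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueBox && ((ValueBox) o).value == value;
        }
        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    /** Two equal-valued instances are treated as distinct elements. */
    static int distinctIdentityBoxes() {
        Set<IdentityBox> set = new HashSet<>();
        set.add(new IdentityBox(1));
        set.add(new IdentityBox(1));
        return set.size();
    }

    /** Two equal-valued instances collapse into one element. */
    static int distinctValueBoxes() {
        Set<ValueBox> set = new HashSet<>();
        set.add(new ValueBox(1));
        set.add(new ValueBox(1));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctIdentityBoxes()); // prints 2
        System.out.println(distinctValueBoxes());    // prints 1
    }
}
```

        With identity-based hashing, neither a prior contains() check nor add() itself would detect the copy as a duplicate, so the explicit check adds cost without adding safety.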

        Ashutosh Chauhan added a comment -

        Committed to trunk.

        Ashutosh Chauhan made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.10.0 [ 12320745 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1515 (See https://builds.apache.org/job/Hive-trunk-h0.21/1515/)
        HIVE-3048 : Collect_set Aggregate does uneccesary check for value. (Ed Capriolo via Ashutosh Chauhan) (Revision 1354079)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1354079
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-3048 : Collect_set Aggregate does uneccesary check for value. (Ed Capriolo via Ashutosh Chauhan) (Revision 1354079)

        Result = ABORTED
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1354079
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
        Ashutosh Chauhan added a comment -

        This issue is fixed and released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.

        Ashutosh Chauhan made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                    Time In Source Status   Execution Times   Last Executer      Last Execution Date
        Open -> Patch Available       2m 35s                  1                 Edward Capriolo    23/May/12 21:49
        Patch Available -> Resolved   33d 19h 14m             1                 Ashutosh Chauhan   26/Jun/12 17:04
        Resolved -> Closed            198d 3h 49m             1                 Ashutosh Chauhan   10/Jan/13 19:53

          People

          • Assignee:
            Edward Capriolo
            Reporter:
            Edward Capriolo
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
