Hadoop Common / HADOOP-1342

A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None

      Description

      In the current implementation, the number of unique values may grow without bound, eventually causing an out-of-memory error.

        Activity

        Runping Qi added a comment -

        This patch adds a limit on the number of unique values for the UniqueValueCount aggregator. If the actual number of unique values exceeds the limit, the reported count will be limit + 1.

        The limit is configured under the attribute name "aggregate.max.num.unique.values".
        It can be set by calling job.setLong("aggregate.max.num.unique.values", 200).
        The default is Long.MAX_VALUE (same as the current behavior).
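
        The capping behavior described in this comment can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class name BoundedUniqueCounter and its methods are hypothetical, and the sketch only assumes that values are accepted until the set holds limit + 1 distinct entries, so that a count of limit + 1 signals "more than the limit".

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the UniqValueCount class from the patch.
final class BoundedUniqueCounter {
    // Maximum number of unique values to track (assumed to play the role of
    // "aggregate.max.num.unique.values").
    private final long maxUniqueValues;
    private final Set<String> seen = new HashSet<>();

    BoundedUniqueCounter(long maxUniqueValues) {
        this.maxUniqueValues = maxUniqueValues;
    }

    // Records a value unless the set already holds more than the limit.
    // The set can therefore grow to at most limit + 1 entries.
    void add(Object value) {
        if (seen.size() <= maxUniqueValues) {
            seen.add(value.toString());
        }
    }

    // Number of unique values seen, capped at limit + 1.
    long count() {
        return seen.size();
    }

    public static void main(String[] args) {
        BoundedUniqueCounter counter = new BoundedUniqueCounter(3);
        for (String v : new String[] {"a", "b", "c", "d", "e", "a"}) {
            counter.add(v);
        }
        // Five distinct values were offered with a limit of 3,
        // so the counter stops at limit + 1.
        System.out.println(counter.count()); // prints 4
    }
}
```

        With a limit of Long.MAX_VALUE the guard never trips, which matches the unbounded default described above. Memory use is otherwise bounded by limit + 1 stored strings.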

        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12357108/patch-1342.txt applied and successfully tested against trunk revision r536583.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/132/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/132/console

        Doug Cutting added a comment -

        I'm not sure what's changed, but this no longer passes unit tests against trunk.

        Testcase: testAggregates took 1.59 sec
        FAILED
        expected:<...5...> but was:<...9...>

        Runping Qi added a comment -

        It looks like the changes to TestAggregates were applied, but the changes to the aggregate code were not.

        Can you try to re-apply the patch? Or send me the following files in your trunk:

        ValueAggregatorBaseDescriptor.java and
        UniqValueCount.java

        so that I can take a look at them.

        Doug Cutting added a comment -

        The patch simply fails to apply to trunk.

        Runping Qi added a comment -

        That explains why the unit test failed.

        The patch failed to apply because r537300 made some formatting changes to TestAggregates.java, which caused conflicts.

        I will regenerate the patch next.

        Runping Qi added a comment -


        A new patch, with the conflict with trunk resolved.

        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12357147/patch-1342.txt applied and successfully tested against trunk revision r537295.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/136/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/136/console

        Doug Cutting added a comment -

        I just committed this. Thanks, Runping!

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #89 (see http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/89/).

          People

          • Assignee: Runping Qi
          • Reporter: Runping Qi
          • Votes: 0
          • Watchers: 0
