Kylin / KYLIN-1624

HyperLogLogPlusCounter becomes inaccurate when there are billions of entries

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.5.2
    • Component/s: None
    • Labels: None

    Description

      // Each of 20 threads fills its own counter with 500 million random values.
      // The list is synchronized because worker threads append to it concurrently.
      final List<HyperLogLogPlusCounter> counters =
              Collections.synchronizedList(Lists.<HyperLogLogPlusCounter> newArrayList());
      ExecutorService service = Executors.newFixedThreadPool(20);
      final CountDownLatch latch = new CountDownLatch(20);
      for (int i = 0; i < 20; i++) {
          service.submit(new Runnable() {
              @Override
              public void run() {
                  Random rand = new Random();
                  HyperLogLogPlusCounter counter = new HyperLogLogPlusCounter(14);
                  for (long j = 0; j < 500000000; j++) {
                      // Print progress once every million additions.
                      if (j % 1000000 == 1) {
                          System.out.println(j);
                      }
                      counter.add("" + rand.nextLong());
                  }
                  System.out.println("final" + counter.getCountEstimate());
                  counters.add(counter);
                  latch.countDown();
              }
          });
      }
      latch.await();
      service.shutdown();
      System.out.println("latch done");

      // Merge the 20 per-thread counters and print the combined estimate.
      HyperLogLogPlusCounter ret = new HyperLogLogPlusCounter(14);
      for (HyperLogLogPlusCounter c : counters) {
          ret.merge(c);
      }
      System.out.println(ret.getCountEstimate());

      The expected output is about 10 billion (20 threads × 500 million random values each); however, the actual output can be less than 1 billion.
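
To put the size of the discrepancy in context, here is a minimal sketch of the arithmetic behind the expected figure. It assumes that the constructor argument 14 is the precision p, giving m = 2^14 registers, and it uses the textbook HyperLogLog relative standard error of 1.04 / sqrt(m). The class and method names are hypothetical and are not part of Kylin.

```java
public class ExpectedEstimate {
    // Total number of values added across all threads in the reproduction.
    static long expectedCardinality(int threads, long perThread) {
        return threads * perThread;
    }

    // Textbook HyperLogLog relative standard error: 1.04 / sqrt(m), m = 2^p registers.
    // Assumption: the counter's constructor argument is the precision p.
    static double relativeStdError(int p) {
        return 1.04 / Math.sqrt((double) (1L << p));
    }

    public static void main(String[] args) {
        long expected = expectedCardinality(20, 500000000L);
        double err = relativeStdError(14);
        System.out.println("expected distinct values ~ " + expected);
        System.out.println("relative standard error ~ " + err);
    }
}
```

Under these assumptions the estimate should land within roughly one percent of 10 billion, so a result below 1 billion is far outside normal estimation error.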

          People

            Assignee: liyang (liyang.gmt8@gmail.com)
            Reporter: Hongbin Ma (mahongbin)
            Votes: 0
            Watchers: 3