Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-13161

numBuckets calculate wrong in BinaryHashBucketArea

    XMLWordPrintableJSON

Details

    Description

      The original code is:

       

      int minNumBuckets = (int) Math.ceil((estimatedRowCount / loadFactor / NUM_ENTRIES_PER_BUCKET));
      int bucketNumSegs = Math.max(1, Math.min(maxSegs, (minNumBuckets >>> table.bucketsPerSegmentBits) +
            ((minNumBuckets & table.bucketsPerSegmentMask) == 0 ? 0 : 1)));
      int numBuckets = MathUtils.roundDownToPowerOf2(bucketNumSegs << table.bucketsPerSegmentBits);
      

      default value: loadFactor=0.75, NUM_ENTRIES_PER_BUCKET=15,maxSegs = 33(suppose, only need big than the number which calculated by minBunBuckets)

      We suppose table.bucketsPerSegmentBits = 3, table.bucketsPerSegmentMask = 0b111. It means buckets in a segment is 8.

      When set estimatedRowCount loop from 1 to 1000, we will see the result in attach file.

      I will take an example:

      estimatedRowCount: 200, minNumBuckets: 18, bucketNumSegs: 3, numBuckets: 16
      

      We can see it request 3 segment, but only 2 segment needed(16 / 8), left one segment wasted. 

      And consider the segment is preallocated, it means some segments will never used.

       

       

      Attachments

        1. result.log
          148 kB
          Louis Xu

        Issue Links

          Activity

            People

              LouisXu777 Louis Xu
              LouisXu777 Louis Xu
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m