  1. Tajo (Retired)
  2. TAJO-1995

Improve range partitioning using histogram

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: QueryMaster
    • Labels: None

    Description

      The current range repartitioning algorithm has two major problems:

      • It assumes that the data distribution is uniform, so it is fragile when the data is skewed.
      • Given floating-point values, it ignores the digits to the right of the decimal point, which makes it difficult to estimate a proper number of partitions.

      One solution to these problems is to use a histogram. With a histogram, we can determine the actual data distribution and handle floating-point values properly.
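
      As a rough illustration of the idea (not Tajo's implementation), the following Python sketch builds an equi-width histogram over sampled floating-point keys and picks range boundaries from the cumulative counts, so that each partition receives a similar number of rows even when the data is skewed. The function names `build_histogram`, `split_points`, and `partition_of`, the bucket count, and the sample data are all hypothetical and chosen only for this example.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def build_histogram(values: Sequence[float], num_buckets: int) -> Tuple[float, float, List[int]]:
    """Build an equi-width histogram; returns (lowest value, bucket width, counts)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts


def split_points(lo: float, width: float, counts: Sequence[int], num_partitions: int) -> List[float]:
    """Choose bucket upper edges so that partitions hold roughly equal row counts."""
    total = sum(counts)
    points: List[float] = []
    running = 0
    for i, c in enumerate(counts):
        running += c
        target = total * (len(points) + 1) / num_partitions
        # Emit at most one boundary per bucket, once the cumulative count
        # reaches the next equal-share target.
        if len(points) < num_partitions - 1 and running >= target:
            points.append(lo + (i + 1) * width)
    return points


def partition_of(value: float, points: Sequence[float]) -> int:
    """Map a key to its partition index using the sorted split points."""
    return bisect_right(points, value)


# Hypothetical skewed sample: 90 keys below 0.09 and 10 keys spread up to 1.0.
# Truncating these keys to integers would collapse almost all of them to 0.
values = [i / 1000 for i in range(90)] + [i / 10 for i in range(1, 11)]

lo, width, counts = build_histogram(values, num_buckets=100)
points = split_points(lo, width, counts, num_partitions=4)

sizes = [0] * 4
for v in values:
    sizes[partition_of(v, points)] += 1

print("split points:", points)
print("rows per partition:", sizes)
```

      In this sketch the boundaries cluster where the keys are dense instead of being spaced evenly across the value range. A single bucket that holds more than one partition's share of rows still cannot be split, so finer buckets or an equi-depth histogram would be needed for heavier skew.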

            People

              Assignee: Jihoon Son (jihoonson)
              Reporter: Jihoon Son (jihoonson)