Tajo / TAJO-1995

Improve range partitioning using histogram

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: QueryMaster
    • Labels: None

      Description

      The currently implemented range repartition algorithm has two major problems:

      • It assumes that the data distribution is uniform, so it is fragile when the data is skewed.
      • Given floating-point values, it ignores the digits to the right of the decimal point, so it is difficult to estimate the proper number of partitions.

      One solution to these problems is to use a histogram. With a histogram, we can figure out the data distribution and handle floating-point values properly.
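
To illustrate the idea, here is a minimal sketch of histogram-style range partitioning in Java. It is not Tajo's implementation: the class `HistogramRangePartitioner` and its methods `splitPoints` and `partitionFor` are hypothetical names chosen for this example. It assumes that a sample of the sort key is available as `double` values, and it picks partition boundaries at equal-count quantiles of that sample (an equi-depth histogram), so boundaries follow the data rather than a uniform assumption and fractional parts are preserved.

```java
import java.util.Arrays;

public class HistogramRangePartitioner {

    /**
     * Returns numPartitions - 1 split points chosen so that each range
     * holds roughly the same number of sampled values (equi-depth buckets).
     */
    public static double[] splitPoints(double[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException(
                "need a non-empty sample and at least one partition");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Position of the i-th quantile in the sorted sample.
            int idx = (int) ((long) i * sorted.length / numPartitions);
            bounds[i - 1] = sorted[Math.min(idx, sorted.length - 1)];
        }
        return bounds;
    }

    /**
     * Maps a value to a partition index. Partition p receives values v
     * with bounds[p - 1] <= v < bounds[p]. Assumes the split points are
     * distinct; heavily duplicated keys would need extra handling.
     */
    public static int partitionFor(double value, double[] bounds) {
        int pos = Arrays.binarySearch(bounds, value);
        // On a miss, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Skewed sample: most values are below 1, a few are very large.
        double[] sample = {0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9,
                           5.0, 100.0, 1000.0, 5000.0};
        double[] bounds = splitPoints(sample, 3);
        int[] counts = new int[3];
        for (double v : sample) {
            counts[partitionFor(v, bounds)]++;
        }
        System.out.println(Arrays.toString(bounds) + " -> " + Arrays.toString(counts));
    }
}
```

With this sample, the split points fall at 0.35 and 5.0, and each of the three partitions receives four values. Splitting the value range 0.1 to 5000.0 into three equal-width ranges would instead place 10 of the 12 values in the first partition. The sketch does not deal with repeated split points, which can arise when a single key dominates the sample.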

              People

              • Assignee: Jihoon Son (jihoonson)
              • Reporter: Jihoon Son (jihoonson)
              • Votes: 0
              • Watchers: 1
