  1. Apache Drill
  2. DRILL-1346

Use HBase table size information to improve scan parallelization

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: Storage - HBase
    • Labels: None

    Description

      Currently, the scan size is calculated from a pseudo-estimated value that does not take the actual size of the data into account.

      HBase, through org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(), provides a way to retrieve the actual data size of each region. We can use these per-region sizes to approximate the size of a scan and, in turn, to improve scan parallelization.
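
      As a rough illustration of the idea, the sketch below shows one way per-region sizes could drive parallelization. It is not Drill's implementation. It assumes the region sizes (in MB) have already been collected, for example from the cluster status, and every name in it (RegionSizeParallelizer, chooseWidth, assign, targetMBPerFragment, maxWidth) is hypothetical. The width is derived from the total data size, and regions are then spread greedily so that each scan fragment reads a similar amount of data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: not part of Drill or HBase.
final class RegionSizeParallelizer {

    /** One scan fragment and the regions assigned to it. */
    static final class Fragment {
        final List<String> regions = new ArrayList<>();
        long totalMB = 0;
    }

    /**
     * Picks the number of fragments as ceil(totalSize / target),
     * clamped to [1, min(maxWidth, regionCount)].
     * Assumes targetMBPerFragment > 0.
     */
    static int chooseWidth(Map<String, Long> regionSizesMB,
                           long targetMBPerFragment, int maxWidth) {
        long total = 0;
        for (long size : regionSizesMB.values()) {
            total += size;
        }
        long bySize = (total + targetMBPerFragment - 1) / targetMBPerFragment;
        long upper = Math.min(maxWidth, regionSizesMB.size());
        return (int) Math.max(1, Math.min(bySize, upper));
    }

    /**
     * Greedy balancing: take regions largest first and give each one
     * to the fragment that currently has the least data.
     */
    static List<Fragment> assign(Map<String, Long> regionSizesMB, int width) {
        List<Fragment> fragments = new ArrayList<>();
        PriorityQueue<Fragment> leastLoaded =
            new PriorityQueue<>(Comparator.comparingLong((Fragment f) -> f.totalMB));
        for (int i = 0; i < width; i++) {
            Fragment f = new Fragment();
            fragments.add(f);
            leastLoaded.add(f);
        }

        List<Map.Entry<String, Long>> bySizeDesc =
            new ArrayList<>(regionSizesMB.entrySet());
        bySizeDesc.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        for (Map.Entry<String, Long> region : bySizeDesc) {
            Fragment f = leastLoaded.poll();   // fragment with the least data so far
            f.regions.add(region.getKey());
            f.totalMB += region.getValue();
            leastLoaded.add(f);                // re-insert with its updated load
        }
        return fragments;
    }
}
```

      With a size-based estimate like this, a table with a few large regions and many small ones no longer gets the same treatment as a table of evenly sized regions, which is the gap the pseudo-estimated value leaves open.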

            People

              Assignee: Aditya Kishore (adityakishore)
              Reporter: Aditya Kishore (adityakishore)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: