Apache Drill / DRILL-1346

Use HBase table size information to improve scan parallelization


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: Storage - HBase
    • Labels:
      None

      Description

      Currently, the scan size is calculated from a pseudo-estimated value that does not take the actual size of the data into account.

      HBase, through o.a.h.h.client.HBaseAdmin.getClusterStatus(), provides a way to retrieve the actual data size of each region. We can use these per-region sizes to approximate the size of a scan and, in turn, to improve scan parallelization.
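
The idea above could be sketched roughly as follows. This is only an illustration, not the code committed for this issue: the class `RegionSizeEstimator`, its method names, the `targetMBPerFragment` parameter, and the rule of capping the width at the number of regions are all assumptions made for the example. The per-region sizes are passed in as a plain map so the sketch stays self-contained; in a real setup they would presumably be collected from the per-server region load information carried by the cluster status.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns per-region sizes into a scan size estimate
// and a suggested parallelization width. Not part of Drill or HBase.
class RegionSizeEstimator {

    // Sums the sizes (in MB) of all regions covered by the scan.
    static long totalScanSizeMB(Map<String, Integer> regionSizesMB) {
        long total = 0;
        for (int sizeMB : regionSizesMB.values()) {
            total += sizeMB;
        }
        return total;
    }

    // Suggests ceil(total / targetMBPerFragment) fragments, clamped to
    // the range [1, number of regions]. The clamp assumes one region is
    // the smallest unit of work, which is an assumption of this sketch.
    static int suggestedWidth(Map<String, Integer> regionSizesMB, int targetMBPerFragment) {
        if (targetMBPerFragment <= 0) {
            throw new IllegalArgumentException("targetMBPerFragment must be positive");
        }
        if (regionSizesMB.isEmpty()) {
            return 1;
        }
        long total = totalScanSizeMB(regionSizesMB);
        long width = (total + targetMBPerFragment - 1) / targetMBPerFragment;
        return (int) Math.max(1, Math.min(width, regionSizesMB.size()));
    }

    public static void main(String[] args) {
        // Made-up region names and sizes, for illustration only.
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("region-a", 512);
        sizes.put("region-b", 64);
        sizes.put("region-c", 1024);

        System.out.println("Estimated scan size (MB): " + totalScanSizeMB(sizes));
        System.out.println("Suggested width: " + suggestedWidth(sizes, 1000));
    }
}
```

The point of the sketch is that a planner working from actual region sizes can tell a scan over three small regions apart from a scan over three large ones, which a fixed pseudo-estimate cannot.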

              People

              • Assignee: Aditya Kishore (adityakishore)
              • Reporter: Aditya Kishore (adityakishore)
