  1. Apache Drill
  2. DRILL-3921

Hive LIMIT 1 queries take too long

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Execution - Flow
    • Labels: None

    Description

      Fragment initialization on a Hive table backed by a directory of many files can take a very long time, which is most evident in LIMIT 1 queries. The root cause is that the underlying reader in HiveRecordReader is initialized when the constructor is called, rather than when setup is called, so the cost is paid for every file during fragment initialization, even when the query needs only one row.

      Two changes need to be made:
      1) Lazily initialize the underlying record reader in HiveRecordReader.
      2) Allow a callable to be run as a proxy user within an operator (through OperatorContext). This is required because the underlying record reader must be initialized as a proxy user (a proxy for the owner of the file). Previously, this was handled while creating the record batch tree.
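
The two changes can be illustrated with a minimal, self-contained sketch. It does not use Drill's or Hadoop's actual APIs: `UnderlyingReader`, `ProxyContext`, `RecordingContext`, `LazyRecordReader`, the method `runAsProxyUser`, and the file paths are all hypothetical stand-ins chosen for illustration. The sketch only shows the shape of the idea: the constructor stays cheap, and the expensive reader is created in `setup`, inside a callable that the operator's context runs on behalf of the proxy user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class LazyReaderSketch {

    /** Hypothetical stand-in for the expensive underlying Hive reader. */
    static final class UnderlyingReader {
        static int opened = 0; // how many readers have actually been opened

        UnderlyingReader(String path) {
            opened++; // in a real reader, this is where the slow work would happen
        }

        String next() {
            return "row";
        }
    }

    /** Hypothetical stand-in for a proxy-user hook exposed by an operator context. */
    interface ProxyContext {
        <T> T runAsProxyUser(Callable<T> task) throws Exception;
    }

    /** Test double: records the proxy user and runs the task inline (no real impersonation). */
    static final class RecordingContext implements ProxyContext {
        final String proxyUser;
        final List<String> log = new ArrayList<>();

        RecordingContext(String proxyUser) {
            this.proxyUser = proxyUser;
        }

        @Override
        public <T> T runAsProxyUser(Callable<T> task) throws Exception {
            log.add(proxyUser);
            return task.call();
        }
    }

    /** Record reader whose underlying reader is created in setup(), not in the constructor. */
    static final class LazyRecordReader {
        private final String path;
        private UnderlyingReader reader; // stays null until setup() is called

        LazyRecordReader(String path) {
            this.path = path; // cheap: only remembers what to open later
        }

        void setup(ProxyContext context) throws Exception {
            // Initialization is deferred to setup() and delegated to the proxy-user hook.
            reader = context.runAsProxyUser(() -> new UnderlyingReader(path));
        }

        String next() {
            if (reader == null) {
                throw new IllegalStateException("setup() has not been called");
            }
            return reader.next();
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingContext context = new RecordingContext("file_owner");

        // Constructing one reader per file opens nothing.
        List<LazyRecordReader> readers = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            readers.add(new LazyRecordReader("/hypothetical/table/part-" + i));
        }
        System.out.println("opened after construction: " + UnderlyingReader.opened);

        // A LIMIT 1 style consumer sets up and reads only the first reader.
        LazyRecordReader first = readers.get(0);
        first.setup(context);
        first.next();
        System.out.println("opened after first row: " + UnderlyingReader.opened);
    }
}
```

With eager initialization, the counter would already equal the number of files once the readers are constructed. In this sketch it stays at zero until `setup` runs, and only the reader that is actually consumed is opened.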

          People

            Assignee: Sudheesh Katkam (sudheeshkatkam)
            Reporter: Sudheesh Katkam (sudheeshkatkam)
            Krystal
            Votes: 0
            Watchers: 4
