Hive / HIVE-19996

Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: Beeline
    • Labels:
      None
    • Environment:

      Issue detected using Beeline with HBase Phoenix thin driver and a result set with many columns.

      Description

      Beeline is slow when rendering the table output format if both of the following conditions hold for the same result set:

      1. The result set has a large number of columns.
      2. The driver being used has a slow implementation of DatabaseMetaData.getPrimaryKeys.

      For example, testing has shown that for a query returning ~100 columns through the HBase Phoenix thin driver, switching from table output format to CSV output format cuts the execution time from ~30 seconds to ~2 seconds. Sample query: select * from system.catalog;

      This is due to how primary keys are detected. Currently, the Rows implementation makes a metadata call for every column to determine whether it is a primary key for display purposes. I propose optimizing this so that a metadata call is made only once for each unique table among the result set's columns.
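
      The per-table lookup could be sketched roughly as follows. This is only an illustration of the caching idea, not the contents of the attached patch: the class name PrimaryKeyCache, the KeyLoader interface, and the method names are hypothetical. The only JDBC call assumed is the standard DatabaseMetaData.getPrimaryKeys(catalog, schema, table), whose result set exposes a COLUMN_NAME column.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not the code in HIVE-19996.1.patch): caches primary-key
 * column names per table so the metadata lookup runs once per unique table
 * instead of once per column.
 */
public class PrimaryKeyCache {

    /** Loads the primary-key column names for one table. */
    public interface KeyLoader {
        Set<String> load(String catalog, String schema, String table) throws SQLException;
    }

    private final KeyLoader loader;
    // Keyed by (catalog, schema, table); Arrays.asList tolerates null parts.
    private final Map<List<String>, Set<String>> keysByTable = new HashMap<>();

    public PrimaryKeyCache(KeyLoader loader) {
        this.loader = loader;
    }

    /** Builds a cache backed by the standard JDBC getPrimaryKeys call. */
    public static PrimaryKeyCache forMetaData(DatabaseMetaData meta) {
        return new PrimaryKeyCache((catalog, schema, table) -> {
            Set<String> keys = new HashSet<>();
            try (ResultSet rs = meta.getPrimaryKeys(catalog, schema, table)) {
                while (rs.next()) {
                    keys.add(rs.getString("COLUMN_NAME"));
                }
            }
            return keys;
        });
    }

    /** Returns true if the column is a primary key; calls the loader at most once per table. */
    public boolean isPrimaryKey(String catalog, String schema, String table, String column)
            throws SQLException {
        List<String> tableKey = Arrays.asList(catalog, schema, table);
        Set<String> keys = keysByTable.get(tableKey);
        if (keys == null) {
            keys = loader.load(catalog, schema, table);
            keysByTable.put(tableKey, keys);
        }
        return keys.contains(column);
    }
}
```

      With a cache like this created once per result set, a query whose ~100 columns all come from a single table would trigger one getPrimaryKeys call rather than ~100. The catalog, schema, and table names for each column would presumably come from ResultSetMetaData (getCatalogName, getSchemaName, getTableName).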

        Attachments

        1. HIVE-19996.1.patch
          7 kB
          Kevin Minder

            People

            • Assignee:
              kminder Kevin Minder
              Reporter:
              kminder Kevin Minder
            • Votes:
              0
              Watchers:
              1
