Hive / HIVE-19996

Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: Beeline
    • Labels: None
    • Environment: Issue detected using Beeline with the HBase Phoenix thin driver and a result set with many columns.

    Description

      Beeline performs poorly with the table output format when both of the following conditions hold for the same result set.

      1. The result set has a large number of columns.
      2. The driver being used has a slow implementation of DatabaseMetaData.getPrimaryKeys.

      For example, testing with the HBase Phoenix thin driver has shown that, for a query returning ~100 columns, execution time drops from ~30 seconds to ~2 seconds when the CSV output format is used instead of the table output format. An example query: select * from system.catalog;

      This is due to how primary keys are detected. Currently the Rows implementation makes a metadata call for every column to determine whether it is a primary key, for display purposes. I propose optimizing this so that a metadata call is made only once for each unique table among the result set's columns.
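The proposed optimization can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class name PrimaryKeyCache, its Lookup interface, and the method names are hypothetical. The only driver call assumed is the standard JDBC DatabaseMetaData.getPrimaryKeys(catalog, schema, table), whose result set exposes a COLUMN_NAME column.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: caches primary-key columns per table, so the driver is asked once per table. */
public class PrimaryKeyCache {

    /** Fetches the primary-key column names of a single table (assumed abstraction, eases testing). */
    public interface Lookup {
        Set<String> primaryKeys(String catalog, String schema, String table) throws SQLException;
    }

    private final Lookup lookup;
    // Key is (catalog, schema, table); a List gives null-safe equals/hashCode.
    private final Map<List<String>, Set<String>> keysByTable = new HashMap<>();

    public PrimaryKeyCache(Lookup lookup) {
        this.lookup = lookup;
    }

    /** Builds a cache backed by the driver's DatabaseMetaData.getPrimaryKeys. */
    public static PrimaryKeyCache forMetaData(DatabaseMetaData meta) {
        return new PrimaryKeyCache((catalog, schema, table) -> {
            Set<String> keys = new HashSet<>();
            try (ResultSet rs = meta.getPrimaryKeys(catalog, schema, table)) {
                while (rs.next()) {
                    keys.add(rs.getString("COLUMN_NAME"));
                }
            }
            return keys;
        });
    }

    /** Returns whether the column is a primary key; triggers at most one lookup per distinct table. */
    public boolean isPrimaryKey(String catalog, String schema, String table, String column)
            throws SQLException {
        List<String> tableKey = Arrays.asList(catalog, schema, table);
        Set<String> keys = keysByTable.get(tableKey);
        if (keys == null) {
            keys = lookup.primaryKeys(catalog, schema, table);
            keysByTable.put(tableKey, keys);
        }
        return keys.contains(column);
    }
}
```

Under these assumptions, a caller iterating over the columns of a ResultSetMetaData would pass getCatalogName(i), getSchemaName(i), getTableName(i), and getColumnName(i) to isPrimaryKey. A ~100-column result set drawn from a single table would then cost one getPrimaryKeys call instead of ~100.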

      Attachments

        1. HIVE-19996.1.patch
          7 kB
          Kevin Minder

        Activity

          People

            Assignee: kminder Kevin Minder
            Reporter: kminder Kevin Minder
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: