Hive
  1. Hive
  2. HIVE-1815

The class HiveResultSet should implement batch fetching.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: JDBC
    • Labels:
      None
    • Environment:

      Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results.

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Use batch fetching on the hive jdbc driver to increase performance.

      Description

      When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query.
      Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size.

      It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction.
      (From memory, I think it took me around 20 minutes to fetch 4000 rows.)

      1. HIVE-1815.1.patch.txt
        5 kB
        Bennie Schut
      2. HIVE-1815.2.patch.txt
        6 kB
        Bennie Schut

        Activity

        Hide
        Carl Steinbach added a comment -

        Committed as HIVE-1851.

        Show
        Carl Steinbach added a comment - Committed as HIVE-1851 .
        Hide
        Ning Zhang added a comment -

        Committed. Thanks Bennie!

        Show
        Ning Zhang added a comment - Committed. Thanks Bennie!
        Hide
        Ning Zhang added a comment -

        +1. Will commit if tests pass.

        Show
        Ning Zhang added a comment - +1. Will commit if tests pass.
        Hide
        Bennie Schut added a comment -

        Updated to use an iterator instead of deleting items.

        Show
        Bennie Schut added a comment - Updated to use an iterator instead of deleting items.
        Show
        Bennie Schut added a comment - https://reviews.apache.org/r/514/
        Hide
        Bennie Schut added a comment -

        This is the simplest implementation I could do. Just changed the fetchOne to fetchN and return the result on each next() call until the list is empty and then do another fetchN. We've used this for a week and the performance increase on large resultsets is significant. You could also do the fetchN on a different thread to keep the queue full but that's a bit more work for just a little more gain.

        I've added 1 small test to call the setFetchSize and getFetchSize
        but the jdbc tests should all work like they worked before this test since the functionality doesn't change.

        Show
        Bennie Schut added a comment - This is the simplest implementation I could do. Just changed the fetchOne to fetchN and return the result on each next() call until the list is empty and then do another fetchN. We've used this for a week and the performance increase on large resultsets is significant. You could also do the fetchN on a different thread to keep the queue full but that's a bit more work for just a little more gain. I've added 1 small test to call the setFetchSize and getFetchSize but the jdbc tests should all work like they worked before this test since the functionality doesn't change.
        Hide
        Steven Wong added a comment -

        Using fetchN will be nice. The Hive driver should use fetchN in accordance with the fetch size setting in the Statement or ResultSet object as set by setFetchSize, which will need to be implemented along with using fetchN. (The Hive driver currently does not support setFetchSize and getFetchSize.)

        Show
        Steven Wong added a comment - Using fetchN will be nice. The Hive driver should use fetchN in accordance with the fetch size setting in the Statement or ResultSet object as set by setFetchSize, which will need to be implemented along with using fetchN. (The Hive driver currently does not support setFetchSize and getFetchSize.)
        Hide
        Bennie Schut added a comment -

        Thanks for the comment Edward, fetchN is part of the hive server yes. But the jdbc driver doesn't use this. The HiveQueryResultSet call's a client.fetchOne();
        It would be nice if the jdbc driver uses the fetchN you suggested in the comment and perhaps keeps a little queue of records and when dropping below a threshold do another fetchN.
        I've also noticed the same slowness and it makes sence this was probably cause by using fetchOne().

        Show
        Bennie Schut added a comment - Thanks for the comment Edward, fetchN is part of the hive server yes. But the jdbc driver doesn't use this. The HiveQueryResultSet call's a client.fetchOne(); It would be nice if the jdbc driver uses the fetchN you suggested in the comment and perhaps keeps a little queue of records and when dropping below a threshold do another fetchN. I've also noticed the same slowness and it makes sence this was probably cause by using fetchOne().
        Hide
        Edward Capriolo added a comment -

        This is implemented

        vi service/if/hive_service.thrift

        1. Fetch a given number of rows or remaining number of
        2. rows whichever is smaller.
          list<string> fetchN(1:i32 numRows) throws(1:HiveServerException ex)

        Please be sure when filing jira issues to be using trunk or the most recent release. Also I am fairly sure fetchn was a part of the hive server since the get go.

        Show
        Edward Capriolo added a comment - This is implemented vi service/if/hive_service.thrift Fetch a given number of rows or remaining number of rows whichever is smaller. list<string> fetchN(1:i32 numRows) throws(1:HiveServerException ex) Please be sure when filing jira issues to be using trunk or the most recent release. Also I am fairly sure fetchn was a part of the hive server since the get go.

          People

          • Assignee:
            Bennie Schut
            Reporter:
            Guy le Mar
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development