Hive / HIVE-15267

Make query length calculation logic more accurate in TxnUtils.needNewQuery()



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Versions: 1.2.1, 2.1.0
    • Fix Version: 3.0.0
    • Components: Hive, Transactions
    • Labels: None


      In HIVE-15181 there is the following review comment, which this ticket will address:

      In TxnUtils.needNewQuery(), "sizeInBytes / 1024 > queryMemoryLimit" doesn't do the right thing.
      If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most likely want each SQL string to be at most 1K,
      but if sizeInBytes=2047, this still returns false.
      The computation of sizeInBytes should also include the length of the "suffix".
      Along the same lines: the check against the max query length is done only after each batch has already been added to the query. Suppose there are 1000 9-digit txn IDs in each IN(...) clause. That is, conservatively, 18KB of text, so the length of each query grows in 18KB chunks.
      The check for query length should instead be done for each item added to the IN clause.
      If a DB has a query-length limit of X, then any query longer than X will fail. So this must ensure that no query longer than X is produced, even by one character.
      For example, case 3.1 of the UT generates a query of almost 4000 characters, which is clearly > 1KB.
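The per-item check proposed in the comment above can be sketched as follows. This is a minimal illustration, not Hive's actual TxnUtils code: the QueryBatcher class, the buildQueries helper, and the prefix/suffix strings are hypothetical. The idea is to compute, before appending each ID, the projected query length including the separator comma, the closing parenthesis, and the suffix, and to start a new query whenever the projected length would exceed the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryBatcher {

    /**
     * Splits txn IDs across multiple "prefix IN (...) suffix" queries so that
     * no generated query exceeds maxLenBytes. The length check is performed
     * before appending each ID, and it accounts for the separator comma, the
     * closing parenthesis, and the suffix. (Hypothetical sketch, not Hive's
     * actual implementation.)
     */
    public static List<String> buildQueries(long[] txnIds, String prefix,
                                            String suffix, int maxLenBytes) {
        List<String> queries = new ArrayList<>();
        StringBuilder sb = new StringBuilder(prefix).append(" IN (");
        boolean first = true;
        for (long id : txnIds) {
            String item = Long.toString(id);
            // Projected total length if this item is appended and the query
            // is then closed: current text + comma + item + ')' + suffix.
            int projected = sb.length() + (first ? 0 : 1) + item.length()
                    + 1 + suffix.length();
            if (!first && projected > maxLenBytes) {
                // Close out the current query (guaranteed <= maxLenBytes,
                // because the previous append passed this same check).
                sb.append(')').append(suffix);
                queries.add(sb.toString());
                sb = new StringBuilder(prefix).append(" IN (");
                first = true;
            }
            if (!first) {
                sb.append(',');
            }
            // Note: the first item of a batch is appended unconditionally,
            // since an IN clause cannot be split below one element.
            sb.append(item);
            first = false;
        }
        sb.append(')').append(suffix);
        queries.add(sb.toString());
        return queries;
    }
}
```

With this structure, a METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH of 1K yields queries that each stay within 1024 characters, rather than growing in whole-batch increments past the limit.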


        Attachments

        1. HIVE-15267.01.patch
          12 kB
          Steve Yeom
        2. HIVE-15267.02.patch
          12 kB
          Steve Yeom
        3. HIVE-15267.03.patch
          16 kB
          Steve Yeom
        4. HIVE-15267.04.patch
          15 kB
          Steve Yeom
        5. HIVE-15267.05.patch
          15 kB
          Steve Yeom
        6. HIVE-15267.06.patch
          15 kB
          Steve Yeom




              Assignee: Steve Yeom (steveyeom2017)
              Reporter: Wei Zheng (wzheng)
              Votes: 0
              Watchers: 5