Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-11317

ACID: Improve transaction Abort logic due to timeout

    XMLWordPrintableJSON

Details

    Description

      the logic to Abort transactions that have stopped heartbeating is in
      TxnHandler.timeOutTxns()
      This is only called when DbTxnManger.getValidTxns() is called.
      So if there is a lot of txns that need to be timed out and the there are not SQL clients talking to the system, there is nothing to abort dead transactions, and thus compaction can't clean them up so garbage accumulates in the system.

      Also, streaming api doesn't call DbTxnManager at all.

      Need to move this logic into Initiator (or some other metastore side thread).
      Also, make sure it is broken up into multiple small(er) transactions against metastore DB.

      Also more timeOutLocks() locks there as well.

      see about adding TXNS.COMMENT field which can be used for "Auto aborted due to timeout" for example.

      The symptom of this is that the system keeps showing more and more Open transactions that don't seem to ever go away (and have no locks associated with them)

      Attachments

        1. HIVE-11317.patch
          33 kB
          Eugene Koifman
        2. HIVE-11317.2.patch
          33 kB
          Eugene Koifman
        3. HIVE-11317.3.patch
          34 kB
          Eugene Koifman
        4. HIVE-11317.4.patch
          38 kB
          Eugene Koifman
        5. HIVE-11317.5.patch
          38 kB
          Eugene Koifman
        6. HIVE-11317.6.patch
          38 kB
          Eugene Koifman

        Issue Links

          Activity

            People

              ekoifman Eugene Koifman
              ekoifman Eugene Koifman
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: