Hive / HIVE-14427

CompactionTxnHandler.markCleaned() can delete aborted txns


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      We can modify the query

      s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
                TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
                info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
      

      to use select txn_id, count ... group by txn_id, so that we know the number of components in each txn.

      Then, when running "delete from TXN_COMPONENTS where...", we know how many rows were deleted.
      If the sum of all counts from the first query matches the total number of rows deleted, we know that all aborted txns in this set are empty and can therefore be deleted here.

      This means we clean up aborted txns from the TXNS table sooner and avoid a large join in cleanEmptyAbortedTxns(). Also, the delete on TXNS here will have PKs in the WHERE clause, so it should be cheap.
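
      A minimal sketch of the check described above, not the actual CompactionTxnHandler code. It assumes the per-txn counts from the first query and the row count returned by the TXN_COMPONENTS delete are already in hand, and that the counts cover every component row of each txn. The class and method names (AbortedTxnCleanupSketch, allAbortedTxnsEmpty, buildDeleteTxns) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class AbortedTxnCleanupSketch {

    /**
     * Returns true when the number of rows removed from TXN_COMPONENTS equals
     * the sum of the per-txn component counts, i.e. no components remain.
     * componentCounts maps txn_id to the count from the grouped query (assumed input).
     */
    static boolean allAbortedTxnsEmpty(Map<Long, Integer> componentCounts, int rowsDeleted) {
        long expected = 0;
        for (int count : componentCounts.values()) {
            expected += count;
        }
        return !componentCounts.isEmpty() && expected == rowsDeleted;
    }

    /** Builds a delete on TXNS whose WHERE clause uses only primary keys. */
    static String buildDeleteTxns(Map<Long, Integer> componentCounts) {
        String ids = componentCounts.keySet().stream()
            .map(String::valueOf)
            .collect(Collectors.joining(","));
        return "delete from TXNS where txn_id in (" + ids + ")";
    }

    public static void main(String[] args) {
        // Illustrative values only: txn 7 has 2 components, txn 9 has 1.
        Map<Long, Integer> counts = new LinkedHashMap<>();
        counts.put(7L, 2);
        counts.put(9L, 1);
        int rowsDeleted = 3; // what the TXN_COMPONENTS delete would have reported

        if (allAbortedTxnsEmpty(counts, rowsDeleted)) {
            System.out.println(buildDeleteTxns(counts));
        }
    }
}
```

      If the totals do not match, the sketch issues no delete on TXNS, and those txns would be left for cleanEmptyAbortedTxns() as today.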


          People

            Assignee: Unassigned
            Reporter: Eugene Koifman (ekoifman)