HIVE-11444 (project: Hive)

ACID Compactor should generate stats/alerts



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None


      Compaction should generate stats about the number of files it reads, their min/max/avg size, etc. It should also generate alerts if the system appears to be misconfigured.

      For example, a large number of very small delta files is a good sign that the Streaming API is configured with batches that are too small.
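      A minimal sketch of the per-compaction file stats and the small-delta alert described above, written in Python for illustration (Hive itself is Java). The 1 MiB "small file" threshold, the 50% ratio, and all names (`CompactionFileStats`, `summarize_files`, `small_file_alert`) are hypothetical; the issue does not define what counts as "very small". The input is the list of byte sizes of the files one compaction read, however the compactor obtains them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: the issue does not define "very small".
SMALL_FILE_BYTES = 1024 * 1024   # files under 1 MiB count as small
SMALL_FILE_RATIO_ALERT = 0.5     # warn when more than half the files are small


@dataclass
class CompactionFileStats:
    file_count: int
    min_bytes: int
    max_bytes: int
    avg_bytes: float
    small_file_count: int


def summarize_files(sizes: List[int]) -> CompactionFileStats:
    """Summarize the sizes (in bytes) of the files one compaction read."""
    if not sizes:
        return CompactionFileStats(0, 0, 0, 0.0, 0)
    small = sum(1 for s in sizes if s < SMALL_FILE_BYTES)
    return CompactionFileStats(
        file_count=len(sizes),
        min_bytes=min(sizes),
        max_bytes=max(sizes),
        avg_bytes=sum(sizes) / len(sizes),
        small_file_count=small,
    )


def small_file_alert(stats: CompactionFileStats) -> Optional[str]:
    """Return a warning if most files read were small, otherwise None."""
    if stats.file_count == 0:
        return None
    if stats.small_file_count / stats.file_count > SMALL_FILE_RATIO_ALERT:
        return (
            f"{stats.small_file_count} of {stats.file_count} files read are under "
            f"{SMALL_FILE_BYTES} bytes; Streaming API batches may be too small"
        )
    return None
```

      For example, `summarize_files([200_000, 300_000, 50_000_000])` reports 3 files, 2 of them small, so `small_file_alert` returns a warning; the stats themselves could simply be logged at INFO after each compaction.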

      The simplest idea is to add another periodic task to AcidHouseKeeperService that periodically runs
        select count(*), min(txnid), max(txnid), type from txns group by type
      The task would then:
        1. dump the result to the log file at INFO;
        2. optionally keep counts for the last 10 minutes, hour, 6 hours, 24 hours, etc.;
           2.1. if a large increase is detected, issue an alert (at least to the log for now) at WARN/ERROR.
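      A sketch of that periodic task, assuming a toy `txns` table with the `txnid` and `type` columns named in the query (the real metastore schema may differ) and using SQLite in place of the metastore database. The window lengths follow the list above; treating "a large increase" as "a count at least doubled within a window", the logger name, and the names `snapshot_txn_counts` and `RollingCounts` are assumptions. A scheduler would call `snapshot_txn_counts`, `record`, and `alerts` on each tick.

```python
import logging
import sqlite3
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

LOG = logging.getLogger("acid.housekeeper")  # hypothetical logger name

# Windows from the issue (10 min, 1 h, 6 h, 24 h), in seconds.
WINDOWS = {"10m": 600, "1h": 3600, "6h": 21600, "24h": 86400}
GROWTH_ALERT_FACTOR = 2.0  # assumption: "large increase" = count at least doubled


def snapshot_txn_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Run the per-type summary query, log each row at INFO, return counts by type."""
    rows = conn.execute(
        "SELECT count(*), min(txnid), max(txnid), type FROM txns GROUP BY type"
    ).fetchall()
    counts = {}
    for count, min_id, max_id, txn_type in rows:
        LOG.info("txns type=%s count=%d min(txnid)=%s max(txnid)=%s",
                 txn_type, count, min_id, max_id)
        counts[txn_type] = count
    return counts


class RollingCounts:
    """Timestamped count samples, retained for the longest window."""

    def __init__(self) -> None:
        self.samples: Deque[Tuple[float, Dict[str, int]]] = deque()

    def record(self, now: float, counts: Dict[str, int]) -> None:
        self.samples.append((now, counts))
        horizon = now - max(WINDOWS.values())
        while self.samples and self.samples[0][0] < horizon:
            self.samples.popleft()

    def _oldest_since(self, since: float) -> Optional[Dict[str, int]]:
        # Samples are in time order, so the first match is the oldest in the window.
        for ts, counts in self.samples:
            if ts >= since:
                return counts
        return None

    def alerts(self, now: float) -> List[str]:
        """Warn for each window in which a type's count grew by GROWTH_ALERT_FACTOR."""
        if not self.samples:
            return []
        current = self.samples[-1][1]
        messages = []
        for name, seconds in WINDOWS.items():
            baseline = self._oldest_since(now - seconds)
            if baseline is None:
                continue
            for txn_type, count in current.items():
                before = baseline.get(txn_type, 0)
                if before > 0 and count >= GROWTH_ALERT_FACTOR * before:
                    msg = f"txns type={txn_type} grew from {before} to {count} within {name}"
                    LOG.warning(msg)
                    messages.append(msg)
        return messages
```

      One limitation of this simple rule: a type whose count was zero at the start of a window never triggers an alert, so a real implementation might also use an absolute threshold.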

      It should also alert if there is ACID activity but no compactions are running.
      One way to do this is to add logic to TxnHandler that periodically checks the contents of the COMPACTION_QUEUE table and keeps a simple histogram of compactions over the last few hours.
      Similarly, a periodic check of transactions started (or committed/aborted) can keep its own simple histogram. The two histograms can then be compared to detect ACID write activity with no compaction activity.
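      A sketch of the two-histogram check, in Python for illustration. It assumes each periodic poll (of COMPACTION_QUEUE for compactions, or of the transaction tables for transactions) feeds event timestamps into an hourly histogram; the polling itself is not shown. The class `HourlyHistogram`, the function `writes_without_compaction`, and the `min_txns` threshold of 100 are all hypothetical.

```python
from collections import Counter

BUCKET_SECONDS = 3600  # one bucket per hour


class HourlyHistogram:
    """Event counts per hour, covering only the last `hours` hours."""

    def __init__(self, hours: int) -> None:
        self.hours = hours
        self.buckets: Counter = Counter()  # hour index -> event count

    def add(self, ts: float, n: int = 1) -> None:
        self.buckets[int(ts // BUCKET_SECONDS)] += n

    def total(self, now: float) -> int:
        current = int(now // BUCKET_SECONDS)
        oldest = current - self.hours + 1
        # Drop buckets that have fallen out of the window.
        for bucket in [b for b in self.buckets if b < oldest]:
            del self.buckets[bucket]
        return sum(c for b, c in self.buckets.items() if b <= current)


def writes_without_compaction(txns: HourlyHistogram, compactions: HourlyHistogram,
                              now: float, min_txns: int = 100) -> bool:
    """True if at least `min_txns` transactions but no compactions fall in the window."""
    return txns.total(now) >= min_txns and compactions.total(now) == 0
```

      Requiring a minimum number of transactions keeps a quiet system (a handful of writes, nothing yet worth compacting) from raising the alert.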


        Attachments

        1. HIVE-11444.1.wip.patch (13 kB, Wei Zheng)

              Assignee: Wei Zheng (wzheng)
              Reporter: Eugene Koifman (ekoifman)
              Votes: 0
              Watchers: 4