Hive / HIVE-24824

Define metrics for compaction observability

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved

    Description

      When the Compaction background processes (Initiator, Worker, Cleaner) fail, the problem is often hard to notice until it causes serious performance degradation.
      We should create new JMX metrics that make it easier to monitor compaction health. Examples:

      • number of failed / initiated compactions
      • number of aborted txns, oldest aborted txn
      • tables that have compaction disabled but still receive writes
      • Initiator and Cleaner cycle runtime
      • size of ACID metadata tables that should hold a roughly constant number of rows (txn_to_writeId, completed_txns)
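
      As an illustration of the kind of JMX metric proposed above, here is a minimal sketch using only the JDK's javax.management API. It is not Hive's implementation: the class name CompactionMetricsSketch, the attribute names, and the ObjectName domain "sketch.compaction" are all hypothetical, chosen only to show how counters for initiated and failed compactions and an Initiator cycle runtime could be exposed as MBean attributes.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class CompactionMetricsSketch {

    /** Hypothetical management interface: each getter becomes one JMX attribute. */
    public interface CompactionHealth {
        long getInitiatedCompactions();
        long getFailedCompactions();
        long getLastInitiatorCycleMillis();
    }

    /** Thread-safe holder that background threads would update (assumed design). */
    public static final class CompactionHealthImpl implements CompactionHealth {
        private final AtomicLong initiated = new AtomicLong();
        private final AtomicLong failed = new AtomicLong();
        private final AtomicLong lastInitiatorCycleMillis = new AtomicLong();

        public void recordInitiated() { initiated.incrementAndGet(); }
        public void recordFailed() { failed.incrementAndGet(); }
        public void recordInitiatorCycle(long millis) { lastInitiatorCycleMillis.set(millis); }

        @Override public long getInitiatedCompactions() { return initiated.get(); }
        @Override public long getFailedCompactions() { return failed.get(); }
        @Override public long getLastInitiatorCycleMillis() { return lastInitiatorCycleMillis.get(); }
    }

    /** Registers the metrics holder under an illustrative ObjectName. */
    public static ObjectName register(MBeanServer server, CompactionHealthImpl impl) {
        try {
            ObjectName name = new ObjectName("sketch.compaction:type=CompactionHealth");
            // StandardMBean lets us pass the interface explicitly instead of
            // relying on the "<Class>MBean" naming convention.
            server.registerMBean(new StandardMBean(impl, CompactionHealth.class), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("could not register sketch MBean", e);
        }
    }

    /** Reads a long-valued attribute the way a monitoring client would. */
    public static long readLong(MBeanServer server, ObjectName name, String attribute) {
        try {
            return (Long) server.getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read attribute " + attribute, e);
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        CompactionHealthImpl health = new CompactionHealthImpl();
        ObjectName name = register(server, health);

        // Simulated events; a real Initiator / Worker / Cleaner would call these.
        health.recordInitiated();
        health.recordInitiated();
        health.recordFailed();
        health.recordInitiatorCycle(1500L);

        System.out.println("initiated = " + readLong(server, name, "InitiatedCompactions"));
        System.out.println("failed = " + readLong(server, name, "FailedCompactions"));
        System.out.println("initiator cycle ms = " + readLong(server, name, "LastInitiatorCycleMillis"));
    }
}
```

      Once registered, the attributes are visible to any JMX client (for example jconsole), so an alert can be placed on a rising failure count rather than waiting for query performance to degrade.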

       

        Issue Links

          1. Create AcidMetricsService (Sub-task, Closed, Peter Varga)
          2. Initiator / Cleaner performance metrics (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 1h)
          3. Worker performance metric (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 1h 10m)
          4. Create new metrics about ACID metadata size (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 1h 10m)
          5. Add host and version information to compaction queue (Sub-task, Resolved, Peter Varga; progress 100%, time spent 4h 10m)
          6. Create new metrics about open transactions (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 20m)
          7. Create new metric about Initiator / Worker hosts (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 0.5h)
          8. New metrics about aborted transactions (Sub-task, Closed, Karen Coppage; progress 100%, time spent 2h 40m)
          9. Create new metrics about locks (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 1h)
          10. Rename metrics that have spaces in the name (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 20m)
          11. Create new metrics about Initiator / Cleaner failures (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 40m)
          12. Divide oldest_open_txn into oldest replication and non-replication transactions (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 0.5h)
          13. Metric about incomplete compactions (Sub-task, Resolved, Karen Coppage; progress 100%, time spent 50m)
          14. Add timeout for failed and did not initiate compaction cleanup (Sub-task, Closed, Karen Coppage; progress 100%, time spent 50m)
          15. Create new metrics about the number of delta files in the ACID table (Sub-task, Resolved, Denys Kuzmenko; progress 100%, time spent 2h 50m)
          16. Create metric: Number of tables with > x aborts (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 1h 10m)
          17. Put metrics collection behind a feature flag (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 2h 10m)
          18. tables_with_x_aborted_transactions should count partition/unpartitioned tables (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 0.5h)
          19. Create metric about oldest entry in "ready for cleaning" state (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 0.5h)
          20. Tweak delta metrics with custom MBean for Prometheus (Sub-task, Closed, Denys Kuzmenko; progress 100%, time spent 10m)
          21. All new compaction metrics should be lower case (Sub-task, Closed, Antal Sinkovits; progress 100%, time spent 1h)
          22. Number of initiator hosts metric should ignore manually initiated compactions (Sub-task, Closed, Karen Coppage; progress 100%, time spent 1h)
          23. Skip metrics collection about writes to tables with tblproperty no_auto_compaction=true if CTAS (Sub-task, Closed, Karen Coppage; progress 100%, time spent 3h 10m)
          24. Changes to metastore API in HIVE-24880 are not backwards compatible (Sub-task, Closed, Karen Coppage; progress 100%, time spent 40m)
          25. Metrics compaction_failed_initiator_ratio and compaction_failed_cleaner_ratio should be counters (Sub-task, Closed, Karen Coppage; progress 100%, time spent 20m)
          26. Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit (Sub-task, Closed, Karen Coppage; progress 100%, time spent 1h 40m)
          27. Delta metrics keys should contain database name (Sub-task, Closed, László Pintér; progress 100%, time spent 40m)
          28. Delta metrics collection may cause NPE (Sub-task, Resolved, Karen Coppage; progress 100%, time spent 20m)

            People

              dkuzmenko Denys Kuzmenko
              pvargacl Peter Varga
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 32h 40m