  1. Accumulo
  2. ACCUMULO-1345

Provide feedback that a compaction is "stuck"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: tserver
    • Labels:
      None

      Description

      The system should be able to detect when a compaction has not read or written data in a while, indicating that it may be stuck on something (e.g. an infinite loop in a user iterator).

          Activity

          jira-bot ASF subversion and git services added a comment -

          Commit 2ab30557c434c52c75203603e713f9cd7aa4d454 in branch refs/heads/master from [~keith_turner]
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=2ab3055 ]

          ACCUMULO-1345 added filename to log message about stuck compactions

          kturner Keith Turner added a comment -

          Here is an example of what the commit I just made does. The following configures Accumulo to warn if a compaction does not make progress in 30 seconds, and then sets the slow iterator to sleep for 60 seconds.

          root@test16> config -s tserver.compaction.warn.time=30s
          root@test16> table foo
          root@test16 foo> config -t foo -s table.iterator.minc.slow=100,org.apache.accumulo.test.functional.SlowIterator
          root@test16 foo> config -t foo -s table.iterator.minc.slow.opt.sleepTime=60000
          root@test16 foo> insert r1 cf1 cq1 v1
          root@test16 foo> flush -t foo 
          2013-09-06 18:43:38,299 [shell.Shell] INFO : Flush of table foo initiated...
          

          Eventually, the following shows up in the tserver logs.

          2013-09-06 18:44:27,044 [tabletserver.CompactionWatcher] WARN : Compaction of 2<< has not made progress for at least 39999ms
          java.lang.Exception: Possible stack trace of compaction stuck on 2<<
                  at java.lang.Thread.sleep(Native Method)
                  at org.apache.accumulo.core.util.UtilWaitThread.sleep(UtilWaitThread.java:26)
                  at org.apache.accumulo.test.functional.SlowIterator.next(SlowIterator.java:56)
                  at org.apache.accumulo.server.tabletserver.Compactor.compactLocalityGroup(Compactor.java:499)
                  at org.apache.accumulo.server.tabletserver.Compactor.call(Compactor.java:357)
                  at org.apache.accumulo.server.tabletserver.MinorCompactor.call(MinorCompactor.java:96)
                  at org.apache.accumulo.server.tabletserver.Tablet.minorCompact(Tablet.java:2085)
                  at org.apache.accumulo.server.tabletserver.Tablet.access$4300(Tablet.java:157)
                  at org.apache.accumulo.server.tabletserver.Tablet$MinorCompactionTask.run(Tablet.java:2172)
                  at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                  at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                  at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                  at java.lang.Thread.run(Thread.java:662)
          2013-09-06 18:44:38,384 [tabletserver.Compactor] DEBUG: Compaction 2<< 1 read | 1 written |      0 entries/sec | 60.006 secs
          2013-09-06 18:44:47,043 [tabletserver.CompactionWatcher] INFO : Compaction of 2<< is no longer stuck
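
The WARN entry above carries the stack of the compaction thread rather than the stack of the thread doing the logging. One way to get that effect in plain Java is to copy another thread's stack trace into a freshly created exception. The sketch below is only an illustration of that technique and is not taken from the commit; the class name `StuckTraceSketch` and the helper `traceOf` are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

public class StuckTraceSketch {

  // Hypothetical helper: builds an exception whose stack trace is a snapshot
  // of where the given worker thread is currently executing.
  static Exception traceOf(Thread worker, String extent) {
    Exception e = new Exception("Possible stack trace of compaction stuck on " + extent);
    e.setStackTrace(worker.getStackTrace());
    return e;
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch started = new CountDownLatch(1);

    // Stand-in for a compaction thread blocked inside a slow iterator.
    Thread worker = new Thread(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ex) {
        // interrupted by main; just exit
      }
    }, "compactor");
    worker.setDaemon(true);
    worker.start();

    started.await();
    Thread.sleep(200); // give the worker time to enter sleep()

    // A watcher thread would hand this exception to its logger.
    traceOf(worker, "2<<").printStackTrace();

    worker.interrupt();
  }
}
```

The snapshot is taken at the moment of the call, so it shows where the thread happens to be at that time, which is why the log message says "possible" stack trace.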
          
          jira-bot ASF subversion and git services added a comment -

          Commit dee8bbb98ba155bec1612d4a4919648159efeb1e in branch refs/heads/master from [~keith_turner]
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=dee8bbb ]

          ACCUMULO-1345 log warning and stack trace when compaction does not make progress

          kturner Keith Turner added a comment -

          If compactions were run in a separate process (ACCUMULO-1188), then that process could be killed. Also, the tserver could monitor the child process for progress.

          mdrob Mike Drob added a comment -

          We saw compactions getting stuck with Accumulo 1.4.3 and think it was caused by HDFS-88.

          mdrob Mike Drob added a comment -

          Thinking about this some more, I'm not sure that listcompactions filtering on tables would be good enough. Compactions should report to the master that they have made progress (similar to MR tasks), either through entries read/written, or through an explicit progress call inside of an iterator. If progress hasn't been made for a configurable amount of time (10 minutes) then action should be taken.
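
A minimal sketch of the progress check described above, assuming a tracker that counts entries read and written plus explicit progress calls, and a watcher that polls it. The class `ProgressTracker` and all of its methods are hypothetical names for illustration, not Accumulo APIs. Time is passed in explicitly so the behaviour is easy to reason about.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ProgressTracker {
  private final AtomicLong entriesRead = new AtomicLong();
  private final AtomicLong entriesWritten = new AtomicLong();
  private final AtomicLong explicitCalls = new AtomicLong();

  // State owned by the polling watcher.
  private long lastTotal = 0;
  private long lastProgressMillis;

  public ProgressTracker(long startMillis) {
    this.lastProgressMillis = startMillis;
  }

  // Called by the compaction as it processes entries.
  public void read(long n) { entriesRead.addAndGet(n); }
  public void wrote(long n) { entriesWritten.addAndGet(n); }

  // Explicit progress call, e.g. from inside a long-running iterator.
  public void progress() { explicitCalls.incrementAndGet(); }

  // Called periodically by the watcher. Progress is noticed at poll time,
  // so the reported stall is measured from the last poll that saw a change.
  public synchronized long stalledFor(long nowMillis) {
    long total = entriesRead.get() + entriesWritten.get() + explicitCalls.get();
    if (total != lastTotal) {
      lastTotal = total;
      lastProgressMillis = nowMillis;
    }
    return nowMillis - lastProgressMillis;
  }

  public boolean isStuck(long nowMillis, long timeoutMillis) {
    return stalledFor(nowMillis) >= timeoutMillis;
  }

  public static void main(String[] args) {
    long timeout = 10 * 60 * 1000L; // the 10 minutes suggested above
    ProgressTracker tracker = new ProgressTracker(0);
    tracker.read(1);
    System.out.println(tracker.isStuck(60_000, timeout));         // change seen at this poll
    System.out.println(tracker.isStuck(60_000 + timeout, timeout)); // nothing since then
  }
}
```

What "action should be taken" means once `isStuck` returns true (warn, report to the master, cancel) is left open here, as it is in the comment.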

          mdrob Mike Drob added a comment -

          Keith Turner - Yea, that seems reasonable.

          kturner Keith Turner added a comment -

          Listing compactions was added in 1.5. Looking at the command, it only supports filtering on tablet servers. If that command could filter on tables and the compact command did not print warnings (ACCUMULO-1344), would this be an adequate solution?


            People

            • Assignee:
              kturner Keith Turner
              Reporter:
              mdrob Mike Drob