Description
Noticed a case during RandomWalk where scans against a table were blocked for a long time: the tablet had accumulated too many files, which blocked batch scans.
The major compaction did eventually start, but I'm not sure why it didn't start on the order of seconds to minutes earlier. It was then noticed that the trace table (id 1) had a ton of WALs and needed a minor compaction. Shortly after that minor compaction began, a major compaction for our table (id h6) was started.
2015-05-11 12:02:17,159 [tserver.TabletServer] DEBUG: Initiating minor compaction for 1<< because it has 35 write ahead logs
2015-05-11 12:02:17,160 [tserver.NativeMap] DEBUG: Allocated native map 0x000000000e198b30
2015-05-11 12:02:17,160 [tablet.Tablet] DEBUG: MinC initiate lock 0.00 secs
2015-05-11 12:02:17,160 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/374feac3-96ff-4ba1-99e6-8c1ab5f9d7c2/replication/workqueue
2015-05-11 12:02:17,172 [tablet.Tablet] DEBUG: Major compaction plan: [hdfs://cn020:8020/accumulo/tables/h6/t-0000hz8/C0000i04.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzh/C0000i0b.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hze/C0000i05.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzj/C0000i10.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hz3/C0000i0c.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzm/C0000i0x.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzk/C0000i0v.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hza/C0000i0j.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzc/C0000i0f.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzi/C0000i0q.rf] propogate deletes : true
2015-05-11 12:02:17,172 [tablet.Tablet] DEBUG: MajC initiate lock 0.00 secs, wait 0.00 secs
2015-05-11 12:02:17,173 [tablet.Tablet] DEBUG: Starting MajC h6<< (NORMAL) [hdfs://cn020:8020/accumulo/tables/h6/t-0000hz3/C0000i0c.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hze/C0000i05.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzm/C0000i0x.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hza/C0000i0j.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzi/C0000i0q.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzh/C0000i0b.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hz8/C0000i04.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzk/C0000i0v.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzj/C0000i10.rf, hdfs://cn020:8020/accumulo/tables/h6/t-0000hzc/C0000i0f.rf] --> hdfs://cn020:8020/accumulo/tables/h6/default_tablet/C0000i2j.rf_tmp []
2015-05-11 12:02:17,219 [tablet.MinorCompactor] DEBUG: Begin minor compaction hdfs://cn020:8020/accumulo/tables/1/default_tablet/F0000i2i.rf_tmp 1<<
2015-05-11 12:02:17,290 [tabletserver.LargestFirstMemoryManager] DEBUG: BEFORE compactionThreshold = 0.851 maxObserved = 11,134,533
2015-05-11 12:02:17,290 [tabletserver.LargestFirstMemoryManager] DEBUG: AFTER compactionThreshold = 0.851
2015-05-11 12:02:17,337 [tserver.TabletServer] DEBUG: ScanSess tid 172.18.128.20:35527 +rep 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
2015-05-11 12:02:17,833 [tablet.Compactor] DEBUG: Compaction h6<< 54,721 read | 54,721 written | 238,956 entries/sec | 0.229 secs
2015-05-11 12:02:17,841 [tserver.TabletServer] DEBUG: ScanSess tid 172.18.128.20:37393 +r 1 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
2015-05-11 12:02:17,841 [tserver.TabletServer] INFO : Adding 1 logs for extent !0<< as alias 230
2015-05-11 12:02:17,842 [tserver.TabletServer] INFO : Adding 1 logs for extent +r<< as alias 1
2015-05-11 12:02:17,941 [tablet.Compactor] DEBUG: Compaction 1<< 47,890 read | 47,713 written | 164,006 entries/sec | 0.292 secs
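For context, the "because it has 35 write ahead logs" message above is the per-table WAL-count trigger, controlled by the table.compaction.minor.logs.threshold property. As a rough sketch of how one might inspect or raise it while diagnosing this (assuming the default trace table name "trace"; defaults and exact behavior vary by Accumulo version):

# In the Accumulo shell:
config -t trace -f table.compaction.minor.logs.threshold
config -t trace -s table.compaction.minor.logs.threshold=10

Raising the threshold only trades fewer forced minor compactions for longer recovery times, so it wouldn't explain or fix the delayed major compaction by itself.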
After the major compactions completed, things got back to normal. It seems highly unusual for scans to sit there blocked for so long.