  1. HBase
  2. HBASE-16003 Compacting memstore related fixes
  3. HBASE-16162

Compacting Memstore : unnecessary push of active segments to pipeline

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We have a flow like this:

      protected void checkActiveSize() {
          if (shouldFlushInMemory()) {
            // Dispatch the in-memory flush to a separate thread from the pool.
            InMemoryFlushRunnable runnable = new InMemoryFlushRunnable();
            getPool().execute(runnable);
          }
        }
      private boolean shouldFlushInMemory() {
          if(getActive().getSize() > inmemoryFlushSize) {
            // size above flush threshold
            return (allowCompaction.get() && !inMemoryFlushInProgress.get());
          }
          return false;
        }
      
      void flushInMemory() throws IOException {
          // Phase I: Update the pipeline
          getRegionServices().blockUpdates();
          try {
            MutableSegment active = getActive();
            pushActiveToPipeline(active);
          } finally {
            getRegionServices().unblockUpdates();
          }
          // Phase II: Compact the pipeline
          try {
            if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false, true)) {
              // setting the inMemoryFlushInProgress flag again for the case this method is invoked
              // directly (only in tests) in the common path setting from true to true is idempotent
              // Speculative compaction execution, may be interrupted if flush is forced while
              // compaction is in progress
              compactor.startCompaction();
            }
            // ... (the excerpt ends here; the rest of flushInMemory() is not shown)

      So every cell write triggers checkActiveSize(). When we are at the border of the in-memory flush threshold, many threads writing to this memstore can pass the check in checkActiveSize(), because the AtomicBoolean inMemoryFlushInProgress is still false at that point. It is set to true only some time later, once the new thread has started running and has pushed the active segment to the pipeline.
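
      The window described above can be reproduced in a small, self-contained model. Everything in it is hypothetical (the class CheckThenActDemo, the method countDispatched, the threshold value); it is not HBase code. It only imitates a check that reads the flag with get() while the flag is set later by the flush task, and contrasts that with a check that claims the flag with compareAndSet. The second variant is one possible guard, shown for comparison, and not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model, not HBase code: handlers run the size check one after
// another before any queued flush task has had a chance to run.
class CheckThenActDemo {
    static final long FLUSH_THRESHOLD = 100; // assumed threshold, arbitrary units

    // Returns how many flush tasks get queued when `handlers` writers all see
    // an active segment above the threshold before the first task runs.
    static int countDispatched(int handlers, boolean claimFlagAtCheck) {
        AtomicBoolean flushInProgress = new AtomicBoolean(false);
        Queue<Runnable> pool = new ArrayDeque<>(); // stands in for the thread pool
        long activeSize = FLUSH_THRESHOLD + 1;

        for (int i = 0; i < handlers; i++) {
            boolean aboveThreshold = activeSize > FLUSH_THRESHOLD;
            boolean pass = claimFlagAtCheck
                ? aboveThreshold && flushInProgress.compareAndSet(false, true)
                : aboveThreshold && !flushInProgress.get();
            if (pass) {
                // In the get() variant the flag is only set once the task runs,
                // which never happens inside this loop.
                pool.add(() -> flushInProgress.set(true));
            }
        }
        return pool.size();
    }

    public static void main(String[] args) {
        System.out.println("get() check:         " + countDispatched(8, false));
        System.out.println("compareAndSet check: " + countDispatched(8, true));
    }
}
```

      With eight handlers, the get() variant queues eight flush tasks and the compareAndSet variant queues one, because only the first caller can flip the flag from false to true.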
      The in-memory flush code that runs in the new thread (flushInMemory()) has no size check. It just takes the active segment and pushes it to the pipeline. True, no new writes to the memstore are allowed while it does this. But before the first flush thread took the write lock on the region, other handler threads may also have added entries to the thread pool. When the first flush finishes, it releases the lock on the region, and handler threads waiting to write to the memstore may get the lock and add some data. The second in-memory flush thread may then get its turn, acquire the lock, and simply take the current active segment and flush it in memory. This pushes very small segments to the pipeline.
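
      The sequence above can be replayed step by step in a toy model. This is a sketch under stated assumptions, not HBase code: the class SmallSegmentDemo, the method replay, the threshold of 100 and the 5-unit write are all made up for illustration. The optional size re-check inside the flush is one conceivable guard, shown only to make the effect visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not HBase code.
class SmallSegmentDemo {
    static final long FLUSH_THRESHOLD = 100; // assumed threshold, arbitrary units

    // Replays the sequence: two flush tasks were queued, the first one runs,
    // a handler then writes a little data, and finally the second task runs.
    // Returns the sizes of the segments pushed to the modelled pipeline.
    static List<Long> replay(boolean recheckSizeInFlush) {
        List<Long> pipeline = new ArrayList<>();
        long[] active = { FLUSH_THRESHOLD + 1 }; // size of the active segment

        Runnable flush = () -> {
            // Optional guard: skip the push if the active segment is no longer
            // above the threshold.
            if (recheckSizeInFlush && active[0] <= FLUSH_THRESHOLD) {
                return;
            }
            pipeline.add(active[0]); // push the active segment to the pipeline
            active[0] = 0;           // a fresh, empty active segment replaces it
        };

        flush.run();     // first queued flush: pushes the full segment
        active[0] += 5;  // a handler gets the lock and writes a small amount
        flush.run();     // second queued flush: the original flow has no size check here
        return pipeline;
    }

    public static void main(String[] args) {
        System.out.println("no re-check: " + replay(false));
        System.out.println("re-check:    " + replay(true));
    }
}
```

      Without the re-check the pipeline ends up with segments of size 101 and 5; with it, only the 101-unit segment is pushed and the 5 units stay in the active segment.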

      Attachments

        1. HBASE-16162_V2.patch
          17 kB
          Anoop Sam John
        2. HBASE-16162_V3.patch
          3 kB
          Anoop Sam John
        3. HBASE-16162_V4.patch
          6 kB
          Anoop Sam John
        4. HBASE-16162.patch
          2 kB
          Anoop Sam John

            People

              Assignee: Anoop Sam John (anoop.hbase)
              Reporter: Anoop Sam John (anoop.hbase)
              Votes: 0
              Watchers: 5
