- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Performance
- Labels: None
- Hadoop Flags: Reviewed
Flushing the memstore is taking too long. It looks like a new facility in hbase2, the Segment scanner, is doing a lot of extra comparing at flush time.
Below is a patch from Anoop Sam John. I had a similar, more hacky version. Both undo the extra comparing we were seeing in perf tests.
Anastasia Braginsky and Eshcar Hillel. Need your help please.
As I read it, we are trying to flush the memstore snapshot (default, no IMC case). There is only ever going to be one Segment involved (even if IMC is enabled): the snapshot Segment. But getScanners returns a list (of one) of Scanners, and the scan goes via the generic SegmentScanner, which carries a bunch of machinery we don't need when doing a flush, so it seems to do more work than is necessary. It also supports scanning backwards, which is not needed when flushing the memstore.
Do you see a problem with doing a version of Anoop's patch (whether IMC or not)? It makes a big difference in general throughput when the below patch is in place. Thanks.
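To make the cost argument above concrete, here is a rough Java sketch. It is a toy model, not HBase code: strings stand in for cells, and `FlushScanSketch`, `drainWithIterator`, and `drainWithSeeks` are made-up names. It counts comparator calls when a sorted snapshot is drained by plain forward iteration versus by re-seeking for the next element each time. The re-seek loop is only an assumed stand-in for "a scanner that compares on every step"; it is not a claim about what SegmentScanner does internally.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: strings stand in for cells; none of these names are HBase APIs.
class FlushScanSketch {
  // Counts every comparator call so the two drain styles can be compared.
  static final AtomicLong COMPARES = new AtomicLong();
  static final Comparator<String> COUNTING = (a, b) -> {
    COMPARES.incrementAndGet();
    return a.compareTo(b);
  };

  // Builds a sorted "snapshot" of n fake row keys.
  static ConcurrentSkipListSet<String> snapshot(int n) {
    ConcurrentSkipListSet<String> set = new ConcurrentSkipListSet<>(COUNTING);
    for (int i = 0; i < n; i++) {
      set.add(String.format("row-%06d", i));
    }
    return set;
  }

  // Forward-only drain: relies on the order the set already maintains.
  static long drainWithIterator(ConcurrentSkipListSet<String> set) {
    COMPARES.set(0); // ignore comparisons made while building the set
    long seen = 0;
    for (String ignored : set) {
      seen++;
    }
    return seen;
  }

  // Re-seek drain: looks up the successor of the current element on every step,
  // so each step pays for a search through the skip list.
  static long drainWithSeeks(ConcurrentSkipListSet<String> set) {
    COMPARES.set(0);
    long seen = 0;
    String cur = set.isEmpty() ? null : set.first();
    while (cur != null) {
      seen++;
      cur = set.higher(cur);
    }
    return seen;
  }

  public static void main(String[] args) {
    ConcurrentSkipListSet<String> set = snapshot(100_000);
    long cells = drainWithIterator(set);
    System.out.println("iterator: cells=" + cells + " compares=" + COMPARES.get());
    cells = drainWithSeeks(set);
    System.out.println("re-seek:  cells=" + cells + " compares=" + COMPARES.get());
  }
}
```

Both loops visit the same elements in the same order; only the number of comparator calls differs, which is the kind of overhead a flush-only scanner can skip.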
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
index cbd60e5da3..c3dd972254 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreSnapshot.java
@@ -40,7 +40,8 @@ public class MemStoreSnapshot implements Closeable {
     this.cellsCount = snapshot.getCellsCount();
     this.memStoreSize = snapshot.getMemStoreSize();
     this.timeRangeTracker = snapshot.getTimeRangeTracker();
-    this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    //this.scanners = snapshot.getScanners(Long.MAX_VALUE, Long.MAX_VALUE);
+    this.scanners = snapshot.getScannersForSnapshot();
     this.tagsPresent = snapshot.isTagsPresent();
   }
 
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
index 70074bf3b4..279c4e50c8 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
@@ -33,6 +33,7 @@ import org.apache.hadoop.hbase.KeyValueUtil;
 import org.apache.hadoop.hbase.io.TimeRange;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ClassSize;
+import org.apache.hadoop.hbase.util.CollectionBackedScanner;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -130,6 +131,10 @@ public abstract class Segment {
     return Collections.singletonList(new SegmentScanner(this, readPoint, order));
   }
 
+  public List<KeyValueScanner> getScannersForSnapshot() {
+    return Collections.singletonList(new CollectionBackedScanner(this.cellSet.get(), comparator));
+  }
+
   /**
    * @return whether the segment has any cells
    */
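For readers unfamiliar with the idea behind the patch, the following is a minimal sketch of a forward-only scanner over an already sorted collection. `ForwardOnlyScanner` is a hypothetical class written for illustration; it is not HBase's `CollectionBackedScanner` and implements none of the `KeyValueScanner` interface beyond a `peek()`/`next()` shape, with strings standing in for cells.

```java
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in, not HBase's CollectionBackedScanner: it only mirrors the
// peek()/next() shape of a forward scan over a collection that is already sorted.
class ForwardOnlyScanner<T> {
  private final Iterator<T> iter;
  private T current;

  ForwardOnlyScanner(SortedSet<T> sorted) {
    this.iter = sorted.iterator();
    this.current = iter.hasNext() ? iter.next() : null;
  }

  // Returns the next element without consuming it, or null when exhausted.
  T peek() {
    return current;
  }

  // Returns the current element and advances; returns null when exhausted.
  T next() {
    T result = current;
    current = iter.hasNext() ? iter.next() : null;
    return result;
  }

  public static void main(String[] args) {
    SortedSet<String> cells = new TreeSet<>();
    cells.add("row-b");
    cells.add("row-a");
    cells.add("row-c");
    ForwardOnlyScanner<String> scanner = new ForwardOnlyScanner<>(cells);
    StringBuilder flushed = new StringBuilder();
    for (String cell = scanner.next(); cell != null; cell = scanner.next()) {
      flushed.append(cell).append(' ');
    }
    System.out.println(flushed.toString().trim()); // prints "row-a row-b row-c"
  }
}
```

The sketch has no backward scan, no read-point filtering, and no per-step comparison, which matches the description's point that a flush only needs to walk one snapshot from start to end.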
- is related to: HBASE-20483 [PERFORMANCE] Flushing is 2x slower in hbase2. (Resolved)
- links to