[HBASE-16931] Setting cell's seqId to zero in compaction flow might cause RS down. - ASF JIRA

Lijin Bin added a comment - 24/Oct/16 04:16

2016-10-21 20:43:41,348 ERROR [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] ipc.RpcServer: Thread[regionserver/11.251.219.114:16020-shortCompactions-1477046565275,5,main] throw uncaught exception
java.lang.AssertionError: Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0 followed by a smaller key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 in cf a
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:794)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:287)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1238)
        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1862)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:519)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
2016-10-21 20:43:41,348 INFO  [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] ipc.RpcServer: Trigger server abort since we cannot serve requests anymore
2016-10-21 20:43:41,348 FATAL [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] regionserver.HRegionServer: ABORTING region server 11.251.219.114,16020,1477046487349: Thread[regionserver/11.251.219.114:16020-shortCompactions-1477046565275,5,main] throw uncaught exception
java.lang.AssertionError: Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0 followed by a smaller key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 in cf a
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:794)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:287)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1238)
        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1862)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:519)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Lijin Bin added a comment - 24/Oct/16 04:16 2016-10-21 20:43:41,348 ERROR [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] ipc.RpcServer: Thread [regionserver/11.251.219.114:16020-shortCompactions-1477046565275,5,main] throw uncaught exception java.lang.AssertionError: Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0 followed by a smaller key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 in cf a at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:794) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:287) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1238) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1862) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:519) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang. Thread .run( Thread .java:744) 2016-10-21 20:43:41,348 INFO [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] ipc.RpcServer: Trigger server abort since we cannot serve requests anymore 2016-10-21 20:43:41,348 FATAL [regionserver/11.251.219.114:16020-shortCompactions-1477046565275] regionserver.HRegionServer: ABORTING region server 11.251.219.114,16020,1477046487349: Thread [regionserver/11.251.219.114:16020-shortCompactions-1477046565275,5,main] throw uncaught exception java.lang.AssertionError: Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0 followed by a smaller key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 in cf a at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:794) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:287) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1238) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1862) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:519) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang. Thread .run( Thread .java:744)

Lijin Bin added a comment - 24/Oct/16 04:18

export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC"
enable assertions on production.

Lijin Bin added a comment - 24/Oct/16 04:18 export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC" enable assertions on production.

Lijin Bin added a comment - 24/Oct/16 04:24

When i check the hfile's data there is “K: 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 V: 39.0” in a hfile, and “K: 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=12076597 V:“ in another hfile, so actually there is no cell "Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0" in hfile.

Lijin Bin added a comment - 24/Oct/16 04:24 When i check the hfile's data there is “K: 067:532053748281/a:priceTrend/1476280811000/Put/vlen=4/seqid=11993975 V: 39.0” in a hfile, and “K: 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=12076597 V:“ in another hfile, so actually there is no cell "Key 067:532053748281/a:priceTrend/1476280811000/Put/vlen=0/seqid=0" in hfile.

Lijin Bin added a comment - 24/Oct/16 04:25

The table VERSIONS => '65535'

Lijin Bin added a comment - 24/Oct/16 04:25 The table VERSIONS => '65535'

Lijin Bin added a comment - 24/Oct/16 04:27

And this compaction exception cause 100+ regionservers down in one of our production cluster.

Lijin Bin added a comment - 24/Oct/16 04:27 And this compaction exception cause 100+ regionservers down in one of our production cluster.

Anoop Sam John added a comment - 24/Oct/16 05:57

Oh.. nice debugging.. It would have been so hard for u debug this.
On that patch, so what u r doing now is change the seqId of the cell to zero and pass that time writer and out of the for loop, change back the seqId of the last cell to the original value. This also might fail in one case. You can see within the loop, there is a shipped() call now. We have use shipped to do in btw cleanup in readers. But the writers also implemented as ShipperListener and this will do a clone of the last cells it refer. This is needed because the when the blocks for compaction comes from a L2 shared memory cache, the shipped call might return back the block to BC and so that memory can get released. If the lastCells refer to this block area, our cell data area can get corrupted. So what we do is before return back of the blocks we will do a clone of the cells. (Note that only the needed part we will clone. FYI I can forsee a bug here also. We dont clone the seqId and all cloned cells will give seqId of 0!.. That is another issue to solve).. So said that when u come out of for loop, it might so happen that the prevCells that other part of code refer to is not the same as object within the cells list.
cc ram_krish

Anoop Sam John added a comment - 24/Oct/16 05:57 Oh.. nice debugging.. It would have been so hard for u debug this. On that patch, so what u r doing now is change the seqId of the cell to zero and pass that time writer and out of the for loop, change back the seqId of the last cell to the original value. This also might fail in one case. You can see within the loop, there is a shipped() call now. We have use shipped to do in btw cleanup in readers. But the writers also implemented as ShipperListener and this will do a clone of the last cells it refer. This is needed because the when the blocks for compaction comes from a L2 shared memory cache, the shipped call might return back the block to BC and so that memory can get released. If the lastCells refer to this block area, our cell data area can get corrupted. So what we do is before return back of the blocks we will do a clone of the cells. (Note that only the needed part we will clone. FYI I can forsee a bug here also. We dont clone the seqId and all cloned cells will give seqId of 0!.. That is another issue to solve).. So said that when u come out of for loop, it might so happen that the prevCells that other part of code refer to is not the same as object within the cells list. cc ram_krish

Anoop Sam John added a comment - 24/Oct/16 05:58

So the way to solve this is that we should be able to pass some info the writer on write call to put this cell's seqId as zero. And no need to call setSeqId on individual cells at all. wdyt?

Anoop Sam John added a comment - 24/Oct/16 05:58 So the way to solve this is that we should be able to pass some info the writer on write call to put this cell's seqId as zero. And no need to call setSeqId on individual cells at all. wdyt?

Anoop Sam John added a comment - 24/Oct/16 05:59

Is this only a 2.0 issue?

Anoop Sam John added a comment - 24/Oct/16 05:59 Is this only a 2.0 issue?

Anoop Sam John added a comment - 24/Oct/16 06:02

I just changed the subject of the jira and upped the priority to critical as this will be a very bad bug when occurs. Chance of this to occur may be rare. Still !!!

Anoop Sam John added a comment - 24/Oct/16 06:02 I just changed the subject of the jira and upped the priority to critical as this will be a very bad bug when occurs. Chance of this to occur may be rare. Still !!!

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:12

Checking this patch.
Fix looks ok but one question -since we call shipped() after a batch of cells are written - to avoid OOME because during compaction we hold all the blocks till the compaction is completed. So to avoid that we call shipped(). But because we do shipped() there is a chance that the blocks are cleared and in write flow we hold on to 'lastCell' etc so those could get corrupted when the block got released.
So we added beforeShipped() called. Now even before this bug was there in read path even in write path we will end up in the same problem right.
The lastCell in write path just before the cleanSeqId started happening will have a seqId but now the next Cell will become 0. So it is going to be problem in Writer#checkKey() method I believe.
One more question - after append() immediately cant we again set back the lastSeqId?

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:12 Checking this patch. Fix looks ok but one question -since we call shipped() after a batch of cells are written - to avoid OOME because during compaction we hold all the blocks till the compaction is completed. So to avoid that we call shipped(). But because we do shipped() there is a chance that the blocks are cleared and in write flow we hold on to 'lastCell' etc so those could get corrupted when the block got released. So we added beforeShipped() called. Now even before this bug was there in read path even in write path we will end up in the same problem right. The lastCell in write path just before the cleanSeqId started happening will have a seqId but now the next Cell will become 0. So it is going to be problem in Writer#checkKey() method I believe. One more question - after append() immediately cant we again set back the lastSeqId?

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:13

Its very difficult to debug this issue.

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:13 Its very difficult to debug this issue.

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:15 - edited

Ok. So just saw that in Writer

    if (lastCell != null) {
      int keyComp = comparator.compareKeyIgnoresMvcc(lastCell, cell);

      if (keyComp > 0) {
        throw new IOException("Added a key not lexically larger than"
            + " previous. Current cell = " + cell + ", lastCell = " + lastCell);
      } else if (keyComp == 0) {
        isDuplicateKey = true;
      }
    }

So we ignore the mvcc. So this issue will not happen in writer.

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:15 - edited Ok. So just saw that in Writer if (lastCell != null ) { int keyComp = comparator.compareKeyIgnoresMvcc(lastCell, cell); if (keyComp > 0) { throw new IOException( "Added a key not lexically larger than" + " previous. Current cell = " + cell + ", lastCell = " + lastCell); } else if (keyComp == 0) { isDuplicateKey = true ; } } So we ignore the mvcc. So this issue will not happen in writer.

Lijin Bin added a comment - 24/Oct/16 06:20

And if we set back sequence id before ((ShipperListener) writer).beforeShipped(), do this can fix this bug?

Lijin Bin added a comment - 24/Oct/16 06:20 And if we set back sequence id before ((ShipperListener) writer).beforeShipped(), do this can fix this bug?

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:23

I think it will. Instead of doing every time in the loop.

ramkrishna.s.vasudevan added a comment - 24/Oct/16 06:23 I think it will. Instead of doing every time in the loop.

Anoop Sam John added a comment - 24/Oct/16 06:24

U mean call that before calling beforeShipped also.. Ya seems that will work out. Am fine with that.. Pls add fat comments there what we are doing. And also in other places. This is very bad bug and this is the way we solve it.. ELse later people will wonder what non sense we are doing

Anoop Sam John added a comment - 24/Oct/16 06:24 U mean call that before calling beforeShipped also .. Ya seems that will work out. Am fine with that.. Pls add fat comments there what we are doing. And also in other places. This is very bad bug and this is the way we solve it.. ELse later people will wonder what non sense we are doing

Lijin Bin added a comment - 24/Oct/16 06:49

branch-1 also have this problem.

Lijin Bin added a comment - 24/Oct/16 06:49 branch-1 also have this problem.

Lijin Bin added a comment - 24/Oct/16 06:52

Yes, it is hard to debug...

Lijin Bin added a comment - 24/Oct/16 06:52 Yes, it is hard to debug...

Lijin Bin added a comment - 24/Oct/16 06:55 - edited

Yes，i miss to check the writer, thanks very much for the confirm.

Lijin Bin added a comment - 24/Oct/16 06:55 - edited Yes，i miss to check the writer, thanks very much for the confirm.

Lijin Bin added a comment - 24/Oct/16 07:02

Thanks very much, i need to fix it quickly for our production cluster.

Lijin Bin added a comment - 24/Oct/16 07:02 Thanks very much, i need to fix it quickly for our production cluster.

Anoop Sam John added a comment - 24/Oct/16 07:06

Sure.. Pls post a new patch for affected version. Once QA is ok, we can commit.

Anoop Sam John added a comment - 24/Oct/16 07:06 Sure.. Pls post a new patch for affected version. Once QA is ok, we can commit.

Anoop Sam John added a comment - 24/Oct/16 07:36

There is a small issue no? Within if we set lastcleancell and its seqId.. reset back based on lastcleanCell != null.. This might cause set the seqId on a wrong cell on which the set to 0 was not happened. We need to have an else and make lastCleanCell =null there no? We need a better name for this variable.

Anoop Sam John added a comment - 24/Oct/16 07:36 There is a small issue no? Within if we set lastcleancell and its seqId.. reset back based on lastcleanCell != null.. This might cause set the seqId on a wrong cell on which the set to 0 was not happened. We need to have an else and make lastCleanCell =null there no? We need a better name for this variable.

Lijin Bin added a comment - 24/Oct/16 08:08

What name do you think is better for this variable?

Lijin Bin added a comment - 24/Oct/16 08:08 What name do you think is better for this variable?

Yu Li added a comment - 24/Oct/16 08:09

IMO changing the seqid of a cell which is referred to is dangerous, clone the cell for writing out might be safer although more garbage will be generated.

And please move the test case into TestCompaction like I mentioned in our local review fella aoxiang

Yu Li added a comment - 24/Oct/16 08:09 IMO changing the seqid of a cell which is referred to is dangerous, clone the cell for writing out might be safer although more garbage will be generated. And please move the test case into TestCompaction like I mentioned in our local review fella aoxiang

Lijin Bin added a comment - 24/Oct/16 08:14

I do not think clone is a good idea.

Lijin Bin added a comment - 24/Oct/16 08:14 I do not think clone is a good idea.

Anoop Sam John added a comment - 24/Oct/16 08:20

I agree that avoid set and reset cell seqId is the best way. And clone is some thing not acceptable IMO. We should be able to tell write whether it need to use the seqId of the cell or zero. But that will be bigger change. So am fine with current way. Ya I was abt to ask. For one test do we need to spin a cluster. If possible avoid. But I like that u did a UT for this so that in future we wont have regression here. Seem V3 also dont have the change to reset the local variable to null.

Anoop Sam John added a comment - 24/Oct/16 08:20 I agree that avoid set and reset cell seqId is the best way. And clone is some thing not acceptable IMO. We should be able to tell write whether it need to use the seqId of the cell or zero. But that will be bigger change. So am fine with current way. Ya I was abt to ask. For one test do we need to spin a cluster. If possible avoid. But I like that u did a UT for this so that in future we wont have regression here. Seem V3 also dont have the change to reset the local variable to null.

Lijin Bin added a comment - 24/Oct/16 08:29

upload ~~HBASE-16931~~_master_v4.patch this patch reset the local variable to null.

Lijin Bin added a comment - 24/Oct/16 08:29 upload HBASE-16931 _master_v4.patch this patch reset the local variable to null.

Yu Li added a comment - 24/Oct/16 08:45

I agree that avoid set and reset cell seqId is the best way. And clone is some thing not acceptable IMO. We should be able to tell write whether it need to use the seqId of the cell or zero. But that will be bigger change. So am fine with current way.

Ok, we'll also use current version as a hot-fix online, so I've no much rights to argue here I guess (Smile).

Ya I was abt to ask. For one test do we need to spin a cluster. If possible avoid. But I like that u did a UT for this so that in future we wont have regression here.

Ya adding a UT case to cover a new problem is required on our side (Smile). And I still insist to move the case into TestCompaction guys.

Yu Li added a comment - 24/Oct/16 08:45 I agree that avoid set and reset cell seqId is the best way. And clone is some thing not acceptable IMO. We should be able to tell write whether it need to use the seqId of the cell or zero. But that will be bigger change. So am fine with current way. Ok, we'll also use current version as a hot-fix online, so I've no much rights to argue here I guess (Smile). Ya I was abt to ask. For one test do we need to spin a cluster. If possible avoid. But I like that u did a UT for this so that in future we wont have regression here. Ya adding a UT case to cover a new problem is required on our side (Smile). And I still insist to move the case into TestCompaction guys.

Ted Yu added a comment - 24/Oct/16 09:05

Patch v5 looks good to me.

Ted Yu added a comment - 24/Oct/16 09:05 Patch v5 looks good to me.

Yu Li added a comment - 24/Oct/16 09:27

+1 on patch v5, good job aoxiang

Yu Li added a comment - 24/Oct/16 09:27 +1 on patch v5, good job aoxiang

Anoop Sam John added a comment - 24/Oct/16 09:32

+1.

Anoop Sam John added a comment - 24/Oct/16 09:32 +1.

Hadoop QA added a comment - 24/Oct/16 09:44

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 17s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
+1	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
+1	mvninstall	3m 16s	master passed
+1	compile	0m 39s	master passed
+1	checkstyle	0m 50s	master passed
+1	mvneclipse	0m 15s	master passed
-1	findbugs	1m 52s	hbase-server in master has 1 extant Findbugs warnings.
+1	javadoc	0m 31s	master passed
+1	mvninstall	0m 49s	the patch passed
+1	compile	0m 37s	the patch passed
+1	javac	0m 37s	the patch passed
+1	checkstyle	0m 47s	the patch passed
+1	mvneclipse	0m 15s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	hadoopcheck	29m 17s	Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1.
+1	hbaseprotoc	0m 12s	the patch passed
+1	findbugs	1m 48s	the patch passed
+1	javadoc	0m 26s	the patch passed
-1	unit	83m 52s	hbase-server in the patch failed.
+1	asflicense	0m 15s	The patch does not generate ASF License warnings.
		126m 16s

Reason	Tests
Timed out junit tests	org.apache.hadoop.hbase.client.TestFromClientSide
	org.apache.hadoop.hbase.client.TestReplicaWithCluster
	org.apache.hadoop.hbase.client.TestHCM
	org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient

Subsystem	Report/Notes
Docker	Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12834889/HBASE-16931_master_v3.patch
JIRA Issue	~~HBASE-16931~~
Optional Tests	asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
uname	Linux 5e03579f4917 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
git revision	master / 187ff19
Default Java	1.8.0_101
findbugs	v3.0.0
findbugs	https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
unit	https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/patch-unit-hbase-server.txt
unit test logs	https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/patch-unit-hbase-server.txt
Test Results	https://builds.apache.org/job/PreCommit-HBASE-Build/4155/testReport/
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/PreCommit-HBASE-Build/4155/console
Powered by	Apache Yetus 0.3.0 http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 24/Oct/16 09:44 -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 3m 16s master passed +1 compile 0m 39s master passed +1 checkstyle 0m 50s master passed +1 mvneclipse 0m 15s master passed -1 findbugs 1m 52s hbase-server in master has 1 extant Findbugs warnings. +1 javadoc 0m 31s master passed +1 mvninstall 0m 49s the patch passed +1 compile 0m 37s the patch passed +1 javac 0m 37s the patch passed +1 checkstyle 0m 47s the patch passed +1 mvneclipse 0m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 hadoopcheck 29m 17s Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. +1 hbaseprotoc 0m 12s the patch passed +1 findbugs 1m 48s the patch passed +1 javadoc 0m 26s the patch passed -1 unit 83m 52s hbase-server in the patch failed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 126m 16s Reason Tests Timed out junit tests org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.client.TestReplicaWithCluster org.apache.hadoop.hbase.client.TestHCM org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient Subsystem Report/Notes Docker Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834889/HBASE-16931_master_v3.patch JIRA Issue HBASE-16931 Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile uname Linux 5e03579f4917 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh git revision master / 187ff19 Default Java 1.8.0_101 findbugs v3.0.0 findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html unit https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/patch-unit-hbase-server.txt unit test logs https://builds.apache.org/job/PreCommit-HBASE-Build/4155/artifact/patchprocess/patch-unit-hbase-server.txt Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/4155/testReport/ modules C: hbase-server U: hbase-server Console output https://builds.apache.org/job/PreCommit-HBASE-Build/4155/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.

Yu Li added a comment - 24/Oct/16 10:06

The failed cases are known as always timeout and irrelative to the patch here. Committing the patch with 3 binding +1s, thanks all for review anoop.hbase, ram_krish, tedyu!

Yu Li added a comment - 24/Oct/16 10:06 The failed cases are known as always timeout and irrelative to the patch here. Committing the patch with 3 binding +1s, thanks all for review anoop.hbase , ram_krish , tedyu !

Yu Li added a comment - 24/Oct/16 10:14

Oops, at a second glance, the previous HadoopQA report was about patch v3, will wait for the QA report for v5 and branch-1 before committing the patch.

Yu Li added a comment - 24/Oct/16 10:14 Oops, at a second glance, the previous HadoopQA report was about patch v3, will wait for the QA report for v5 and branch-1 before committing the patch.

Yu Li added a comment - 24/Oct/16 10:18

aoxiang Please update the branch-1 patch to merge UT case into TestCompaction, thanks.

Yu Li added a comment - 24/Oct/16 10:18 aoxiang Please update the branch-1 patch to merge UT case into TestCompaction , thanks.

Hadoop QA added a comment - 24/Oct/16 10:47

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 13s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
+1	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
+1	mvninstall	3m 21s	master passed
+1	compile	0m 36s	master passed
+1	checkstyle	0m 46s	master passed
+1	mvneclipse	0m 14s	master passed
-1	findbugs	1m 44s	hbase-server in master has 1 extant Findbugs warnings.
+1	javadoc	0m 26s	master passed
+1	mvninstall	0m 46s	the patch passed
+1	compile	0m 36s	the patch passed
+1	javac	0m 36s	the patch passed
+1	checkstyle	0m 45s	the patch passed
+1	mvneclipse	0m 14s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	hadoopcheck	27m 25s	Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1.
+1	hbaseprotoc	0m 12s	the patch passed
+1	findbugs	1m 49s	the patch passed
+1	javadoc	0m 25s	the patch passed
-1	unit	79m 19s	hbase-server in the patch failed.
+1	asflicense	0m 15s	The patch does not generate ASF License warnings.
		119m 26s

Reason	Tests
Timed out junit tests	org.apache.hadoop.hbase.master.procedure.TestTableDescriptorModificationFromClient
	org.apache.hadoop.hbase.master.procedure.TestDeleteColumnFamilyProcedure
	org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure

Subsystem	Report/Notes
Docker	Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12834902/HBASE-16931_master_v5.patch
JIRA Issue	~~HBASE-16931~~
Optional Tests	asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
uname	Linux 06f275318878 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
git revision	master / 97cb1d7
Default Java	1.8.0_101
findbugs	v3.0.0
findbugs	https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
unit	https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/patch-unit-hbase-server.txt
unit test logs	https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/patch-unit-hbase-server.txt
Test Results	https://builds.apache.org/job/PreCommit-HBASE-Build/4158/testReport/
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/PreCommit-HBASE-Build/4158/console
Powered by	Apache Yetus 0.3.0 http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 24/Oct/16 10:47 -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 3m 21s master passed +1 compile 0m 36s master passed +1 checkstyle 0m 46s master passed +1 mvneclipse 0m 14s master passed -1 findbugs 1m 44s hbase-server in master has 1 extant Findbugs warnings. +1 javadoc 0m 26s master passed +1 mvninstall 0m 46s the patch passed +1 compile 0m 36s the patch passed +1 javac 0m 36s the patch passed +1 checkstyle 0m 45s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 hadoopcheck 27m 25s Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. +1 hbaseprotoc 0m 12s the patch passed +1 findbugs 1m 49s the patch passed +1 javadoc 0m 25s the patch passed -1 unit 79m 19s hbase-server in the patch failed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 119m 26s Reason Tests Timed out junit tests org.apache.hadoop.hbase.master.procedure.TestTableDescriptorModificationFromClient org.apache.hadoop.hbase.master.procedure.TestDeleteColumnFamilyProcedure org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure Subsystem Report/Notes Docker Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834902/HBASE-16931_master_v5.patch JIRA Issue HBASE-16931 Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile uname Linux 06f275318878 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh git revision master / 97cb1d7 Default Java 1.8.0_101 findbugs v3.0.0 findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html unit https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/patch-unit-hbase-server.txt unit test logs https://builds.apache.org/job/PreCommit-HBASE-Build/4158/artifact/patchprocess/patch-unit-hbase-server.txt Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/4158/testReport/ modules C: hbase-server U: hbase-server Console output https://builds.apache.org/job/PreCommit-HBASE-Build/4158/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.

Hadoop QA added a comment - 24/Oct/16 10:52

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 17s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
+1	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
+1	mvninstall	3m 34s	master passed
+1	compile	0m 49s	master passed
+1	checkstyle	0m 54s	master passed
+1	mvneclipse	0m 17s	master passed
-1	findbugs	2m 16s	hbase-server in master has 1 extant Findbugs warnings.
+1	javadoc	0m 36s	master passed
+1	mvninstall	0m 55s	the patch passed
+1	compile	0m 45s	the patch passed
+1	javac	0m 45s	the patch passed
+1	checkstyle	0m 50s	the patch passed
+1	mvneclipse	0m 16s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	hadoopcheck	33m 45s	Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1.
+1	hbaseprotoc	0m 14s	the patch passed
+1	findbugs	2m 11s	the patch passed
+1	javadoc	0m 31s	the patch passed
-1	unit	84m 37s	hbase-server in the patch failed.
+1	asflicense	0m 15s	The patch does not generate ASF License warnings.
		133m 25s

Reason	Tests
Timed out junit tests	org.apache.hadoop.hbase.client.TestFromClientSide
	org.apache.hadoop.hbase.client.TestReplicaWithCluster
	org.apache.hadoop.hbase.client.TestHCM
	org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient
	org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence

Subsystem	Report/Notes
Docker	Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12834900/HBASE-16931_master_v4.patch
JIRA Issue	~~HBASE-16931~~
Optional Tests	asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
uname	Linux f7fc783d6af3 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
git revision	master / 187ff19
Default Java	1.8.0_101
findbugs	v3.0.0
findbugs	https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
unit	https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/patch-unit-hbase-server.txt
unit test logs	https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/patch-unit-hbase-server.txt
Test Results	https://builds.apache.org/job/PreCommit-HBASE-Build/4157/testReport/
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/PreCommit-HBASE-Build/4157/console
Powered by	Apache Yetus 0.3.0 http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 24/Oct/16 10:52 -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 3m 34s master passed +1 compile 0m 49s master passed +1 checkstyle 0m 54s master passed +1 mvneclipse 0m 17s master passed -1 findbugs 2m 16s hbase-server in master has 1 extant Findbugs warnings. +1 javadoc 0m 36s master passed +1 mvninstall 0m 55s the patch passed +1 compile 0m 45s the patch passed +1 javac 0m 45s the patch passed +1 checkstyle 0m 50s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 hadoopcheck 33m 45s Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. +1 hbaseprotoc 0m 14s the patch passed +1 findbugs 2m 11s the patch passed +1 javadoc 0m 31s the patch passed -1 unit 84m 37s hbase-server in the patch failed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 133m 25s Reason Tests Timed out junit tests org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.client.TestReplicaWithCluster org.apache.hadoop.hbase.client.TestHCM org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence Subsystem Report/Notes Docker Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834900/HBASE-16931_master_v4.patch JIRA Issue HBASE-16931 Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile uname Linux f7fc783d6af3 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh git revision master / 187ff19 Default Java 1.8.0_101 findbugs v3.0.0 findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html unit https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/patch-unit-hbase-server.txt unit test logs https://builds.apache.org/job/PreCommit-HBASE-Build/4157/artifact/patchprocess/patch-unit-hbase-server.txt Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/4157/testReport/ modules C: hbase-server U: hbase-server Console output https://builds.apache.org/job/PreCommit-HBASE-Build/4157/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.

Lijin Bin added a comment - 24/Oct/16 12:59

done

Lijin Bin added a comment - 24/Oct/16 12:59 done

Yu Li added a comment - 24/Oct/16 14:12

Ok, let's wait for the HadoopQA report.

Yu Li added a comment - 24/Oct/16 14:12 Ok, let's wait for the HadoopQA report.

Yu Li added a comment - 24/Oct/16 14:18

Checking the UT result and all 3 cases failed because of OOME and irrelative to patch here.

Running org.apache.hadoop.hbase.master.procedure.TestTableDescriptorModificationFromClient
Running org.apache.hadoop.hbase.master.procedure.TestDeleteColumnFamilyProcedure
Exception in thread "Thread-2421" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-2415"
Running org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure
Exception in thread "Thread-2415" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3332)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
	at java.lang.StringBuffer.append(StringBuffer.java:369)
	at java.io.BufferedReader.readLine(BufferedReader.java:358)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at org.apache.maven.surefire.shade.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:66)
Exception in thread "Thread-2419" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2423" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2425" java.lang.OutOfMemoryError: Java heap space

And confirmed all could pass locally.

Yu Li added a comment - 24/Oct/16 14:18 Checking the UT result and all 3 cases failed because of OOME and irrelative to patch here. Running org.apache.hadoop.hbase.master.procedure.TestTableDescriptorModificationFromClient Running org.apache.hadoop.hbase.master.procedure.TestDeleteColumnFamilyProcedure Exception in thread "Thread-2421" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-2415" Running org.apache.hadoop.hbase.master.procedure.TestAddColumnFamilyProcedure Exception in thread "Thread-2415" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569) at java.lang.StringBuffer.append(StringBuffer.java:369) at java.io.BufferedReader.readLine(BufferedReader.java:358) at java.io.BufferedReader.readLine(BufferedReader.java:389) at org.apache.maven.surefire.shade.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:66) Exception in thread "Thread-2419" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-2423" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-2425" java.lang.OutOfMemoryError: Java heap space And confirmed all could pass locally.

Hadoop QA added a comment - 24/Oct/16 14:55

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 18s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
+1	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
+1	mvninstall	2m 3s	branch-1 passed
+1	compile	0m 32s	branch-1 passed with JDK v1.8.0_101
+1	compile	0m 35s	branch-1 passed with JDK v1.7.0_80
+1	checkstyle	0m 55s	branch-1 passed
+1	mvneclipse	0m 17s	branch-1 passed
-1	findbugs	1m 55s	hbase-server in branch-1 has 2 extant Findbugs warnings.
+1	javadoc	0m 25s	branch-1 passed with JDK v1.8.0_101
+1	javadoc	0m 33s	branch-1 passed with JDK v1.7.0_80
+1	mvninstall	0m 47s	the patch passed
+1	compile	0m 31s	the patch passed with JDK v1.8.0_101
+1	javac	0m 31s	the patch passed
+1	compile	0m 34s	the patch passed with JDK v1.7.0_80
+1	javac	0m 34s	the patch passed
+1	checkstyle	0m 56s	the patch passed
+1	mvneclipse	0m 16s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	hadoopcheck	16m 51s	The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1.
+1	hbaseprotoc	0m 16s	the patch passed
+1	findbugs	2m 8s	the patch passed
+1	javadoc	0m 23s	the patch passed with JDK v1.8.0_101
+1	javadoc	0m 33s	the patch passed with JDK v1.7.0_80
-1	unit	84m 29s	hbase-server in the patch failed.
+1	asflicense	0m 17s	The patch does not generate ASF License warnings.
		116m 4s

Reason	Tests
Failed junit tests	hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
Timed out junit tests	org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster
	org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
	org.apache.hadoop.hbase.tool.TestCanaryTool

Subsystem	Report/Notes
Docker	Client=1.11.2 Server=1.11.2 Image:yetus/hbase:b2c5d84
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12834926/HBASE-16931.branch-1.v2.patch
JIRA Issue	~~HBASE-16931~~
Optional Tests	asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
uname	Linux 59828eb69aaf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/patchprocess/precommit/personality/hbase.sh
git revision	branch-1 / d76cc4c
Default Java	1.7.0_80
Multi-JDK versions	/usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-oracle:1.7.0_80
findbugs	v3.0.0
findbugs	https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
unit	https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/patch-unit-hbase-server.txt
unit test logs	https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/patch-unit-hbase-server.txt
Test Results	https://builds.apache.org/job/PreCommit-HBASE-Build/4161/testReport/
modules	C: hbase-server U: hbase-server
Console output	https://builds.apache.org/job/PreCommit-HBASE-Build/4161/console
Powered by	Apache Yetus 0.3.0 http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 24/Oct/16 14:55 -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 2m 3s branch-1 passed +1 compile 0m 32s branch-1 passed with JDK v1.8.0_101 +1 compile 0m 35s branch-1 passed with JDK v1.7.0_80 +1 checkstyle 0m 55s branch-1 passed +1 mvneclipse 0m 17s branch-1 passed -1 findbugs 1m 55s hbase-server in branch-1 has 2 extant Findbugs warnings. +1 javadoc 0m 25s branch-1 passed with JDK v1.8.0_101 +1 javadoc 0m 33s branch-1 passed with JDK v1.7.0_80 +1 mvninstall 0m 47s the patch passed +1 compile 0m 31s the patch passed with JDK v1.8.0_101 +1 javac 0m 31s the patch passed +1 compile 0m 34s the patch passed with JDK v1.7.0_80 +1 javac 0m 34s the patch passed +1 checkstyle 0m 56s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 hadoopcheck 16m 51s The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. +1 hbaseprotoc 0m 16s the patch passed +1 findbugs 2m 8s the patch passed +1 javadoc 0m 23s the patch passed with JDK v1.8.0_101 +1 javadoc 0m 33s the patch passed with JDK v1.7.0_80 -1 unit 84m 29s hbase-server in the patch failed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 116m 4s Reason Tests Failed junit tests hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat Timed out junit tests org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd org.apache.hadoop.hbase.tool.TestCanaryTool Subsystem Report/Notes Docker Client=1.11.2 Server=1.11.2 Image:yetus/hbase:b2c5d84 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834926/HBASE-16931.branch-1.v2.patch JIRA Issue HBASE-16931 Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile uname Linux 59828eb69aaf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/patchprocess/precommit/personality/hbase.sh git revision branch-1 / d76cc4c Default Java 1.7.0_80 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-oracle:1.7.0_80 findbugs v3.0.0 findbugs https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html unit https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/patch-unit-hbase-server.txt unit test logs https://builds.apache.org/job/PreCommit-HBASE-Build/4161/artifact/patchprocess/patch-unit-hbase-server.txt Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/4161/testReport/ modules C: hbase-server U: hbase-server Console output https://builds.apache.org/job/PreCommit-HBASE-Build/4161/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.

Yu Li added a comment - 24/Oct/16 15:30

All timed out cases failed because of OOME (why so frequent OOME?...)

Running org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
Exception in thread "Thread-2475" java.lang.OutOfMemoryError: Java heap space
Running org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster
Running org.apache.hadoop.hbase.tool.TestCanaryTool
Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2505" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2507" java.lang.OutOfMemoryError: Java heap space

And the failed case seems encountered some environment issue (Unable to create region directory):

Running org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 23.592 sec <<< FAILURE! - in org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
testScanYZYToEmpty(org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat)  Time elapsed: 0.044 sec  <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Unable to create region directory: /tmp/scantest1_snapshot__8235bb48-4e7b-4e00-ad80-b2ce716c8522/data/default/scantest1/519e450e89d832d702a416a9bca04b5d
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:527)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:234)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:170)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:736)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshot(MultiTableSnapshotInputFormatImpl.java:249)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshots(MultiTableSnapshotInputFormatImpl.java:243)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.setInput(MultiTableSnapshotInputFormatImpl.java:80)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat.setInput(MultiTableSnapshotInputFormat.java:106)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initMultiTableSnapshotMapperJob(TableMapReduceUtil.java:319)
	at org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat.initJob(TestMultiTableSnapshotInputFormat.java:72)

Ran above 4 cases locally and confirmed all could pass.

Yu Li added a comment - 24/Oct/16 15:30 All timed out cases failed because of OOME (why so frequent OOME?...) Running org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd Exception in thread "Thread-2475" java.lang.OutOfMemoryError: Java heap space Running org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster Running org.apache.hadoop.hbase.tool.TestCanaryTool Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-2505" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-2507" java.lang.OutOfMemoryError: Java heap space And the failed case seems encountered some environment issue ( Unable to create region directory ): Running org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 23.592 sec <<< FAILURE! - in org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat testScanYZYToEmpty(org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat) Time elapsed: 0.044 sec <<< ERROR! java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Unable to create region directory: /tmp/scantest1_snapshot__8235bb48-4e7b-4e00-ad80-b2ce716c8522/data/default/scantest1/519e450e89d832d702a416a9bca04b5d at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180) at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:527) at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:234) at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:170) at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:736) at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshot(MultiTableSnapshotInputFormatImpl.java:249) at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshots(MultiTableSnapshotInputFormatImpl.java:243) at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.setInput(MultiTableSnapshotInputFormatImpl.java:80) at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat.setInput(MultiTableSnapshotInputFormat.java:106) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initMultiTableSnapshotMapperJob(TableMapReduceUtil.java:319) at org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat.initJob(TestMultiTableSnapshotInputFormat.java:72) Ran above 4 cases locally and confirmed all could pass.

Yu Li added a comment - 24/Oct/16 15:41

Pushed commit to master branch and branch-1.

AFAICS all branch-1.1+ have this problem, and since this is a critical issue, please let me know your thoughts on whether to have this one in branch-1.1/1.2/1.3 gentlemen ndimiduk busbey stack mantonov, thanks.

Yu Li added a comment - 24/Oct/16 15:41 Pushed commit to master branch and branch-1. AFAICS all branch-1.1+ have this problem, and since this is a critical issue, please let me know your thoughts on whether to have this one in branch-1.1/1.2/1.3 gentlemen ndimiduk busbey stack mantonov , thanks.

Michael Stack added a comment - 24/Oct/16 16:17

aoxiang Excellent debugging sir.

Michael Stack added a comment - 24/Oct/16 16:17 aoxiang Excellent debugging sir.

Michael Stack added a comment - 24/Oct/16 16:25

Should we do another version of the patch after the hot fix has gone in? It looks like we get the sequenceid twice from a Cell. We could cache it since calc costs. What does this: "...so if cleanSeqId is called when the scanner.next call "? Where does it come from? Can we block it for ongoing scans? Like carp84 I like the idea of cloning rather than resetting the sequenceid on a Cell (but appreciate too the Ram and Anoop conservation). beforeShipped is making copies of Cells? Pardon if the questions are dumb. Just looking at the patch w/o spending time garnering more context.

Michael Stack added a comment - 24/Oct/16 16:25 Should we do another version of the patch after the hot fix has gone in? It looks like we get the sequenceid twice from a Cell. We could cache it since calc costs. What does this: "...so if cleanSeqId is called when the scanner.next call "? Where does it come from? Can we block it for ongoing scans? Like carp84 I like the idea of cloning rather than resetting the sequenceid on a Cell (but appreciate too the Ram and Anoop conservation). beforeShipped is making copies of Cells? Pardon if the questions are dumb. Just looking at the patch w/o spending time garnering more context.

Sean Busbey added a comment - 24/Oct/16 16:43

If the problem also exists in the earlier branch-1 release lines then we should probably fix it there. Does this patch cause some change in behavior or other downside that would give us pause on backporting it to all branches?

Sean Busbey added a comment - 24/Oct/16 16:43 If the problem also exists in the earlier branch-1 release lines then we should probably fix it there. Does this patch cause some change in behavior or other downside that would give us pause on backporting it to all branches?

Anoop Sam John added a comment - 24/Oct/16 17:56

It looks like we get the sequenceid twice from a Cell

Not really needed.. getSeqId return a state of the cell. So no real calc there.

Anoop Sam John added a comment - 24/Oct/16 17:56 It looks like we get the sequenceid twice from a Cell Not really needed.. getSeqId return a state of the cell. So no real calc there.

Anoop Sam John added a comment - 24/Oct/16 17:57

Does this patch cause some change in behavior or other downside that would give us pause on backporting it to all branches?

No I dont think so.. +1 to fix this in all applicable places now.

Anoop Sam John added a comment - 24/Oct/16 17:57 Does this patch cause some change in behavior or other downside that would give us pause on backporting it to all branches? No I dont think so.. +1 to fix this in all applicable places now.

Hudson added a comment - 24/Oct/16 18:05

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1846 (See https://builds.apache.org/job/HBase-Trunk_matrix/1846/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev fa3cbd1d80ab414901a650e41679534578ebcf63)

(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Hudson added a comment - 24/Oct/16 18:05 FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1846 (See https://builds.apache.org/job/HBase-Trunk_matrix/1846/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev fa3cbd1d80ab414901a650e41679534578ebcf63) (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Hudson added a comment - 24/Oct/16 18:11

FAILURE: Integrated in Jenkins build HBase-1.4 #493 (See https://builds.apache.org/job/HBase-1.4/493/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev 16823ff55e7da4822d82fb2a2108b4a253fd42f9)

(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 24/Oct/16 18:11 FAILURE: Integrated in Jenkins build HBase-1.4 #493 (See https://builds.apache.org/job/HBase-1.4/493/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev 16823ff55e7da4822d82fb2a2108b4a253fd42f9) (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Yu Li added a comment - 25/Oct/16 06:50

Thanks busbey stack and anoop.hbase for the confirmation, will commit into branch-1.1/1.2 soon if no objections.

For branch-1.3, mantonov I know you're trying to stabilize the 1.3.0 release sir, so a special ping before committing this, please let me know your thoughts. Thanks.

Yu Li added a comment - 25/Oct/16 06:50 Thanks busbey stack and anoop.hbase for the confirmation, will commit into branch-1.1/1.2 soon if no objections. For branch-1.3, mantonov I know you're trying to stabilize the 1.3.0 release sir, so a special ping before committing this, please let me know your thoughts. Thanks.

Yu Li added a comment - 25/Oct/16 13:25

Pushed into branch-1.1 and branch-1.2. Now waiting for the decision on branch-1.3 to close the JIRA.

Yu Li added a comment - 25/Oct/16 13:25 Pushed into branch-1.1 and branch-1.2. Now waiting for the decision on branch-1.3 to close the JIRA.

Sean Busbey added a comment - 25/Oct/16 14:00

The choice is between this going into 1.3.0 or waiting for 1.3.1, right?

Sean Busbey added a comment - 25/Oct/16 14:00 The choice is between this going into 1.3.0 or waiting for 1.3.1, right?

Hudson added a comment - 25/Oct/16 15:14

FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1888 (See https://builds.apache.org/job/HBase-1.1-JDK8/1888/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev f4cbf77f28a2668743b1c73b5aeb113c3456fb13)

(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Hudson added a comment - 25/Oct/16 15:14 FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1888 (See https://builds.apache.org/job/HBase-1.1-JDK8/1888/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev f4cbf77f28a2668743b1c73b5aeb113c3456fb13) (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Hudson added a comment - 25/Oct/16 15:47

FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1804 (See https://builds.apache.org/job/HBase-1.1-JDK7/1804/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev f4cbf77f28a2668743b1c73b5aeb113c3456fb13)

(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 25/Oct/16 15:47 FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1804 (See https://builds.apache.org/job/HBase-1.1-JDK7/1804/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev f4cbf77f28a2668743b1c73b5aeb113c3456fb13) (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 25/Oct/16 17:33

SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #50 (See https://builds.apache.org/job/HBase-1.2-JDK8/50/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev 7f10379b1e6756002993c2638c866a6537a088c4)

(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 25/Oct/16 17:33 SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #50 (See https://builds.apache.org/job/HBase-1.2-JDK8/50/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev 7f10379b1e6756002993c2638c866a6537a088c4) (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 25/Oct/16 18:19

SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #55 (See https://builds.apache.org/job/HBase-1.2-JDK7/55/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev 7f10379b1e6756002993c2638c866a6537a088c4)

(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Hudson added a comment - 25/Oct/16 18:19 SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #55 (See https://builds.apache.org/job/HBase-1.2-JDK7/55/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev 7f10379b1e6756002993c2638c866a6537a088c4) (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java

Yu Li added a comment - 26/Oct/16 03:24

The choice is between this going into 1.3.0 or waiting for 1.3.1, right?

Yes sir. busbey

Yu Li added a comment - 26/Oct/16 03:24 The choice is between this going into 1.3.0 or waiting for 1.3.1, right? Yes sir. busbey

Yu Li added a comment - 26/Oct/16 03:31

Skimmed log of failed post-commit check for trunk/branch-1/branch-1.1 and confirmed all caused by known fake case or timed out, no regression caused by change here, FWIW.

Yu Li added a comment - 26/Oct/16 03:31 Skimmed log of failed post-commit check for trunk/branch-1/branch-1.1 and confirmed all caused by known fake case or timed out, no regression caused by change here, FWIW.

Yu Li added a comment - 27/Oct/16 04:21

Ping mantonov, pending on your decision for branch-1.3 to close this JIRA boss.

Yu Li added a comment - 27/Oct/16 04:21 Ping mantonov , pending on your decision for branch-1.3 to close this JIRA boss.

Yu Li added a comment - 28/Oct/16 02:33

Plan to commit to branch-1.3 if no objections in another 12 hours and close this JIRA, thanks.

Yu Li added a comment - 28/Oct/16 02:33 Plan to commit to branch-1.3 if no objections in another 12 hours and close this JIRA, thanks.

Mikhail Antonov added a comment - 31/Oct/16 09:40

anoopsamjohn for that other possible bug in the clone..is there a jira for that to track? Would it also affect the same set of branches?

Mikhail Antonov added a comment - 31/Oct/16 09:40 anoopsamjohn for that other possible bug in the clone..is there a jira for that to track? Would it also affect the same set of branches?

Mikhail Antonov added a comment - 31/Oct/16 09:55

Sorry for late reply here carp84 and great investigation

This bug seems bad enough to get it in 1.3.0.. As I'm not very familiar with this code, let me just have another look Monday morning (tomorrow, PST) and we should be able to cherry pick to 1.3.. cc ghelmling who has been debugging compactions recently.

(Any more details on how to trigger/repro it? Some specific workload?)

Mikhail Antonov added a comment - 31/Oct/16 09:55 Sorry for late reply here carp84 and great investigation This bug seems bad enough to get it in 1.3.0.. As I'm not very familiar with this code, let me just have another look Monday morning (tomorrow, PST) and we should be able to cherry pick to 1.3.. cc ghelmling who has been debugging compactions recently. (Any more details on how to trigger/repro it? Some specific workload?)

Yu Li added a comment - 31/Oct/16 10:20 - edited

No problem, totally understand what it costs and how busy that will be to maintain some production online cluster meanwhile, will wait for your feedback.

And yes aoxiang has added a UT case in his patch to reproduce the issue, please refer to that for more details.

Yu Li added a comment - 31/Oct/16 10:20 - edited No problem, totally understand what it costs and how busy that will be to maintain some production online cluster meanwhile, will wait for your feedback. And yes aoxiang has added a UT case in his patch to reproduce the issue, please refer to that for more details.

Anoop Sam John added a comment - 01/Nov/16 10:33

We dont clone the seqId and all cloned cells will give seqId of 0!.

Seems there is no issue. Only this method affects the mentioned area

public static KeyValue toNewKeyCell(final Cell cell) {
    byte[] bytes = new byte[keyLength(cell)];
    appendKeyTo(cell, bytes, 0);
    KeyValue kv = new KeyValue.KeyOnlyKeyValue(bytes, 0, bytes.length);
    // Set the seq id. The new key cell could be used in comparisons so it
    // is important that it uses the seqid also. If not the comparsion would fail
    kv.setSequenceId(cell.getSequenceId());
    return kv;
  }

And we are copying the seqId also. So all good

Anoop Sam John added a comment - 01/Nov/16 10:33 We dont clone the seqId and all cloned cells will give seqId of 0!. Seems there is no issue. Only this method affects the mentioned area public static KeyValue toNewKeyCell( final Cell cell) { byte [] bytes = new byte [keyLength(cell)]; appendKeyTo(cell, bytes, 0); KeyValue kv = new KeyValue.KeyOnlyKeyValue(bytes, 0, bytes.length); // Set the seq id. The new key cell could be used in comparisons so it // is important that it uses the seqid also. If not the comparsion would fail kv.setSequenceId(cell.getSequenceId()); return kv; } And we are copying the seqId also. So all good

Gary Helmling added a comment - 01/Nov/16 23:33

+1 for 1.3. Let's pull it in.

Gary Helmling added a comment - 01/Nov/16 23:33 +1 for 1.3. Let's pull it in.

Mikhail Antonov added a comment - 01/Nov/16 23:37

Yeah I don't think we've seen that in the workloads we ran on 1.3, but yeah, let's pull it in 1.3. +1. Thanks for the patience here carp84!

Mikhail Antonov added a comment - 01/Nov/16 23:37 Yeah I don't think we've seen that in the workloads we ran on 1.3, but yeah, let's pull it in 1.3. +1. Thanks for the patience here carp84 !

Yu Li added a comment - 02/Nov/16 03:38

Ok, just pushed into 1.3, thanks for the confirmation mantonov and ghelmling.

Yu Li added a comment - 02/Nov/16 03:38 Ok, just pushed into 1.3, thanks for the confirmation mantonov and ghelmling .

Yu Li added a comment - 02/Nov/16 03:41

Closing this one since fix pushed to all 1.1+ branches. Again thanks for the good work aoxiang and thanks all for review.

Yu Li added a comment - 02/Nov/16 03:41 Closing this one since fix pushed to all 1.1+ branches. Again thanks for the good work aoxiang and thanks all for review.

Mikhail Antonov added a comment - 02/Nov/16 03:43

affectVersion should also be all 1.1+ branches?

Mikhail Antonov added a comment - 02/Nov/16 03:43 affectVersion should also be all 1.1+ branches?

Yu Li added a comment - 02/Nov/16 03:46

Yes, let me update, thanks for the note sir.

Yu Li added a comment - 02/Nov/16 03:46 Yes, let me update, thanks for the note sir.

Hudson added a comment - 02/Nov/16 10:11

FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #64 (See https://builds.apache.org/job/HBase-1.3-JDK8/64/)
~~HBASE-16931~~ Setting cell's seqId to zero in compaction flow might cause (liyu: rev 84481361b618f9f8ca12a6671eac2bb6ba894ea1)

(edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java
(edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Hudson added a comment - 02/Nov/16 10:11 FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #64 (See https://builds.apache.org/job/HBase-1.3-JDK8/64/ ) HBASE-16931 Setting cell's seqId to zero in compaction flow might cause (liyu: rev 84481361b618f9f8ca12a6671eac2bb6ba894ea1) (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Nick Dimiduk added a comment - 18/Dec/16 21:03

Bulk closing issues after 1.1.8 release.

Nick Dimiduk added a comment - 18/Dec/16 21:03 Bulk closing issues after 1.1.8 release.

HBase

Setting cell's seqId to zero in compaction flow might cause RS down.

Details

Description

Attachments

Attachments

Activity

People

Dates