Description
Using the 4.14.3 client, it seems the IndexFailurePolicy is still kicking in, which disables the index on write failure. This means that while the index is in the 'disabled' state, writes to the data table can happen without any writes to the index table. In theory this might be ok, since the rebuilder should eventually kick in and rebuild from the disable_timestamp. However, it breaks the new indexing design invariant that there should be no data table rows without a corresponding index row (potentially unverified), so it could cause some unexpected behavior.
Steps to repro:
1) Create data table
2) Create index table
3) "close_region" on index region from hbase shell
4) Upsert to data table
Eventually after some number of retries, the index will get disabled, which means any other client can write to the data table without writing to the index table.
Attachments
- PHOENIX-5515.master.001.patch (39 kB, Kadir OZDEMIR)
- PHOENIX-5515.master.002.patch (40 kB, Kadir OZDEMIR)
- PHOENIX-5515.master.addendum.patch (2 kB, Kadir OZDEMIR)
Activity
Good find.
Patch looks good. Do we need to consider the other forms of indexes here (local and transactional)? Probably not, but might be good to confirm.
larsh, the new and old designs coexist and share the same index committer (TrackingParallelWriterIndexCommitter). The old design uses HBase connections of type INDEX_WRITER_CONNECTION_WITH_CUSTOM_THREADS_NO_RETRIES for index write RPCs and returns MultiIndexWriteFailureException with disableIndexOnFailure = true to the client when an index write fails. The Phoenix client then retries index writes by simply going through the rebuild path, where data writes are skipped but index writes are replayed. The new design does index write retries on the server side by using connections of type INDEX_WRITER_CONNECTION_WITH_CUSTOM_THREADS. Before this Jira, the new design returned the same exception with disableIndexOnFailure = false (expecting that this would be honored by the Phoenix client). It turned out that it is ignored. Thus, the solution here is to wrap MultiIndexWriteFailureException in DoNotRetryIOException so that the client does NOT see this as an index failure, does not retry, and does not change the index state. This actually simplified the code a bit.
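To make the wrapping idea concrete, here is a minimal, self-contained sketch. It does not use the real HBase or Phoenix classes: SketchDoNotRetryIOException and SketchMultiIndexWriteFailureException are hypothetical stand-ins, and clientTreatsAsIndexFailure is an assumed, simplified model of the client-side check rather than the actual Phoenix client logic.

```java
import java.io.IOException;

// Hypothetical, simplified stand-in for HBase's DoNotRetryIOException.
class SketchDoNotRetryIOException extends IOException {
    SketchDoNotRetryIOException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, simplified stand-in for Phoenix's MultiIndexWriteFailureException.
class SketchMultiIndexWriteFailureException extends IOException {
    final boolean disableIndexOnFailure;

    SketchMultiIndexWriteFailureException(String message, boolean disableIndexOnFailure) {
        super(message);
        this.disableIndexOnFailure = disableIndexOnFailure;
    }
}

class IndexFailureWrappingSketch {
    // Old design: the index failure itself is what the client receives.
    static IOException oldDesignFailure() {
        return new SketchMultiIndexWriteFailureException("index write failed", true);
    }

    // New design as described above: the index failure is wrapped, so the
    // top-level exception is a "do not retry" error and not an index failure.
    static IOException newDesignFailure() {
        IOException cause =
                new SketchMultiIndexWriteFailureException("index write failed", false);
        return new SketchDoNotRetryIOException(
                "index write failed after server-side retries", cause);
    }

    // Assumed client behavior: only a top-level index failure triggers
    // index-specific handling (retry through rebuild, index state change).
    static boolean clientTreatsAsIndexFailure(IOException e) {
        return e instanceof SketchMultiIndexWriteFailureException;
    }

    public static void main(String[] args) {
        System.out.println(clientTreatsAsIndexFailure(oldDesignFailure()));
        System.out.println(clientTreatsAsIndexFailure(newDesignFailure()));
    }
}
```

In this model the original failure stays reachable through getCause() for logging, while the top-level type no longer matches the index-failure check.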
I have tested the patch on a real cluster and verified it. I am waiting for +1 to commit. gjacoby has reviewed the first patch and gave some minor comments, please see GitHub Pull Request #598. I have updated PR based on his comments.
SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.5 #158 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.5/158/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev 3356b5f9a504086329e9b8af878383878de80f3d)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/TrackingParallelWriterIndexCommitter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
- (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/GlobalIndexCheckerIT.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexWriter.java
- (edit) phoenix-core/src/test/java/org/apache/phoenix/hbase/index/write/TestIndexWriter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexCommitter.java
FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #561 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/561/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev 6d70c9c3d9c9014f55dea66ee7da5a92d1044b44)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexWriter.java
- (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/GlobalIndexCheckerIT.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/TrackingParallelWriterIndexCommitter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexCommitter.java
- (edit) phoenix-core/src/test/java/org/apache/phoenix/hbase/index/write/TestIndexWriter.java
SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.4 #277 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.4/277/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev ee42a51c02f3252f2091b2912a005f234f4580f1)
- (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/GlobalIndexCheckerIT.java
- (edit) phoenix-core/src/test/java/org/apache/phoenix/hbase/index/write/TestIndexWriter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/TrackingParallelWriterIndexCommitter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexWriter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexCommitter.java
FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #3018 (See https://builds.apache.org/job/PreCommit-PHOENIX-Build/3018/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev f431f1be1f3c17dba75adfce0873848c983af82c)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/TrackingParallelWriterIndexCommitter.java
- (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/GlobalIndexCheckerIT.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexWriter.java
- (edit) phoenix-core/src/test/java/org/apache/phoenix/hbase/index/write/TestIndexWriter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexCommitter.java
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
I see the data table regionserver getting killed on an index write failure after this patch:
2019-10-14 13:39:26,042 FATAL [RpcServer.FifoWFPBQ.default.handler=255,queue=21,port=16201] regionserver.HRegionServer: ABORTING region server vincentpoon-wsm.internal.salesforce.com,16201,1571085425169: The coprocessor org.apache.phoenix.hbase.index.IndexRegionObserver threw java.lang.IllegalMonitorStateException
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.apache.phoenix.hbase.index.LockManager$RowLockImpl.release(LockManager.java:227)
at org.apache.phoenix.hbase.index.IndexRegionObserver.postBatchMutateIndispensably(IndexRegionObserver.java:708)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1052)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1693)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1771)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1727)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1048)
at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3536)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3045)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2987)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:914)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:842)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2396)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35080)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
vincentpoon, great catch! The region server was killed due to releasing a lock that has been released already. I handled this case for the testing path but forgot to move it to common path between testing and production. The reason I have not seen this is that I covered this case in my IT tests, and did my real world testing on a cluster with 10 nodes where I did not notice that one region server failed and recovered. The patch is attached. I verified the patch.
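For readers unfamiliar with this failure mode, the sketch below reproduces it with a plain java.util.concurrent.locks.ReentrantLock, the same lock type that appears in the stack trace. It is not the code from the addendum patch: the class DoubleReleaseSketch and the helper releaseIfHeld are hypothetical, and the guard shown is just one common way to make a release idempotent.

```java
import java.util.concurrent.locks.ReentrantLock;

class DoubleReleaseSketch {
    // Releases the same lock twice. The second unlock() is made by a thread
    // that no longer holds the lock, which raises IllegalMonitorStateException.
    static boolean unguardedDoubleReleaseThrows() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        lock.unlock();
        try {
            lock.unlock();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // Hypothetical guard: unlock only if the current thread still holds the
    // lock, so calling this a second time is a harmless no-op.
    static boolean releaseIfHeld(ReentrantLock lock) {
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("unguarded double release throws: "
                + unguardedDoubleReleaseThrows());

        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        System.out.println("first guarded release: " + releaseIfHeld(lock));
        System.out.println("second guarded release: " + releaseIfHeld(lock));
    }
}
```

Inside a coprocessor hook an uncaught exception of this kind is what aborts the region server, which is why the release path needs to tolerate locks that were already released on an earlier failure path.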
+1
(Would have been nice if a test had caught this issue; I assume it's quite hard to capture this in a test.)
FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #3023 (See https://builds.apache.org/job/PreCommit-PHOENIX-Build/3023/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev eebccf59a2a19ff2d3b328b113117c7e6685e75e)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #564 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/564/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev aaa0b8ee39323daabdf8aa6eb2216feb07875e58)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.4 #280 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.4/280/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev 1dd77072567722390ab9a6d0a76a492ea05d7540)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.5 #162 (See https://builds.apache.org/job/Phoenix-4.x-HBase-1.5/162/)
PHOENIX-5515 Able to write indexed value to data table without writing (kadir: rev fd61f6a019bf34beb4180a287ef8bc114111d119)
- (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12982823/PHOENIX-5515.master.001.patch
against master branch at commit bf30a40006e15b60eb140c974dc4317e4e83de75.
ATTACHMENT ID: 12982823
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 lineLengths. The patch introduces the following lines longer than 100:
+ // Configure IndexRegionObserver to fail the last write phase (i.e., the post index update phase) where the verify flag is set
+ // to true and/or index rows are deleted and check that this does not impact the correctness
+ // The index rows are actually not deleted yet because IndexRegionObserver failed delete operation. However, they are
+ // This DML will scan the Index table and detect unverified index rows. This will trigger read repair which
+ // result in deleting these rows since the corresponding data table rows are deleted already. So, the number of
+ // rows to be deleted by the "DELETE" DML will be zero since the rows deleted by read repair will not be visible
+ // Configure IndexRegionObserver to fail the first write phase (i.e., the pre index update phase). This should not
+ conn.createStatement().execute("upsert into " + dataTableName + " (id, val2) values ('a', 'abcc')");
+ conn.createStatement().execute("upsert into " + dataTableName + " (id, val1, val2) values ('c', 'cd','cde')");
+ populateTable(dataTableName); // with two rows ('a', 'ab', 'abc', 'abcd') and ('b', 'bc', 'bcd', 'bcde')
-1 core tests. The patch failed these unit tests:
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.join.SubqueryUsingSortMergeJoinIT
Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/3014//testReport/
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/3014//console
This message is automatically generated.