Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Cannot Reproduce
- Affects Version/s: 1.5.1, 1.6.0
- None
- None
Description
While trying to understand what's happening in ACCUMULO-2964, I noticed that I had similar exceptions from two different threads. One of the threads started working again after the unexplained thrift exceptions from a tserver restart, and the other continued to fail repeatedly for the lifetime of the test.
I repeatedly saw this exception:
2014-07-11 04:14:41,591 [replication.WorkMaker] WARN : Failed to write work mutations for replication, will retry
org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {accumulo.metadata(ID:!0)=[DEFAULT_SECURITY_ERROR]} # server errors 0 # exceptions 0
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:537)
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.addMutation(TabletServerBatchWriter.java:249)
    at org.apache.accumulo.core.client.impl.BatchWriterImpl.addMutation(BatchWriterImpl.java:45)
    at org.apache.accumulo.master.replication.WorkMaker.addWorkRecord(WorkMaker.java:184)
    at org.apache.accumulo.master.replication.WorkMaker.run(WorkMaker.java:124)
    at org.apache.accumulo.master.replication.ReplicationDriver.run(ReplicationDriver.java:91)
The part that struck me as odd was that the BatchWriter wasn't against the metadata table, but the replication table.
I looked into the TabletServerBatchWriter. It appears that once the client sees a MutationsRejectedException, that BatchWriter becomes useless as the internal member somethingFailed is never reset back to false after the failure is reported. Same goes for serverSideErrors, unknownErrors, lastUnknownErrors, too.
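To make the suspected behavior concrete, here is a minimal sketch of a writer whose failure flag is set once and never cleared. This is not the Accumulo code: StickyFailureWriter, simulateServerFailure, and the use of IllegalStateException in place of MutationsRejectedException are all assumptions made for illustration; only the field name somethingFailed comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a writer whose failure flag is never cleared. */
class StickyFailureWriter {
    private final List<String> queued = new ArrayList<>();
    // Mirrors the field name mentioned in the report; everything else is invented.
    private boolean somethingFailed = false;

    /** Simulates a single background send failure. */
    void simulateServerFailure() {
        somethingFailed = true;
    }

    /** Queues a mutation, but first reports any recorded failure. */
    void addMutation(String mutation) {
        checkForFailures();
        queued.add(mutation);
    }

    private void checkForFailures() {
        if (somethingFailed) {
            // The flag is reported but not reset, so every later call throws as well.
            throw new IllegalStateException("mutations rejected");
        }
    }

    int queuedCount() {
        return queued.size();
    }
}
```

In this model, a single failure makes every subsequent addMutation call throw the same exception, which matches the pattern of one thread failing for the rest of the test.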
If this is the case, this is a bug: the BatchWriter should be resilient in this regard and not force the client to create a new instance. If that's infeasible, we should make the BatchWriter fail fast with a distinct exception when it is used after a failure, rather than repeatedly reporting the same failure over and over again.
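The two options could look roughly like the sketch below. Both classes are hypothetical toy models, not proposed patches: ResettingWriter, FailFastWriter, simulateServerFailure, and the exception types are assumptions chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Option 1 (hypothetical): clear the failure state once it has been reported. */
class ResettingWriter {
    private final List<String> queued = new ArrayList<>();
    private boolean somethingFailed = false;

    void simulateServerFailure() {
        somethingFailed = true;
    }

    void addMutation(String mutation) {
        if (somethingFailed) {
            // Report the failure once, then reset so later writes can succeed.
            // The mutation passed to this call is not queued; the caller must retry it.
            somethingFailed = false;
            throw new IllegalStateException("mutations rejected");
        }
        queued.add(mutation);
    }

    int queuedCount() {
        return queued.size();
    }
}

/** Option 2 (hypothetical): report the failure once, then fail fast with a distinct error. */
class FailFastWriter {
    private final List<String> queued = new ArrayList<>();
    private boolean somethingFailed = false;
    private boolean failureReported = false;

    void simulateServerFailure() {
        somethingFailed = true;
    }

    void addMutation(String mutation) {
        if (failureReported) {
            // A different exception type tells the caller the writer itself is dead.
            throw new UnsupportedOperationException(
                "writer is unusable after an earlier failure; create a new one");
        }
        if (somethingFailed) {
            failureReported = true;
            throw new IllegalStateException("mutations rejected");
        }
        queued.add(mutation);
    }

    int queuedCount() {
        return queued.size();
    }
}
```

With the first option a retry loop like the one in WorkMaker can recover on its own; with the second, the caller at least gets an unambiguous signal that it must build a new writer instead of retrying against the old one.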
Issue Links
- is related to: ACCUMULO-4154 Improve batch writer (Resolved)
- relates to: ACCUMULO-3092 BatchWriter does not notice when tserver fails, continues to send mutations to it (Resolved)