Details
Description
The security random walk test failed to drop a table.
19 05:56:04,122 [randomwalk.Framework] ERROR: Error during random walk java.lang.Exception: Error running node Security.xml at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253) at org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61) at org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.accumulo.start.Main$1.run(Main.java:89) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.Exception: Error running node security.TableOp at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253) at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249) ... 8 more Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293) at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261) at org.apache.accumulo.core.client.admin.TableOperationsImpl.delete(TableOperationsImpl.java:508) at org.apache.accumulo.server.test.randomwalk.security.DropTable.dropTable(DropTable.java:66) at org.apache.accumulo.server.test.randomwalk.security.TableOp.visit(TableOp.java:204) at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249) ... 9 more Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684) at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83) at $Proxy1.waitForTableOperation(Unknown Source) at org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233) at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275) ... 14 more
Looking in the master logs see the following indicating a scan failed while trying to delete the table.
19 05:56:04,098 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server xxx.xxx.xxx.xxx:9997 at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:188) at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:97) at org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:94) at org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:55) at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50) at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62) at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server xxx.xxx.xxx.xxx:9997 at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:579) at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:335) at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... 1 more Caused by: org.apache.thrift.TApplicationException: Internal error processing startMultiScan at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:298) at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:274) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83) at $Proxy4.startMultiScan(Unknown Source) at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:539) ... 5 more
Looking on the tablet server where the scan failed, see the following
19 05:56:04,062 [tabletserver.Tablet] DEBUG: initiateClose(saveState=true queueMinC=false disableWrites=false) !0;~;!0< 19 05:56:04,062 [tabletserver.Tablet] DEBUG: completeClose(saveState=true completeClose=true) !0;~;!0< 19 05:56:04,063 [problems.ProblemReports] DEBUG: Filing problem report !0 FILE_READ /accumulo/tables/!0/table_info/F0002xng.rf 19 05:56:04,072 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< closed 19 05:56:04,072 [tabletserver.TabletServer] DEBUG: Unassigning !0;~;!0<@(null,192.168.117.13:9997[134d7425fc5940d],null) 19 05:56:04,073 [tabletserver.TabletServer] WARN : lookup failed for tablet !0;~;!0< java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:249) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:677) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:614) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125) at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020) at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94) at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:89) at org.apache.accumulo.core.iterators.user.WholeRowIterator.seek(WholeRowIterator.java:240) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168) at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1607) at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1710) at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:988) at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 19 05:56:04,076 [tabletserver.TabletServer] INFO : unloaded !0;~;!0< 19 05:56:04,076 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served !0;~;!0< 19 05:56:04,084 [thrift.TabletClientService$Processor] ERROR: Internal error processing startMultiScan java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.continueMultiScan(TabletServer.java:1321) at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:1272) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:58) at $Proxy2.startMultiScan(Unknown Source) at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.process(TabletClientService.java:2164) at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2037) at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:151) at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631) at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:199) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) at java.lang.Thread.run(Thread.java:662) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException at org.apache.accumulo.server.tabletserver.TabletServer$ScanTask.get(TabletServer.java:661) at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.continueMultiScan(TabletServer.java:1308) ... 15 more Caused by: java.lang.RuntimeException: java.io.EOFException at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:999) at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... 1 more Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:249) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:677) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:614) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125) at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020) at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94) at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:89) at org.apache.accumulo.core.iterators.user.WholeRowIterator.seek(WholeRowIterator.java:240) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168) at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1607) at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1710) at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:988) ... 4 more 19 05:56:04,120 [tabletserver.TabletServer] INFO : Loading tablet !0;~;!0< 19 05:56:04,128 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< opened
The metadata tablet were being frequently loaded and unloaded as descried in ACCUMULO-329 when this happened.
The file in question was major compacted away on another tablet server. Shortly after this.
19 05:57:54,187 [tabletserver.Tablet] DEBUG: Starting MajC !0;~;!0< [/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, /table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> /table_info/A0002y0e.rf_tmp 19 05:57:54,309 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< MajC [/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, /table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> /table_info/A0002y0e.rf
19 06:10:29,546 [gc.SimpleGarbageCollector] DEBUG: Deleting /accumulo/tables/!0/table_info/F0002xng.rf
The file was garbage collected later. The GC and Majc show that the file existed while the EOF happened. So this is probably either a bug in RFile fast skip or a bug related to the tablet closing. Probably related to the close w/ the close temporal relationship.