Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
Description
The reproduction steps are a little fuzzy, but basically we ran a moderate workload against a 1.6.0 server. Encryption happened to be turned on, but that doesn't seem to be germane to the problem. After a moderate amount of work, Accumulo refuses to start up and spews this error to the log over and over:
2013-12-10 10:23:02,529 [tserver.TabletServer] WARN : exception while doing multi-scan
java.lang.RuntimeException: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1125)
    at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
    at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
    at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:333)
    at org.apache.accumulo.tserver.FileManager.access$500(FileManager.java:58)
    at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:478)
    at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:466)
    at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:486)
    at org.apache.accumulo.tserver.Tablet$ScanDataSource.createIterator(Tablet.java:2027)
    at org.apache.accumulo.tserver.Tablet$ScanDataSource.iterator(Tablet.java:1989)
    at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
    at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1565)
    at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1672)
    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1114)
    ... 6 more
Caused by: java.io.FileNotFoundException: File does not exist: /accumulo/tables/!0/table_info/A000042x.rf
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:256)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:143)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:212)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:367)
    at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:143)
    at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:825)
    at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
    at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(FileOperations.java:119)
    at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:314)
    ... 16 more
Here are some other pieces of context:
HDFS contents:
ubuntu@ip-10-10-1-115:/data0/logs/accumulo$ hadoop fs -lsr /accumulo/tables/
drwxr-xr-x - accumulo hadoop 0 2013-12-10 00:32 /accumulo/tables/!0
drwxr-xr-x - accumulo hadoop 0 2013-12-10 01:06 /accumulo/tables/!0/default_tablet
drwxr-xr-x - accumulo hadoop 0 2013-12-10 10:49 /accumulo/tables/!0/table_info
-rw-r--r-- 5 accumulo hadoop 1698 2013-12-10 00:34 /accumulo/tables/!0/table_info/F0000000.rf
-rw-r--r-- 5 accumulo hadoop 43524 2013-12-10 01:53 /accumulo/tables/!0/table_info/F000062q.rf
drwxr-xr-x - accumulo hadoop 0 2013-12-10 00:32 /accumulo/tables/+r
drwxr-xr-x - accumulo hadoop 0 2013-12-10 10:45 /accumulo/tables/+r/root_tablet
-rw-r--r-- 5 accumulo hadoop 2070 2013-12-10 10:45 /accumulo/tables/+r/root_tablet/A0000738.rf
drwxr-xr-x - accumulo hadoop 0 2013-12-10 00:33 /accumulo/tables/1
drwxr-xr-x - accumulo hadoop 0 2013-12-10 00:33 /accumulo/tables/1/default_tablet
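Note that the file named in the exception, A000042x.rf, does not appear under table_info in this listing; only F0000000.rf and F000062q.rf are there. For anyone who wants to repeat that cross-check against a larger listing, here is a rough Python sketch. The helper names (listed_files, hdfs_path) are made up for illustration and are not part of Accumulo or Hadoop, and the parsing assumes the eight-column `hadoop fs -lsr` output format shown above.

```python
import re

# The table_info entries copied from the `hadoop fs -lsr` listing in this report.
LISTING = """\
drwxr-xr-x - accumulo hadoop 0 2013-12-10 10:49 /accumulo/tables/!0/table_info
-rw-r--r-- 5 accumulo hadoop 1698 2013-12-10 00:34 /accumulo/tables/!0/table_info/F0000000.rf
-rw-r--r-- 5 accumulo hadoop 43524 2013-12-10 01:53 /accumulo/tables/!0/table_info/F000062q.rf
"""

# URI copied from the "Failed to open" message in the tserver log.
FAILED_URI = "hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf"


def listed_files(listing):
    """Return the set of regular-file paths in `hadoop fs -lsr` style output.

    Hypothetical helper: assumes 8 whitespace-separated columns per entry,
    with the path as the last column.
    """
    files = set()
    for line in listing.splitlines():
        fields = line.split()
        # Regular files have a permission string starting with '-';
        # directories start with 'd' and are skipped.
        if len(fields) >= 8 and fields[0].startswith("-"):
            files.add(fields[-1])
    return files


def hdfs_path(uri):
    """Strip the hdfs://host:port prefix, leaving the absolute path."""
    return re.sub(r"^hdfs://[^/]+", "", uri)


files = listed_files(LISTING)
referenced = hdfs_path(FAILED_URI)

if referenced in files:
    print("present in HDFS listing:", referenced)
else:
    print("missing from HDFS listing:", referenced)
```

With the data above, the script takes the "missing" branch, which matches the FileNotFoundException at the bottom of the stack trace.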
ZooKeeper entries
[zk: localhost:2181(CONNECTED) 6] get /accumulo/371cfa3e-fe96-4a50-92e9-da7572589ffa/root_tablet/dir
hdfs://10.10.1.115:9000/accumulo/tables/+r/root_tablet
cZxid = 0x1b
ctime = Tue Dec 10 00:32:56 EST 2013
mZxid = 0x1b
mtime = Tue Dec 10 00:32:56 EST 2013
pZxid = 0x1b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 54
numChildren = 0
I'm going to preserve the state of this machine in HDFS for a while but not forever, so if there are other pieces of context people need, let me know.
Attachments
Issue Links
- is related to ACCUMULO-2668 slow WAL writes (Resolved)
- relates to ACCUMULO-1481 Root tablet in its own table (Resolved)