Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Description
During a minor compaction, the rename from *.rf_tmp to *.rf failed. That alone would be fine, except that we left a reference to the *.rf file in the !METADATA table. We need to make sure that if any part of the compaction fails, we roll back to a consistent state. Could this be an opportunity for a FATE operation?
20 15:03:17,033 [tabletserver.Tablet] WARN : tserver:servername Tablet !0;~;!0< failed to rename /table_info/00790_00002.rf after MinC, will retry in 60 secs...
java.io.IOException: Call to servername/10.20.30.40:9000 failed on local exception: java.io.IOException: Too many open files
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
	at org.apache.hadoop.ipc.Client.call(Client.java:743)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
	at $Proxy0.rename(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy0.rename(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:556)
	at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:211)
	at cloudbase.server.tabletserver.Tablet$DatafileManager.bringMinorCompactionOnline(Tablet.java:748)
	at cloudbase.server.tabletserver.Tablet.minorCompact(Tablet.java:1999)
	at cloudbase.server.tabletserver.Tablet.access$3800(Tablet.java:123)
	at cloudbase.server.tabletserver.Tablet$MinorCompactionTask.run(Tablet.java:2070)
	at cloudbase.core.util.LoggingRunnable.run(LoggingRunnable.java:18)
	at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:31)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Too many open files
	at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
	at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
	at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
	at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
20 15:04:17,641 [tabletserver.Tablet] WARN : tserver:servername Target map file already exist /accumulo/tables/!0/table_info/00790_00002.rf
20 15:04:17,897 [tabletserver.FileManager] ERROR: tserver:servername Failed to open file /accumulo/tables/!0/table_info/00790_00002.rf File does not exist: /accumulo/tables/!0/table_info/00790_00002.rf
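One way to approach the rollback concern for this single step is to write the metadata reference only after the *.rf_tmp to *.rf rename is known to have taken effect. The sketch below is not taken from Accumulo: it assumes a local file system in place of HDFS and an in-memory set standing in for the !METADATA table, and the class, field, and method names (MinorCompactionCommitSketch, metadataRefs, bringOnline) are hypothetical. The existence check also reflects the log: the retry reported that the target already existed, which suggests the first rename may have reached the NameNode even though the client saw an exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not Accumulo code: commit a minor compaction's output file to the
 * tablet's metadata only after the *.rf_tmp -> *.rf rename has taken effect.
 */
final class MinorCompactionCommitSketch {

  /** Hypothetical in-memory stand-in for the tablet's data file entries in !METADATA. */
  static final Set<Path> metadataRefs = ConcurrentHashMap.newKeySet();

  private MinorCompactionCommitSketch() {}

  /**
   * Moves tmpFile to finalFile, then records finalFile in metadata. If the move throws,
   * the method exits before the metadata write, so no reference to a missing file is left.
   */
  static void bringOnline(Path tmpFile, Path finalFile) throws IOException {
    // A previous attempt may have renamed the file even though it reported an error.
    boolean alreadyRenamed = Files.exists(finalFile) && !Files.exists(tmpFile);
    if (!alreadyRenamed) {
      // Atomic rename: the file appears under its final name completely or not at all.
      Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
    // Reached only when finalFile exists, so the reference never points at a missing file.
    metadataRefs.add(finalFile);
  }
}
```

If bringOnline throws, the caller can simply retry later, as the tablet server already does every 60 seconds; nothing needs to be undone in metadata because nothing was written. A FATE operation would generalize this by persisting each step of the compaction along with its undo action, whereas this sketch covers only the single rename-then-commit step.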