Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
HDP-2.5.3
Reviewed
Description
Initially, TruncateTableProcedure failed to write some files to HDFS in the TRUNCATE_TABLE_CREATE_FS_LAYOUT state, for an unknown reason:
2018-05-15 08:00:25,346 WARN [ProcedureExecutorThread-8] procedure.TruncateTableProcedure: Retriable error trying to truncate table=<namespace>:<table> state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/data/<namespace>/<table>/<region>/.regioninfo could only be replicated to 0 nodes instead of minReplication (=1). There are <the number of DNs> datanode(s) running and no node(s) are excluded in this operation.
...
However, it seems that some of the files were written to HDFS successfully during that attempt.
TruncateTableProcedure then got stuck in a retry loop in the TRUNCATE_TABLE_CREATE_FS_LAYOUT state, and the following messages appeared repeatedly in the master log:
2018-05-15 08:00:25,463 WARN [ProcedureExecutorThread-8] procedure.TruncateTableProcedure: Retriable error trying to truncate table=<namespace>:<table> state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://<name>/apps/hbase/data/.tmp/data/<namespace>/<table>/<region>
...
This appears to happen because each retry tries to recreate the region files and directories that were already written successfully during the first attempt, and region creation fails when the region directory already exists on disk.
I think we need to delete all the files and directories written by the previous attempt before retrying the TRUNCATE_TABLE_CREATE_FS_LAYOUT state.
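The retry problem and the proposed cleanup can be illustrated with a small simulation. This is a sketch under stated assumptions, not HBase code and not the actual fix: it uses `java.nio.file` on the local filesystem in place of the Hadoop `FileSystem` API, and `RetrySafeFsLayout`, `createFsLayout`, `createRegionDir`, `deleteRecursively`, `failAfter` and `cleanFirst` are all hypothetical names. A failed first attempt leaves a partially written temp table directory behind; a retry without cleanup fails on the leftover region directory every time, while a retry that first deletes the temp table directory succeeds.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem simulation; not HBase code. All names here are hypothetical.
public class RetrySafeFsLayout {

    // Writes one region directory plus its .regioninfo file. Like the error in the
    // master log, it refuses to proceed if the region directory already exists.
    static void createRegionDir(Path tableTmpDir, String region) throws IOException {
        Path regionDir = tableTmpDir.resolve(region);
        if (Files.exists(regionDir)) {
            throw new FileAlreadyExistsException(regionDir.toString(), null,
                "The specified region already exists on disk");
        }
        Files.createDirectories(regionDir);
        Files.write(regionDir.resolve(".regioninfo"), region.getBytes(StandardCharsets.UTF_8));
    }

    // Deletes dir and everything under it (children before parents); no-op if absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // One execution of the hypothetical "create FS layout" step.
    // failAfter: simulate an HDFS write error after that many regions (-1 = no error).
    // cleanFirst: remove leftovers from a previous attempt before writing anything.
    static void createFsLayout(Path tableTmpDir, List<String> regions,
                               int failAfter, boolean cleanFirst) throws IOException {
        if (cleanFirst) {
            deleteRecursively(tableTmpDir);
        }
        Files.createDirectories(tableTmpDir);
        for (int i = 0; i < regions.size(); i++) {
            if (i == failAfter) {
                throw new IOException("simulated: could only be replicated to 0 nodes");
            }
            createRegionDir(tableTmpDir, regions.get(i));
        }
    }

    public static void main(String[] args) throws IOException {
        Path tableTmpDir = Files.createTempDirectory("truncate-sketch").resolve("table");
        List<String> regions = List.of("region-a", "region-b", "region-c");

        // First attempt writes region-a and region-b, then fails.
        try {
            createFsLayout(tableTmpDir, regions, 2, false);
        } catch (IOException e) {
            System.out.println("attempt 1 failed: " + e.getMessage());
        }
        // Retrying without cleanup trips over the leftover region-a directory.
        try {
            createFsLayout(tableTmpDir, regions, -1, false);
        } catch (IOException e) {
            System.out.println("naive retry failed: " + e.getMessage());
        }
        // Retrying after cleanup succeeds.
        createFsLayout(tableTmpDir, regions, -1, true);
        System.out.println("retry with cleanup succeeded");
    }
}
```

Deleting the whole temporary table directory at the start of the step makes the step idempotent, which is what a retried procedure state needs: every execution starts from the same empty directory no matter how far the previous attempt got.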
This issue was observed in HDP-2.5.3, but I think upstream HBase has the same issue. CreateTableProcedure also appears to have a similar issue.