Description
While running the random walk test on a 10-node cluster, the security random walk test failed.
Caused by: org.apache.accumulo.core.client.AccumuloSecurityException: Error TABLE_DOESNT_EXIST - Unknown security exception
    at org.apache.accumulo.core.client.admin.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:70)
    at org.apache.accumulo.core.client.admin.SecurityOperationsImpl.hasTablePermission(SecurityOperationsImpl.java:269)
    at org.apache.accumulo.server.test.randomwalk.security.AlterTablePerm.alter(AlterTablePerm.java:81)
The test was checking permissions on a table and got an error saying that the table did not exist. According to the master logs, the table was created about 40 ms before the check. The hasTablePermission code chooses a random tablet server to do the check, so I suspect the zoo cache on that tablet server had not yet been updated. Many places in the 1.4 code follow the pattern of clearing the cache and retrying when something fails. The code that threw the table-not-found exception does not do this, but needs to: org.apache.accumulo.server.client.ClientServiceHandler.checkTableId(), or something it calls, should clear the cache and retry when it thinks the table does not exist.
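The clear-the-cache-and-retry pattern described above could look roughly like the following. This is only a sketch to illustrate the idea, not the actual Accumulo code: StaleCache, resolveTableId, and the table name and id used in main are hypothetical stand-ins, and the real fix would go through the existing zoo cache and checkTableId() code paths.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class CacheRetrySketch {

    /** Hypothetical stand-in for a local cache in front of ZooKeeper; not the Accumulo API. */
    static class StaleCache {
        private final Map<String, String> source;  // authoritative store (plays the role of ZooKeeper)
        private Map<String, String> snapshot;      // local copy that may lag behind the source

        StaleCache(Map<String, String> source) {
            this.source = source;
            this.snapshot = new HashMap<>(source);
        }

        /** Reads only from the local snapshot, so it can miss recent writes. */
        String get(String tableName) {
            return snapshot.get(tableName);
        }

        /** Drops the local snapshot and re-reads from the authoritative store. */
        void clear() {
            snapshot = new HashMap<>(source);
        }
    }

    /**
     * Hypothetical lookup: on a miss, clear the cache and try once more
     * before concluding that the table does not exist.
     */
    static String resolveTableId(StaleCache cache, String tableName) {
        String id = cache.get(tableName);
        if (id == null) {
            cache.clear();               // the cached view may simply be stale
            id = cache.get(tableName);   // single retry against fresh data
        }
        if (id == null) {
            throw new NoSuchElementException("Table does not exist: " + tableName);
        }
        return id;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        StaleCache cache = new StaleCache(store);  // snapshot taken while the store is empty
        store.put("sectbl", "1a");                 // table created after the snapshot
        System.out.println(resolveTableId(cache, "sectbl"));
    }
}
```

A single retry after clearing keeps the common case cheap (one cache read) while ensuring a missing-table error is only reported after consulting fresh data. A table that truly does not exist still fails, just one lookup later.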