Accumulo / ACCUMULO-4398

Possible for client to see TableNotFoundException adding splits on a newly created table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.7.1
    • Fix Version/s: None
    • Component/s: client, zookeeper
    • Labels: None

    Description

      Just came across a really odd scenario. I believe that it's a race condition in the client that stems from our beloved ZooCache.

      This was observed via a test failure in LogicalTimeIT:

      Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 29.249 sec <<< FAILURE! - in org.apache.accumulo.test.functional.LogicalTimeIT
      run(org.apache.accumulo.test.functional.LogicalTimeIT)  Time elapsed: 29.037 sec  <<< ERROR!
      org.apache.accumulo.core.client.TableNotFoundException: Table LogicalTimeIT_run06 does not exist
      	at org.apache.accumulo.core.client.impl.Tables._getTableId(Tables.java:117)
      	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:102)
      	at org.apache.accumulo.core.client.impl.TableOperationsImpl.addSplits(TableOperationsImpl.java:374)
      	at org.apache.accumulo.test.functional.LogicalTimeIT.runMergeTest(LogicalTimeIT.java:81)
      	at org.apache.accumulo.test.functional.LogicalTimeIT.run(LogicalTimeIT.java:56)
      

      Ultimately, the failing code in runMergeTest is:

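          // create() returns once the table-creation FATE operation has finished
          conn.tableOperations().create(table, new NewTableConfiguration().setTimeType(TimeType.LOGICAL));
          TreeSet<Text> splitSet = new TreeSet<Text>();
          for (String split : splits) {
            splitSet.add(new Text(split));
          }
          // addSplits() first resolves the table name to an ID (Tables.getTableId), which is where the exception is thrown
          conn.tableOperations().addSplits(table, splitSet);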
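
      If the stale view is only transient, as this report suggests, a test could tolerate it by retrying the lookup for a bounded time. The sketch below assumes exactly that; it is not part of Accumulo, and the names StaleViewRetry, retryOnNotFound, and NotFoundException are hypothetical. In a real test the operation would be the addSplits call above and the caught type would be Accumulo's TableNotFoundException; the stand-in exception keeps the sketch compilable without Accumulo on the classpath.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical test-side mitigation: retry an operation that may fail while
 * the client's cached view of the table list is stale. Not an Accumulo API.
 */
final class StaleViewRetry {

  /** Hypothetical stand-in for Accumulo's TableNotFoundException. */
  static final class NotFoundException extends Exception {
    NotFoundException(String msg) {
      super(msg);
    }
  }

  private StaleViewRetry() {}

  /**
   * Runs op, retrying only on NotFoundException, up to maxAttempts times,
   * doubling the sleep between attempts. Any other exception propagates at once.
   */
  static <T> T retryOnNotFound(Callable<T> op, int maxAttempts, long initialSleepMs)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long sleepMs = initialSleepMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (NotFoundException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up: the table may genuinely not exist
        }
        Thread.sleep(sleepMs);
        sleepMs *= 2; // simple exponential backoff
      }
    }
  }
}
```

      For a void call such as addSplits, the lambda would perform the call and return null. Bounding the attempts matters: a table that really does not exist must still surface as an error rather than a hang.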
      

      The important piece to remember is that a ZooKeeper client, when a watcher is set, will eventually get all updates from that watcher in the order in which they occurred. LogicalTimeIT repeatedly runs the same test over tables with varying characteristics. I think these are the important points.
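
      The ordering guarantee can be pictured as a FIFO queue drained by a single consumer, roughly how a client event thread hands notifications to watchers. This is a hypothetical illustration (OrderedWatchDispatcher is not a ZooKeeper class), and it models only the order of delivery, not when the read a watcher performs on notification actually runs.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical model of in-order watch delivery: notifications are queued in
 * the order they arrive and handed to the watcher one at a time, oldest first.
 */
final class OrderedWatchDispatcher {
  private final Queue<String> pending = new ArrayDeque<>();
  private final Consumer<String> watcher;

  OrderedWatchDispatcher(Consumer<String> watcher) {
    this.watcher = watcher;
  }

  /** Conceptually called by the network layer when a notification arrives. */
  void enqueue(String event) {
    pending.add(event);
  }

  /** Delivers every pending event to the watcher in FIFO order; returns the count. */
  int drain() {
    int delivered = 0;
    String event;
    while ((event = pending.poll()) != null) {
      watcher.accept(event);
      delivered++;
    }
    return delivered;
  }
}
```

      Ordered delivery by itself does not say how a watcher's follow-up read lines up with later writes, which is where the scenario that follows appears to hinge.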

      Consider the following:

      1. Client creates a table T1
      2. ZooCache is cleared after FATE op finishes
      3. Watcher is set on ZTABLES in ZK
      4. Client interacts with T1
      5. Client creates T2
      6. ZooCache is cleared after FATE op finishes
      7. Watcher fires on ZTABLES node in ZK (CHILDREN_CHANGED) and repopulates the child list on the ZTABLES node
      8. Client makes call to split T2
      9. The code checks whether the table exists, but ZooCache's childrenCache for ZTABLES has already been repopulated with a child list that does not include T2, so the client thinks the table doesn't exist
      10. Eventually, the watcher fires again, the cached ZTABLES children are updated, and everything is fine.
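
      The steps above can be replayed deterministically with a toy model of a client-side children cache. Everything here is hypothetical (StaleChildrenCacheModel and its methods are not Accumulo or ZooKeeper APIs). The model also makes one assumption the report does not establish: that the ZooKeeper server the client reads from can briefly lag behind the master's write, so the repopulation at step 7 returns a child list without T2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical single-threaded model of the race: a client-side cache of the
 * tables node's children that is cleared and then repopulated from whatever
 * view the client's server has. ASSUMPTION: that view may lag committed writes.
 */
final class StaleChildrenCacheModel {
  static final String ZTABLES = "/tables";

  /** Latest committed children (what the master's FATE op wrote). */
  final List<String> committed = new ArrayList<>();
  /** Children as seen by the server the client reads from; may lag. */
  final List<String> clientView = new ArrayList<>();
  private final Map<String, List<String>> childrenCache = new HashMap<>();

  /** Master creates a table; the client's server has not applied it yet. */
  void createTable(String table) {
    committed.add(table);
  }

  /** The client's server catches up with all committed writes. */
  void catchUp() {
    clientView.clear();
    clientView.addAll(committed);
  }

  /** Cache cleared after the FATE op finishes (steps 2 and 6). */
  void clearCache() {
    childrenCache.clear();
  }

  /** Watcher fires and repopulates the child list from the client's view (step 7). */
  void onChildrenChanged() {
    childrenCache.put(ZTABLES, new ArrayList<>(clientView));
  }

  /** Existence check performed before an operation such as addSplits (step 9). */
  boolean tableExists(String table) {
    List<String> children = childrenCache.get(ZTABLES);
    if (children == null) { // cache miss: read through to the client's view
      children = new ArrayList<>(clientView);
      childrenCache.put(ZTABLES, children);
    }
    return children.contains(table);
  }
}
```

      Replaying the steps: create T1 and let the client's server catch up, clear the cache, and confirm T1 exists; then create T2 without a catch-up, clear the cache, and fire the watcher. tableExists("T2") returns false until catchUp() and another onChildrenChanged() run, mirroring steps 9 and 10.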

      I believe this is a plausible, if perhaps unlikely, scenario.

People

    • Assignee: Unassigned
    • Reporter: Josh Elser (elserj)