Accumulo / ACCUMULO-2219

parallelize the operation of certain FATE operations

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.4, 1.5.0
    • Fix Version/s: None
    • Component/s: fate
    • Labels: None

      Description

      As in ACCUMULO-2217, a user operation can cause the FATE processor to get stuck and require administrative action before any future FATE operations can make progress. We should look at ways to parallelize the execution of FATE tasks that commute and don't interfere with each other. There may be rules we can use to run certain well-known operations in parallel (for example, a merge on one table at the same time as a deletion of another table).

      Fixing this would have a strong impact on multi-tenancy, preventing one user's operations from hosing all the other users.
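One possible rule for deciding when two FATE operations commute is a reader/writer check over the tables each operation touches. The Java sketch below is a minimal illustration under an assumption: each operation declares up front which tables it needs and whether it needs a read or a write lock on each. `FateCommuteCheck`, `LockMode`, and `canRunInParallel` are hypothetical names, not Accumulo APIs.

```java
import java.util.Map;

public class FateCommuteCheck {

    // Hypothetical lock modes an operation could declare on each table it touches.
    enum LockMode { READ, WRITE }

    /**
     * Two operations commute under this rule unless they share a table
     * and at least one of them needs a WRITE lock on that table.
     */
    static boolean canRunInParallel(Map<String, LockMode> a, Map<String, LockMode> b) {
        for (Map.Entry<String, LockMode> entry : a.entrySet()) {
            LockMode other = b.get(entry.getKey());
            boolean shared = other != null;
            if (shared && (entry.getValue() == LockMode.WRITE || other == LockMode.WRITE)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, LockMode> mergeT1 = Map.of("t1", LockMode.WRITE);
        Map<String, LockMode> deleteT2 = Map.of("t2", LockMode.WRITE);
        Map<String, LockMode> rangeDeleteT1 = Map.of("t1", LockMode.WRITE);

        System.out.println(canRunInParallel(mergeT1, deleteT2));      // true: disjoint tables
        System.out.println(canRunInParallel(mergeT1, rangeDeleteT1)); // false: both write t1
    }
}
```

Under this rule, a merge on one table and a deletion of another table can run concurrently, while two writers on the same table are serialized. Readers of the same table are allowed to overlap.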

        Activity

        Keith Turner added a comment -

        Yeah, need more info. It's possible the 2nd fate operation was stuck
        waiting for something to happen on a tablet server. The output of the fate
        admin command and a jstack of the master would be a good start.

        Adam Fuchs added a comment -

        The case that I saw was when the first operation was a table deletion (that was stuck due to minor compaction failure), and the second operation was a range deletion of a different table. I didn't analyze the locking using fate admin print, but I did notice that the second operation was blocked. Any idea which lock they would have been contending for, or do I need to try to replicate this?

        Keith Turner added a comment -

        Fate operations are executed in a thread pool. The size is configurable by setting master.fate.threadpool.size; it defaults to 4. Operations that do not lock the same table should be able to execute in parallel. If an operation tries to lock a table that is already locked, it should yield and not tie up the fate thread. When you say things are backed up, do you know more about what was going on? Did you happen to look at the output of the fate admin print command? It shows which operations hold locks and which are waiting to get locks.
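The behaviour described in the comment above (a fixed-size pool, operations on different tables proceeding in parallel, and an operation that finds its table locked yielding its thread instead of blocking) can be sketched in Java as follows. This is a minimal in-memory illustration, not Accumulo's FATE implementation: `YieldingFateSketch`, `submit`, the 50 ms retry delay, and the in-memory lock set are hypothetical stand-ins, and each operation is assumed to need one exclusive table lock.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class YieldingFateSketch {

    // Fixed-size worker pool; poolSize plays the role of master.fate.threadpool.size.
    private final ScheduledExecutorService pool;
    // Hypothetical in-memory stand-in for FATE's per-table locks.
    private final Set<String> lockedTables = ConcurrentHashMap.newKeySet();

    YieldingFateSketch(int poolSize) {
        this.pool = Executors.newScheduledThreadPool(poolSize);
    }

    /** Queues an operation that needs an exclusive lock on the given table. */
    void submit(String table, Runnable op) {
        pool.execute(() -> attempt(table, op));
    }

    private void attempt(String table, Runnable op) {
        if (!lockedTables.add(table)) {
            // Table already locked: release this worker thread and retry later
            // instead of blocking, so other operations can use the thread.
            pool.schedule(() -> attempt(table, op), 50, TimeUnit.MILLISECONDS);
            return;
        }
        try {
            op.run();
        } finally {
            lockedTables.remove(table);
        }
    }

    void shutdown() {
        pool.shutdown();
    }

    // Builds a fake operation that sleeps to simulate work, then records its name.
    private static Runnable op(String name, long millis, List<String> finished, CountDownLatch done) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.add(name);
            done.countDown();
        };
    }

    /** Two operations contend for t1 while a third works on t2; returns completion order. */
    static List<String> demo() {
        YieldingFateSketch fate = new YieldingFateSketch(4); // default pool size of 4
        List<String> finished = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(3);
        fate.submit("t1", op("merge t1", 300, finished, done));
        fate.submit("t1", op("range-delete t1", 300, finished, done));
        fate.submit("t2", op("delete t2", 0, finished, done));
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        fate.shutdown();
        return finished;
    }

    public static void main(String[] args) {
        // The t2 deletion is expected to finish first, while the two t1 operations take turns.
        System.out.println(demo());
    }
}
```

Because a blocked operation reschedules itself rather than sleeping on a pool thread, waiting for a lock does not consume the pool. An operation that gets stuck while holding a lock, like the table deletion in this issue, still occupies its worker thread in this sketch.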


          People

          • Assignee:
            Unassigned
            Reporter:
            Adam Fuchs
          • Votes:
            0
            Watchers:
            2
