Apache S4 / S4-87

Checkpointing: recovery: avoid rejections upon fetching

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Labels:
      None

      Description

      Tests pass on Mac OS X with JDK 1.6.0_33 but fail on Ubuntu with the same JDK version (Oracle).

      Here is the stack trace (I added some logging to see the error):

      java.util.concurrent.RejectedExecutionException: null
      	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768) ~[na:1.6.0_33]
      	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) ~[na:1.6.0_33]
      	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) ~[na:1.6.0_33]
      	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92) ~[na:1.6.0_33]
      	at org.apache.s4.core.ft.SafeKeeper.fetchSerializedState(SafeKeeper.java:239) ~[main/:na]
      	at org.apache.s4.core.ProcessingElement.recover(ProcessingElement.java:759) [main/:na]
      	at org.apache.s4.core.ProcessingElement.handleInputEvent(ProcessingElement.java:411) [main/:na]
      	at org.apache.s4.core.Stream.run(Stream.java:299) [main/:na]
      	at java.lang.Thread.run(Thread.java:662) [na:1.6.0_33]
       [words seen stream] ERROR org.apache.s4.core.ProcessingElement - Cannot fetch serialized stated for [org.apache.s4.wordcount.WordCounterPE/doobie
      

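      This could be because we use a handoff queue: with a direct handoff, a task submitted while the pool is at its maximum size and every thread is busy is rejected instead of being queued. It is not clear to me, though.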
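      To illustrate the handoff hypothesis, here is a minimal Java sketch. It is not taken from SafeKeeper; it assumes, hypothetically, a single fetch thread behind a SynchronousQueue (the direct-handoff queue in java.util.concurrent) and shows that a second, concurrent fetch is rejected while the first one is still running. The class and method names are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandoffRejectionDemo {

    /** Returns true if a second fetch is rejected while the single worker is still busy. */
    public static boolean concurrentFetchRejected() {
        // Direct handoff: a SynchronousQueue holds no tasks, so submit() succeeds only if
        // an idle worker is waiting on the queue or a new worker fits under maximumPoolSize.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch firstFetchDone = new CountDownLatch(1);
        try {
            // First recovery request: occupies the only worker until released
            executor.submit(() -> {
                try {
                    firstFetchDone.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Second, concurrent recovery request: no idle thread and no queue slot
            try {
                executor.submit(() -> { });
                return false;
            } catch (RejectedExecutionException e) {
                return true; // thrown by the default AbortPolicy, as in the stack trace
            }
        } finally {
            firstFetchDone.countDown(); // let the first task finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("second fetch rejected: " + concurrentFetchRejected());
    }
}
```

      With a single thread and no queue capacity, any overlap between two recovery requests is enough to trigger the RejectedExecutionException, which would be consistent with a timing-dependent failure that shows up on one platform but not another.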

      In any case, since different prototypes may issue recovery requests in parallel, it may be more appropriate to use a bounded queue, with the option of using multiple threads for the fetch operations.
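      A sketch of the bounded-queue alternative, under stated assumptions: the pool size (FETCH_THREADS), the queue capacity (QUEUE_CAPACITY), and the fetchSerializedState stand-in below are hypothetical and are not taken from the S4 patch. A fixed pool of fetch threads sits in front of an ArrayBlockingQueue, so a burst of parallel recovery requests is queued instead of rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedFetchPool {

    // Hypothetical sizing, chosen for illustration only
    static final int FETCH_THREADS = 4;
    static final int QUEUE_CAPACITY = 100;

    /** Fixed pool of fetch threads in front of a bounded FIFO queue. */
    public static ThreadPoolExecutor newFetchExecutor() {
        return new ThreadPoolExecutor(
                FETCH_THREADS, FETCH_THREADS, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(QUEUE_CAPACITY));
    }

    /** Hypothetical stand-in for fetching a PE's serialized state from storage. */
    static String fetchSerializedState(String key) throws InterruptedException {
        Thread.sleep(10); // simulated storage latency
        return "state-of-" + key;
    }

    /** Submits n recovery fetches at once and returns how many completed. */
    public static int recoverAll(int n) {
        ThreadPoolExecutor executor = newFetchExecutor();
        List<Future<String>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                final String key = "pe-" + i;
                Callable<String> fetch = () -> fetchSerializedState(key);
                pending.add(executor.submit(fetch)); // queued if all threads are busy
            }
            int completed = 0;
            for (Future<String> f : pending) {
                try {
                    f.get();
                    completed++;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return completed;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // 50 <= FETCH_THREADS + QUEUE_CAPACITY, so no submission can be rejected
        System.out.println("completed fetches: " + recoverAll(50));
    }
}
```

      A bounded queue still rejects once more than FETCH_THREADS + QUEUE_CAPACITY fetches are outstanding, so the capacity has to be sized for the expected number of concurrent recoveries; the benefit over a direct handoff is that short bursts are absorbed instead of failing immediately.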

        Activity

        Matthieu Morel added a comment -

        Patch available in branch S4-87

        Daniel Gómez Ferro added a comment -

        I managed to reproduce it with a fetch task that times out, so subsequent tasks are rejected.

        The proposed patch fixes it, +1
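      The timeout scenario can be sketched in Java as follows. This is a hypothetical reproduction, not the test used on the branch: a fetch hangs until released, the caller waits with Future.get(timeout) and gives up, and a follow-up fetch is then submitted. The caller giving up does not free the worker thread, so a single-thread handoff pool rejects the follow-up, while a (hypothetical) two-thread pool with a bounded queue accepts it. This does not claim to show what the committed patch does.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedOutFetchDemo {

    public static ThreadPoolExecutor handoffExecutor() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>());
    }

    public static ThreadPoolExecutor boundedExecutor() {
        return new ThreadPoolExecutor(2, 2, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(16));
    }

    /** Returns true if a fetch submitted after a timed-out fetch is rejected. */
    public static boolean followUpRejected(ThreadPoolExecutor executor) {
        CountDownLatch backendResponds = new CountDownLatch(1);
        try {
            // Hypothetical hanging fetch: blocks until the latch is released
            Future<?> hanging = executor.submit(() -> {
                try {
                    backendResponds.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                hanging.get(50, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller gives up here, but the worker thread is still running the task
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
            try {
                executor.submit(() -> { });
                return false;
            } catch (RejectedExecutionException e) {
                return true;
            }
        } finally {
            backendResponds.countDown(); // let the hanging task finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("handoff, 1 thread: rejected = " + followUpRejected(handoffExecutor()));
        System.out.println("bounded queue, 2 threads: rejected = " + followUpRejected(boundedExecutor()));
    }
}
```

      Running main prints "rejected = true" for the handoff pool and "rejected = false" for the bounded pool.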

        Matthieu Morel added a comment -

        Thanks for checking Daniel.

        Merged into piper commit 43a31f44040de424ff6a6baafd2ea3983357df3d

        Note that there are certainly optimizations we could make here, but we are leaving them for a later release, probably 0.6.


          People

          • Assignee:
            Matthieu Morel
            Reporter:
            Matthieu Morel
          • Votes:
            0
            Watchers:
            2
