Apache Apex Core / APEXCORE-703

Window processing timeout for finished/undeployed container

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.6.0
    • Component/s: None
    • Labels: None

    Description

      Using Apex 3.5.0 with Apache Beam, I have a 10-container pipeline. The first container, id #1, finishes and gets undeployed at 12:41:10 PM.

      Then, 60 s later (at 12:42:10 PM), Apex marks the operator in that container as blocked because its window id has not changed for 60 s (the window processing timeout), declares failure, and restarts the container.

      This would seem to be a bug – shouldn't finished and undeployed operators be deregistered from the timeout logic that is detecting stuck operators?
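
To make the question concrete, here is a minimal sketch of the behavior being asked for. It is not Apex code: `BlockedOperatorSketch`, `State`, `OperatorInfo`, and `findBlocked` are hypothetical names, and the strict `>` comparison is an assumption. The only point is that operators that are no longer deployed are skipped before the timeout is evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockedOperatorSketch {
    // Hypothetical operator states; not the actual Apex enum.
    public enum State { ACTIVE, UNDEPLOYED }

    // Hypothetical stand-in for the bookkeeping a stuck-operator check needs.
    public static final class OperatorInfo {
        final int id;
        final State state;
        final long lastWindowIdChangeMillis;

        public OperatorInfo(int id, State state, long lastWindowIdChangeMillis) {
            this.id = id;
            this.state = state;
            this.lastWindowIdChangeMillis = lastWindowIdChangeMillis;
        }
    }

    // Returns the ids of operators whose window id has not changed within the timeout.
    public static List<Integer> findBlocked(List<OperatorInfo> ops, long nowMillis, long timeoutMillis) {
        List<Integer> blocked = new ArrayList<>();
        for (OperatorInfo op : ops) {
            if (op.state != State.ACTIVE) {
                // Finished/undeployed operators never advance their window id,
                // so they are excluded from the timeout check.
                continue;
            }
            if (nowMillis - op.lastWindowIdChangeMillis > timeoutMillis) {
                blocked.add(op.id);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        long now = 1492198930012L;   // "current time" from the log in this report
        long stale = 1492198869957L; // "last window id change time" from the log
        List<OperatorInfo> ops = Arrays.asList(
            new OperatorInfo(1, State.UNDEPLOYED, stale), // the operator from this report
            new OperatorInfo(2, State.ACTIVE, stale));    // hypothetical operator that really is stuck
        System.out.println(findBlocked(ops, now, 60000L)); // prints [2]
    }
}
```

With this filter, only the hypothetical active operator 2 is reported; the undeployed operator 1 is ignored even though its timestamp is equally stale.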

      Log below

      Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
      INFO: Undeploy request: [1]
      Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer undeploy
      INFO: Undeploy complete.
      Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
      WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198930012, last window id change time 1492198869957, window processing timeout millis 60000
      Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateCheckpoints
      INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container PTContainer[id=1(container-6),state=ACTIVE] time 60055ms
      Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
      INFO: Received shutdown request
      Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
      INFO: Container container-6 restart.
      Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager scheduleContainerRestart
      INFO: Initiating recovery for container-6@localhost
      Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
      WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198931015, last window id change time 1492198869957, window processing timeout millis 60000
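
The numbers in the two WARNING lines can be checked directly. The sketch below only redoes the subtraction with the values printed in the log; `TimeoutArithmetic` and `elapsedMillis` are hypothetical names, and comparing with a strict `>` is an assumption about how the timeout is applied.

```java
public class TimeoutArithmetic {
    // Elapsed time since the operator's window id last changed.
    public static long elapsedMillis(long currentTimeMillis, long lastWindowIdChangeMillis) {
        return currentTimeMillis - lastWindowIdChangeMillis;
    }

    public static void main(String[] args) {
        long lastChange = 1492198869957L; // "last window id change time" (both warnings)
        long timeout = 60000L;            // "window processing timeout millis"

        long first = elapsedMillis(1492198930012L, lastChange);  // warning at 12:42:10
        long second = elapsedMillis(1492198931015L, lastChange); // warning at 12:42:11

        System.out.println(first + " ms, over timeout: " + (first > timeout));   // 60055 ms, over timeout: true
        System.out.println(second + " ms, over timeout: " + (second > timeout)); // 61058 ms, over timeout: true
    }
}
```

The first difference, 60055 ms, matches the "time 60055ms" in the "Blocked operator" line. Counting back 60055 ms from the 12:42:10 warning puts the last window id change immediately before the undeploy logged at 12:41:10, and the value is identical in the second warning, which is consistent with the timer continuing to run against an operator that had already been undeployed.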
      

            People

              Assignee: Vlad Rozov (vrozov)
              Reporter: Dan Halperin (dhalperi)
              Votes: 0
              Watchers: 5
