Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 3.5.0
- None
- None
Description
I am running a 10-container pipeline on Apex 3.5.0 with Apache Beam. The first container (id 1) finishes and is undeployed at 12:41:10 PM.
Then, 60s later (at 12:42:10 PM), Apex marks the operator in that container as blocked because its window id has not changed within the 60s window processing timeout, declares failure, and restarts the container.
This looks like a bug: shouldn't finished and undeployed operators be deregistered from the timeout logic that detects stuck operators?
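To make the expected behaviour concrete, here is a minimal sketch of a stuck-operator check that forgets an operator once it is undeployed. It is only an illustration of the idea in the question above: the class `BlockedOperatorSketch` and its methods are hypothetical, are not Apex classes, and are not a description of how the issue was actually fixed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Apex.
public class BlockedOperatorSketch {
    private final long timeoutMillis;
    // operator id -> time (ms) at which its window id last changed
    private final Map<Integer, Long> lastWindowChange = new HashMap<>();

    public BlockedOperatorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record progress for an operator that is still deployed.
    public void windowAdvanced(int operatorId, long nowMillis) {
        lastWindowChange.put(operatorId, nowMillis);
    }

    // Deregister an operator so it can no longer be reported as blocked.
    public void undeployed(int operatorId) {
        lastWindowChange.remove(operatorId);
    }

    // Return the ids of tracked operators whose window id has not changed
    // for longer than the timeout.
    public List<Integer> blockedOperators(long nowMillis) {
        List<Integer> blocked = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastWindowChange.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                blocked.add(e.getKey());
            }
        }
        return blocked;
    }
}
```

With this shape, calling `undeployed(1)` at 12:41:10 PM would mean the check at 12:42:10 PM has nothing to report for operator 1, so no restart would be triggered.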
Log below
Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
INFO: Undeploy request: [1]
Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer undeploy
INFO: Undeploy complete.
Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198930012, last window id change time 1492198869957, window processing timeout millis 60000
Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateCheckpoints
INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container PTContainer[id=1(container-6),state=ACTIVE] time 60055ms
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
INFO: Received shutdown request
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
INFO: Container container-6 restart.
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager scheduleContainerRestart
INFO: Initiating recovery for container-6@localhost
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198931015, last window id change time 1492198869957, window processing timeout millis 60000
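The numbers in the two WARNING lines can be checked by hand: the reported blocked time is the current time minus the last window id change time. The sketch below redoes that arithmetic with the values from the log. The class and method names are hypothetical, and the strict `>` comparison against the timeout is an assumption, since the log does not show the exact comparison Apex uses.

```java
// Hypothetical helper for re-deriving the values printed in the log above.
public class BlockedTimeCheck {
    // Milliseconds since the operator's window id last changed.
    static long elapsedMillis(long currentTime, long lastWindowIdChangeTime) {
        return currentTime - lastWindowIdChangeTime;
    }

    // Assumption: an operator counts as blocked once the elapsed time
    // strictly exceeds the window processing timeout.
    static boolean isBlocked(long elapsedMillis, long timeoutMillis) {
        return elapsedMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long lastChange = 1492198869957L; // "last window id change time"
        long timeout = 60000L;            // "window processing timeout millis"

        long first = elapsedMillis(1492198930012L, lastChange);
        long second = elapsedMillis(1492198931015L, lastChange);

        System.out.println(first + " " + isBlocked(first, timeout));   // 60055 true
        System.out.println(second + " " + isBlocked(second, timeout)); // 61058 true
    }
}
```

The first result matches the `time 60055ms` reported in the INFO line. The last window id change time is identical in both warnings, which is consistent with the operator being marked blocked again right after recovery is initiated.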