Nutch / NUTCH-108

TaskTracker crashes when reconnecting to a new JobTracker.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.8
    • Component/s: None
    • Labels:
      None

      Description

      051008 213532 Lost connection to JobTracker [/192.168.200.100:7020]. Retrying...
      051008 213537 Client connection to 192.168.200.100:7020: starting
      051008 213537 Client connection to 192.168.200.105:7030: closing
      051008 213537 Server connection on port 7030 from 192.168.200.105: exiting
      051008 213537 Server connection on port 7030 from 192.168.200.102: exiting
      051008 213537 Client connection to 192.168.200.102:7030: closing
      051008 213537 task_m_1iswra done; removing files.
      051008 213537 Server connection on port 7030 from 192.168.200.101: exiting
      051008 213537 Client connection to 192.168.200.101:7030: closing
      Exception in thread "main" java.util.ConcurrentModificationException
      at java.util.TreeMap$EntryIterator.nextEntry(TreeMap.java:1026)
      at java.util.TreeMap$ValueIterator.next(TreeMap.java:1057)
      at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:134)
      at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:285)
      at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:629)
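The trace above fails inside a TreeMap value iterator in TaskTracker.close(). The following is a minimal sketch of that failure mode. It assumes, without confirmation from the source, that finishing a task removes it from the tracker's tasks map; the "task_m_1iswra done; removing files." line just before the exception is consistent with that. Task and closeThrowsCme are hypothetical names, not Nutch classes. Removing an entry while a values() iterator is open makes the iterator's next call to next() throw ConcurrentModificationException. With a single task the loop ends before that check runs, so the crash needs at least two tasks.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

public class CloseCmeDemo {
    // Hypothetical stand-in for TaskInProgress (not the Nutch class).
    // Assumption: finishing a task removes it from the tracker's map.
    static final class Task {
        private final String id;
        private final Map<String, Task> owner;

        Task(String id, Map<String, Task> owner) {
            this.id = id;
            this.owner = owner;
        }

        void jobHasFinished() {
            owner.remove(id); // structural change to the map being iterated
        }
    }

    // Mimics the failing close(): walk values() while each task removes itself.
    // Returns true if the iteration fails with ConcurrentModificationException.
    static boolean closeThrowsCme(int taskCount) {
        Map<String, Task> tasks = new TreeMap<>();
        for (int i = 0; i < taskCount; i++) {
            String id = "task_" + i;
            tasks.put(id, new Task(id, tasks));
        }
        try {
            for (Task tip : tasks.values()) {
                tip.jobHasFinished();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " task(s): CME = " + closeThrowsCme(n));
        }
    }
}
```

This also suggests why the crash appears only on a reconnect while more than one task is still tracked.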

        Activity

        Sami Siren made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]
        Sami Siren added a comment -

        closing issues for released versions

        Doug Cutting made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution: Fixed [ 1 ]
        Fix Version/s: 0.8-dev [ 12310224 ]
        Doug Cutting added a comment -

        I just committed this patch. Thanks, Paul!

        Paul Baclace made changes -
        Attachment: TaskTracker.java.patch [ 12321585 ]
        Paul Baclace added a comment -

        Here is a patch for reducing redundant, voluminous output while retrying to connect.

        Paul Baclace added a comment -

        I just had the opportunity to test this with 33 tasktrackers.

        One thing I noticed: TaskTracker.java should be patched to reduce the redundant, voluminous output (unnecessary stack trace every 5 sec.) from the retry loop.

        All of the tasktrackers are now able to successfully reconnect.
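The retry-loop change Paul describes could take roughly the shape below. This is a hypothetical sketch, not the attached TaskTracker.java.patch; RetryLogger, onFailure and onSuccess are illustrative names. It prints the full stack trace only for the first failure of an outage, a single line for every later retry, and resets once a connection succeeds, so a 5-second retry loop no longer repeats the same trace.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

public class RetryLogDemo {
    // Hypothetical helper: decides how much to log for each failed reconnect.
    static final class RetryLogger {
        private int consecutiveFailures = 0;

        // Full stack trace on the first failure in a streak, one line afterwards.
        String onFailure(String jobTracker, Throwable cause) {
            consecutiveFailures++;
            String line = "Lost connection to JobTracker [" + jobTracker + "]. Retrying...";
            if (consecutiveFailures == 1) {
                StringWriter trace = new StringWriter();
                cause.printStackTrace(new PrintWriter(trace));
                return line + "\n" + trace;
            }
            return line + " (attempt " + consecutiveFailures + ": " + cause + ")";
        }

        // Called after a successful reconnect; the next outage logs a full trace again.
        void onSuccess() {
            consecutiveFailures = 0;
        }
    }

    public static void main(String[] args) {
        RetryLogger log = new RetryLogger();
        IOException timeout = new IOException("timed out waiting for response");
        for (int i = 0; i < 3; i++) {
            System.out.println(log.onFailure("host:7020", timeout));
        }
        log.onSuccess();
    }
}
```

Keeping the first trace preserves the diagnostic detail of the original failure while cutting the repeated output.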

        Doug Cutting added a comment -

        I think the patch is to replace the loop at the start of TaskTracker.close() with something like:

        while (tasks.size() != 0) {
            TaskInProgress tip = (TaskInProgress)tasks.first();
            tip.jobHasFinished();
        }

        I have not yet had time to test this.
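A runnable sketch of the drain-loop idea, under two assumptions: tasks is a TreeMap keyed by task id (the stack traces show a TreeMap value iterator), and jobHasFinished() removes the task from that map. A TreeMap has no first() method, so the sketch uses tasks.get(tasks.firstKey()) instead. Task, newTracker, closeByDraining and closeBySnapshot are hypothetical names. Because each pass looks up the current first key rather than holding an iterator, removals cannot trigger ConcurrentModificationException, but the loop terminates only if each call really removes its entry. The snapshot variant copies the values first, so it also finishes when a task stays in the map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class CloseFixDemo {
    // Hypothetical stand-in for TaskInProgress (not the Nutch class).
    static final class Task {
        final String id;
        final TreeMap<String, Task> owner;
        boolean finished = false;

        Task(String id, TreeMap<String, Task> owner) {
            this.id = id;
            this.owner = owner;
        }

        // Assumed behavior: finishing a task removes it from the tracker's map.
        void jobHasFinished() {
            finished = true;
            owner.remove(id);
        }
    }

    static TreeMap<String, Task> newTracker(int n) {
        TreeMap<String, Task> tasks = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            String id = "task_" + i;
            tasks.put(id, new Task(id, tasks));
        }
        return tasks;
    }

    // Drain loop: take the current first entry each pass; no iterator is held
    // across the removal. Loops forever if jobHasFinished() does not remove.
    static void closeByDraining(TreeMap<String, Task> tasks) {
        while (!tasks.isEmpty()) {
            Task tip = tasks.get(tasks.firstKey());
            tip.jobHasFinished();
        }
    }

    // Alternative: iterate a snapshot copy, so removals from the live map
    // cannot invalidate the iterator being used.
    static void closeBySnapshot(TreeMap<String, Task> tasks) {
        List<Task> snapshot = new ArrayList<>(tasks.values());
        for (Task tip : snapshot) {
            tip.jobHasFinished();
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Task> a = newTracker(3);
        closeByDraining(a);
        System.out.println("drain left " + a.size() + " tasks");

        TreeMap<String, Task> b = newTracker(3);
        closeBySnapshot(b);
        System.out.println("snapshot left " + b.size() + " tasks");
    }
}
```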

        Rod Taylor added a comment -

        I have seen this as well.

        When I took a look, the JobTracker had knowledge of all of the events (via localhost:7845) but did not have any trackers connected to it. The trackers on all 5 machines had stopped running. After I restarted the trackers, the system continued from where it left off.

        Snipped from one tracker log. All tracker logs looked similar.

        051015 070222 task_m_abaf21 0.99999994% 30093 pages, 4546 errors, 14.9 pages/s, 1609 kb/s,
        051015 070222 Task task_m_abaf21 is done.
        051015 070222 Task task_m_abaf21 is done.
        051015 070222 Server connection on port 52226 from 192.168.100.14: exiting
        java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
        Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
        051015 071940 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
        java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
        Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
        <-- SNIP -->
        051015 081350 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
        java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
        Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
        051015 081455 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
        051015 081510 task_m_2j2jh0 done; removing files.
        051015 081510 Server connection on port 41894 from 192.168.100.10: exiting
        051015 081510 Client connection to 192.168.100.10:61734: closing
        051015 081510 Client connection to 192.168.100.12:63227: closing
        051015 081510 Server connection on port 41894 from 192.168.100.12: exiting
        Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1031)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1064)
        at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:130)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:281)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
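Each heartbeat failure in this log surfaces as java.lang.reflect.UndeclaredThrowableException rather than as the IOException underneath it. That matches standard java.lang.reflect.Proxy behavior: when the invocation handler throws a checked exception that the interface method does not declare, the proxy wraps it in UndeclaredThrowableException. The sketch below reproduces this with a hypothetical HeartbeatProtocol interface. From the trace alone it assumes that the real emitHeartbeat declaration did not list IOException; had it been declared, the caller would receive the IOException directly.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

public class ProxyWrapDemo {
    // Hypothetical protocol interface: the method does NOT declare IOException.
    interface HeartbeatProtocol {
        int emitHeartbeat(String status);
    }

    // Returns whatever the caller sees when the handler fails with a checked IOException.
    static Throwable callThroughProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            // Stand-in for an RPC client call that times out.
            throw new IOException("timed out waiting for response");
        };
        HeartbeatProtocol jobTracker = (HeartbeatProtocol) Proxy.newProxyInstance(
                HeartbeatProtocol.class.getClassLoader(),
                new Class<?>[] {HeartbeatProtocol.class},
                handler);
        try {
            jobTracker.emitHeartbeat("alive");
            return null;
        } catch (UndeclaredThrowableException e) {
            return e; // the proxy wrapped the undeclared checked exception
        }
    }

    public static void main(String[] args) {
        Throwable t = callThroughProxy();
        System.out.println(t.getClass().getName());
        System.out.println("cause: " + t.getCause());
    }
}
```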

        Show
        Stefan Groschupf created issue -

          People

          • Assignee: Unassigned
          • Reporter: Stefan Groschupf
          • Votes: 0
          • Watchers: 0
