Hadoop Common / HADOOP-4924

Race condition in re-init of TaskTracker

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.3, 0.19.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      TaskTracker.close() stops the taskReportServer on a separate thread. This leads to the following race:
      1) TaskTracker.close() is invoked. It starts a thread to stop the taskReportServer.
      2) TaskTracker.initialize() is invoked. It creates a new taskReportServer.
      If the thread started in (1) only begins its work after (2), it ends up stopping the newly created taskReportServer instead of the old one.
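      The interleaving above can be reproduced with a minimal Java sketch. ReportServer, Tracker and the reinitDone latch are hypothetical stand-ins, not Hadoop's actual classes. The latch exists only to force the unlucky ordering, so that the background stop runs after step (2). The detail assumed here to mirror the pre-patch code is that the stopper thread reads the taskReportServer field when it runs, not when close() was called.

```java
import java.util.concurrent.CountDownLatch;

public class ReinitRaceDemo {
    // Hypothetical stand-in for the RPC server; not Hadoop's actual class.
    static class ReportServer {
        final String name;
        volatile boolean running = true;

        ReportServer(String name) {
            this.name = name;
        }

        void stop() {
            running = false;
        }
    }

    // Hypothetical tracker following the pre-patch pattern (stop on a background thread).
    static class Tracker {
        volatile ReportServer taskReportServer;
        // Demo-only latch used to force the racy ordering deterministically.
        final CountDownLatch reinitDone = new CountDownLatch(1);

        void initialize(String name) {
            taskReportServer = new ReportServer(name);
        }

        Thread close() {
            Thread stopper = new Thread(() -> {
                try {
                    reinitDone.await(); // wait until re-init has happened (step 2)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // The field is read only now, so this stops whichever server is current,
                // which after re-init is the NEW one.
                taskReportServer.stop();
            });
            stopper.start();
            return stopper;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Tracker tracker = new Tracker();
        tracker.initialize("old");
        ReportServer oldServer = tracker.taskReportServer;

        Thread stopper = tracker.close(); // step (1)
        tracker.initialize("new");        // step (2)
        tracker.reinitDone.countDown();   // let the stopper thread run
        stopper.join();

        System.out.println("old running: " + oldServer.running);                // prints "old running: true"
        System.out.println("new running: " + tracker.taskReportServer.running); // prints "new running: false"
    }
}
```

In this sketch the old server is never stopped and the fresh one is, which is exactly the failure the description reports.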

      Attachments

      1. 4924.patch (1 kB, Devaraj Das)

        Activity

        Devaraj Das added a comment -

        I committed this to the 0.18 branch too.

        dhruba borthakur added a comment -

        Thanks Devaraj, for committing this into 0.19.

        Devaraj Das added a comment -

        OK, I committed this to the 0.19 branch too.

        dhruba borthakur added a comment -

        Hi Devaraj, if this affects 0.19 as well, does it need to be committed into that branch too?

        Devaraj Das added a comment -

        Yes, it does affect those versions as well.

        dhruba borthakur added a comment -

        It would be nice if somebody could comment on whether this affects 0.17 and 0.18 as well.

        Devaraj Das added a comment -

        All tests, including test-patch, passed on my machine. Committed this.

        Arun C Murthy added a comment -

        +1

        Devaraj Das added a comment -

        In the attached patch, I removed the thread that was calling taskReportServer.stop(). Instead, TaskTracker.close() now stops the taskReportServer inline. This is required anyway for the case where the TaskTracker's configuration specifies a non-zero RPC port.
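        For comparison, here is a sketch of the patched pattern using the same kind of hypothetical stand-ins (ReportServer, Tracker and server() are illustrative names, not Hadoop's API). close() calls stop() on the current server before returning, so a later initialize() gets a server that nothing will stop behind its back. The sketch does not model ports. Presumably, with a fixed non-zero RPC port, an asynchronous stop could also leave the old server holding the port while the new one tries to bind, which would explain why the inline stop is needed anyway.

```java
public class InlineStopDemo {
    // Hypothetical stand-in for the RPC server; not Hadoop's actual class.
    static class ReportServer {
        volatile boolean running = true;

        void stop() {
            running = false; // a real server would also release its port here
        }
    }

    // Hypothetical tracker following the patched pattern: close() stops inline.
    static class Tracker {
        private ReportServer taskReportServer;

        void initialize() {
            taskReportServer = new ReportServer();
        }

        void close() {
            if (taskReportServer != null) {
                // The old server is fully stopped before close() returns, so no
                // deferred work can later hit a server created by re-init.
                taskReportServer.stop();
            }
        }

        ReportServer server() {
            return taskReportServer;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.initialize();
        ReportServer oldServer = tracker.server();

        tracker.close();      // old server is stopped by the time this returns
        tracker.initialize(); // re-init creates a fresh server

        System.out.println("old running: " + oldServer.running);        // prints "old running: false"
        System.out.println("new running: " + tracker.server().running); // prints "new running: true"
    }
}
```

Doing the stop synchronously trades a slightly slower close() for an ordering guarantee: whatever close() stops is always the server that existed when it was called.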


          People

          • Assignee:
            Devaraj Das
            Reporter:
            Devaraj Das
          • Votes:
            0
            Watchers:
            1
