Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I believe there is a race condition in org.apache.hadoop.chukwa.util.PidFile: the creation and deletion of the file are not protected by any lock. Client A can delete the file just before Client B, which has already opened it, tries to acquire a lock. If Client C creates the file at that moment, the creation succeeds. Client B and Client C then both acquire a lock, because there are now two different files (one is hidden because it was deleted after being opened). I tested similar code on OS X and observed exactly this.
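
The "two different files" effect described above can be reproduced without any locking code. The following is a minimal sketch, not the PidFile implementation: the temporary directory, the file name demo.pid, and the variable names are all hypothetical, and the three clients are simulated sequentially in a single shell.

```shell
#!/bin/sh
# Demonstration only: paths and names are hypothetical, not taken from Chukwa.
dir=$(mktemp -d)
lockfile="$dir/demo.pid"

echo "old" > "$lockfile"      # the file all clients initially agree on
exec 3< "$lockfile"           # Client B opens it, intending to lock it next
rm "$lockfile"                # Client A deletes it before B takes the lock
echo "new" > "$lockfile"      # Client C recreates the path: a different file

read -r seen_by_b <&3         # B's descriptor still refers to the deleted file
read -r seen_by_c < "$lockfile"
exec 3<&-                     # close B's descriptor

echo "B sees: $seen_by_b, C sees: $seen_by_c"
rm -rf "$dir"
```

On a Unix-like system, B reads "old" and C reads "new": the same path now names two distinct files, so a lock taken on each one never conflicts with the other.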

        Activity

        eyang Eric Yang added a comment (edited)

        The PidFile class should be removed. The POSIX file lock interface only works inside the same process, not across multiple instances of the program. An old trick was to bind to a port number as an indicator of whether more than one instance of the program is running. However, this approach may not be safe either, because a third party could connect to the bound port and cause a race condition as well. Hence, the Hadoop-style shell script is still the best solution:

        if pid file exists and program running
          warn the user, it's already running
          exit 1
        else
          start the program
          record pid
          sleep 1
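
The pseudocode above could be written out roughly as follows. This is a sketch under stated assumptions rather than the script that was committed: the function names is_running and start_once are hypothetical, and "program running" is interpreted here as "kill -0 succeeds for the recorded pid".

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the pid file names a live process.
is_running() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file" 2>/dev/null)
    [ -n "$pid" ] || return 1
    # Signal 0 delivers nothing; it only checks that the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical wrapper. Usage: start_once PIDFILE COMMAND [ARGS...]
start_once() {
    pid_file=$1
    shift
    if is_running "$pid_file"; then
        echo "already running as process $(cat "$pid_file")" >&2
        return 1
    fi
    "$@" &                    # start the program in the background
    echo $! > "$pid_file"     # record its pid
    sleep 1
}
```

For example, `start_once /tmp/demo.pid sleep 30` starts the command and records its pid, and a second identical call prints the warning and returns 1. The check and the write are still two separate steps, so two callers racing through start_once can both start the program; the sketch narrows the window but does not close it. A stale pid file whose pid has been reused by an unrelated process is also reported as running.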
        
        eyang Eric Yang added a comment

        Removed the pid file locking Java file and switched to the standard Hadoop-style shell script for pid file locking. Shell-script-style file locking is far from perfect, but it is more elegant than the misleading PidFile class.

        I also thought about modifying the PidFile class to bind to a port, but that strategy would yield a result similar to the shell script's pid file test. Hence, I decided to go with the simple approach to avoid complexity.

        eyang Eric Yang added a comment

        I just committed this.


          People

          • Assignee:
            eyang Eric Yang
            Reporter:
            cbfiddle Alan Snyder
          • Votes:
            0
            Watchers:
            2
