HDFS-4688 (Hadoop HDFS)

DFSClient should not allow multiple concurrent creates for the same file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.3-alpha, 3.0.0-alpha1
    • None
    • None
    • None

    Description

      Credit to Harsh for tracking down most of this.

      If a DFSClient calls create with overwrite multiple times on the same file, we can get into bad states. The exact failure mode depends on the state of the file, but at a minimum one DFSOutputStream will "win" over the others, and data written to the other DFSOutputStreams will be lost. That loss is perhaps acceptable given overwrite semantics, but we've also seen cases where the DFSClient loops indefinitely on close and blocks get marked as corrupt, which is not acceptable.

      One possible fix is to add locking to DFSClient that prevents a user from opening multiple concurrent output streams to the same path.
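
      The locking idea could look roughly like the sketch below. It is only an illustration and not code from DFSClient or from any patch on this issue: the class name OpenPathRegistry and its methods tryAcquire and release are hypothetical, and it assumes paths have already been normalized to a single string form.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Hadoop): remembers which paths
 * currently have an open output stream in this client.
 */
final class OpenPathRegistry {
    // Thread-safe set of paths that have an open output stream.
    private final Set<String> openPaths = ConcurrentHashMap.newKeySet();

    /**
     * Atomically claims the path. Returns true if the caller now owns it,
     * false if another output stream to the same path is still open.
     */
    boolean tryAcquire(String path) {
        return openPaths.add(path);
    }

    /** Gives the path back; meant to be called when the stream is closed. */
    void release(String path) {
        openPaths.remove(path);
    }
}
```

      Under these assumptions, a create call would invoke tryAcquire before opening the stream and fail fast if it returns false, and the stream's close would call release. Whether a second create should fail or wait is a design choice this sketch does not settle.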

      Attachments

        1. TestBadFileMaker.java
          1 kB
          Andrew Wang

            People

              Assignee: Andrew Wang
              Reporter: Andrew Wang
              Votes: 0
              Watchers: 12