Hadoop HDFS / HDFS-1158

HDFS-457 increases the chances of losing blocks

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.21.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      Whenever we restart a cluster, there is a chance of losing some blocks if more than three datanodes fail to come up.
      HDFS-457 increases this chance by keeping datanodes up even when

      1. the /tmp disk goes read-only
      2. /disk0, which is used for storing the PID file, goes read-only

      and probably in other cases as well.

      In our environment, /tmp and /disk0 are on the same device.

      When we try to restart such a datanode, the restart fails with either
      1)

      2010-05-15 05:45:45,575 WARN org.mortbay.log: tmpdir
      java.io.IOException: Read-only file system
              at java.io.UnixFileSystem.createFileExclusively(Native Method)
              at java.io.File.checkAndCreate(File.java:1704)
              at java.io.File.createTempFile(File.java:1792)
              at java.io.File.createTempFile(File.java:1828)
              at org.mortbay.jetty.webapp.WebAppContext.getTempDirectory(WebAppContext.java:745)
      

      or
      2)

      hadoop-daemon.sh: line 117: /disk/0/hadoop-datanode....com.out: Read-only file system
      hadoop-daemon.sh: line 118: /disk/0/hadoop-datanode.pid: Read-only file system
      

      I can recover the missing blocks, but it takes some time.

      Also, we lose track of block movements, since the log directory can also go read-only while the datanode continues running.

      For the 0.21 release, can we revert HDFS-457 or make it configurable?
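
      As an illustration only, the read-only conditions described above can be detected with a small probe that tries to create a temporary file in each directory the datanode depends on, which is the same operation that fails in the first stack trace. This is a minimal sketch, not code from HDFS or from the attached patch: the class name WritableDirCheck and its methods are hypothetical, and the directories to check are assumed to be passed on the command line.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reports directories that cannot be written to.
public class WritableDirCheck {

    /** Returns true if a temporary file can be created in and deleted from dir. */
    static boolean isWritable(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        try {
            // On a read-only file system this throws IOException,
            // as in the "Read-only file system" trace in the report.
            File probe = File.createTempFile("probe", ".tmp", dir);
            return probe.delete();
        } catch (IOException e) {
            return false;
        }
    }

    /** Returns the subset of dirs that failed the write probe. */
    static List<File> findUnwritable(List<File> dirs) {
        List<File> bad = new ArrayList<>();
        for (File d : dirs) {
            if (!isWritable(d)) {
                bad.add(d);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<File> dirs = new ArrayList<>();
        for (String a : args) {
            dirs.add(new File(a));
        }
        List<File> bad = findUnwritable(dirs);
        for (File d : bad) {
            System.err.println("not writable: " + d);
        }
        // Exit non-zero only when at least one directory failed the probe.
        if (!bad.isEmpty()) {
            System.exit(1);
        }
    }
}
```

      Run against the directories named in this report, for example `java WritableDirCheck /tmp /disk0`, the sketch would exit with status 1 if either one had gone read-only. Whether a datanode should shut down in that case is exactly the policy question this issue raises.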

      Attachments

        1. rev-HDFS-457.patch
          15 kB
          Konstantin Shvachko

            People

              Assignee: Unassigned
              Reporter: Koji Noguchi (knoguchi)
              Votes: 0
              Watchers: 10
