Hadoop Common / HADOOP-124

don't permit two datanodes to run from same dfs.data.dir

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels: None
    • Environment: ~30 node cluster

      Description

      DFS files are still rotting.

      I suspect a problem in the namenode with block accounting or with detecting identical hosts. I have 30 physical nodes with varying numbers of local disks, so my current 'bin/hadoop dfs -report' shows 80 nodes after a full restart. However, when I discovered the problem (which cost about 500 GB of temporary data, because of missing blocks in some of the larger chunks), -report showed 96 nodes. I suspect that extra datanodes were somehow running against the same paths and that the namenode counted them as separate replicas. Those blocks then showed up as over-replicated, and one of the datanodes was told to delete its local block. Because the datanodes shared the same storage, the block was actually lost.

      I will debug it more the next time the situation arises. This is at least the 5th time I've had a large amount of file data "rot" in DFS since January.
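
      The guard asked for in the title can be sketched with an exclusive file lock taken inside each data directory at startup. This is a minimal illustration only, not necessarily what the attached patches do: the class name DataDirLock, the method tryAcquire, and the lock file name datanode.lock are all hypothetical and chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical guard: at most one holder per data directory. */
public class DataDirLock implements AutoCloseable {
    // Assumed lock file name, chosen for this sketch only.
    private static final String LOCK_FILE_NAME = "datanode.lock";

    private final RandomAccessFile file;
    private final FileLock lock;

    private DataDirLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /**
     * Tries to claim dataDir. Returns the held lock, or null if another
     * holder (in this JVM or another process) already owns the directory.
     */
    public static DataDirLock tryAcquire(File dataDir) throws IOException {
        RandomAccessFile raf =
            new RandomAccessFile(new File(dataDir, LOCK_FILE_NAME), "rw");
        FileLock held;
        try {
            // Non-blocking: returns null if another process holds the lock.
            held = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM.
            held = null;
        } catch (IOException e) {
            raf.close();
            throw e;
        }
        if (held == null) {
            raf.close();
            return null;
        }
        return new DataDirLock(raf, held);
    }

    /** Releases the lock so another datanode may claim the directory. */
    @Override
    public void close() throws IOException {
        lock.release();
        file.close();
    }
}
```

      A datanode using such a guard would call tryAcquire on every configured data directory and refuse to start if any call returns null. Java file locks are held on behalf of the whole JVM, and on some platforms closing any channel to the file can release the underlying lock, so the sketch is meant for the case reported here: separate datanode processes pointed at the same path.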

      Attachments

      1. Hadoop-124-v3.patch (90 kB, Konstantin Shvachko)
      2. Hadoop-124.patch (81 kB, Konstantin Shvachko)
      3. DatanodeRegister.txt (4 kB, Konstantin Shvachko)

            People

            • Assignee: Konstantin Shvachko
            • Reporter: Bryan Pendleton
