Hadoop Common / HADOOP-1046

Datanode should periodically clean up /tmp from partially received (and not completed) block files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.2, 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Cluster of 10 machines, running Hadoop 0.9.2 + Nutch

      Description

      The cluster is set up with tasktrackers running on the same machines as the datanodes. Tasks put a heavy load on local CPU, RAM, and disk I/O. Under that load I see many messages like the following from the datanodes:

      2007-02-15 05:30:53,298 WARN dfs.DataNode - Failed to transfer blk_-4590782726923911824 to xxx.xxx.xxx/10.10.16.109:50010
      java.net.SocketException: Connection reset
      ....
      java.io.IOException: Block blk_71053993347675204 has already been started (though not completed), and thus cannot be created.

      My reading of the code in DataNode.DataXceiver.writeBlock() and FSDataset.writeToBlock() + FSDataset.java:459 suggests the following scenario: the temporary files in /tmp that store incomplete blocks during a transfer are never cleaned up. If the target datanode is CPU-starved and drops the connection while creating this temp file, the source datanode will attempt the transfer again. By then a file with the same name already exists in /tmp, because the target datanode did not clean up when the connection was dropped, so the retry is rejected with the "has already been started" error shown above.

      I also see that this section is unchanged in trunk/.

      The solution to this would be to check the age of the physical file in the /tmp dir, in FSDataset.java:436 - if it's older than a few hours or so, we should delete it and proceed as if there were no ongoing create op for this block.
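
      The age check proposed above could look roughly like the sketch below. This is not the attached fsdataset.patch and not the actual FSDataset code; the class name StaleTmpBlockCheck, the method clearIfStale, and the three-hour threshold are all hypothetical, chosen only to illustrate "delete the temp file if it is older than a few hours, otherwise treat the create as still ongoing".

      ```java
      import java.io.File;
      import java.io.IOException;

      /**
       * Hypothetical helper (not part of Hadoop): decides whether a leftover
       * temporary block file may be discarded so the block can be received again.
       */
      class StaleTmpBlockCheck {

          /** Assumed threshold; the report only says "a few hours or so". */
          static final long MAX_TMP_AGE_MS = 3L * 60 * 60 * 1000;

          /**
           * Returns true if nothing blocks a new write: either no temp file
           * exists, or an old one was found and deleted.
           * Returns false if a recent temp file exists, in which case a
           * transfer may still be in progress and the file is left alone.
           */
          public static boolean clearIfStale(File tmpFile, long maxAgeMs, long nowMs)
                  throws IOException {
              if (!tmpFile.exists()) {
                  return true;
              }
              long ageMs = nowMs - tmpFile.lastModified();
              if (ageMs <= maxAgeMs) {
                  return false;
              }
              if (!tmpFile.delete()) {
                  throw new IOException("Could not delete stale temp file " + tmpFile);
              }
              return true;
          }

          public static void main(String[] args) throws IOException {
              File tmp = File.createTempFile("blk_", ".tmp");
              // Pretend the file was last written four hours ago.
              long fourHoursAgo = System.currentTimeMillis() - 4L * 60 * 60 * 1000;
              tmp.setLastModified(fourHoursAgo);
              System.out.println(clearIfStale(tmp, MAX_TMP_AGE_MS, System.currentTimeMillis()));
          }
      }
      ```

      A caller would reject the incoming block only when clearIfStale returns false. Passing the current time as a parameter is a choice made here to keep the check easy to test; it is not something the report prescribes.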

        Attachments

        1. fsdataset.patch
          1.0 kB
          Andrzej Bialecki


            People

            • Assignee:
              Andrzej Bialecki
            • Reporter:
              Andrzej Bialecki
