Hadoop Common / HADOOP-35

Files missing chunks can cause mapred runs to get stuck


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None
    • Environment: ~20 datanode DFS cluster

    Description

      I've now run into a problem several times where a large run gets stalled as a result of a missing data block. The latest was a stall in the Summer, i.e. the data might have all been there, but it was impossible to proceed because the CRC file was missing a block. It would be nice to:

      1) Have a "health check" running on a MapReduce job. If any data isn't available, emit periodic warnings, and maybe have a timeout for the case where the data never comes back. Such warnings should specify which file(s) are affected by the missing blocks.
      2) Have a utility, possibly part of the existing dfs utility, which can check for DFS files with unlocatable blocks. It could even show the 'health' of a file, i.e. what percentage of its blocks are currently at the desired replication level. Currently, there's no way that I know of to find out whether a file in DFS is going to be unreadable. A rough sketch of what such a check could look like follows this list.
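
      The attached dfsshell.health.patch.txt presumably implements this against the 0.1.0-era DFSShell internals; that code is not reproduced here. Purely as an illustration of the idea, below is a minimal sketch written against the modern org.apache.hadoop.fs.FileSystem API rather than the 0.1.0 code base. The class name FileHealth and the output format are invented for this example; getFileStatus, getFileBlockLocations, and BlockLocation.getHosts are the only real API calls relied on.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.BlockLocation;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          // Sketch only: for one file, report which blocks are unlocatable and
          // what fraction of blocks sit at the desired replication level.
          public class FileHealth {
            public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              Path path = new Path(args[0]);
              FileStatus status = fs.getFileStatus(path);
              BlockLocation[] blocks =
                  fs.getFileBlockLocations(status, 0, status.getLen());
              int atReplication = 0;
              for (BlockLocation block : blocks) {
                int hosts = block.getHosts().length;
                if (hosts == 0) {
                  // No datanode holds this block: the file is currently unreadable.
                  System.out.println("Missing block at offset " + block.getOffset());
                }
                if (hosts >= status.getReplication()) {
                  atReplication++;
                }
              }
              double pct = blocks.length == 0
                  ? 100.0 : 100.0 * atReplication / blocks.length;
              System.out.printf("%s: %d/%d blocks at desired replication (%.0f%%)%n",
                  path, atReplication, blocks.length, pct);
            }
          }

      A block reporting zero hosts is the unreadable case described above; counting hosts against the file's target replication gives the percentage-style health figure suggested in point 2.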

      Attachments

        1. dfsshell.health.patch.txt (3 kB), Bryan Pendleton


    People

      Assignee: Unassigned
      Reporter: Bryan Pendleton (bpendleton)
      Votes: 0
      Watchers: 0
