  1. Hadoop HDFS
  2. HDFS-13031

To detect fsimage corruption on the spot

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      Since HDFS-9406 was fixed, new cases of similar fsimage corruption have been reported from the field. To reproduce the corruption, we need a good fsimage plus the edit logs to replay on top of it. However, by the time the corruption is detected (usually at a later NN restart), the last good fsimage has typically already been deleted.

      We need a way to detect fsimage corruption on the spot. One possible approach:

      1. After the SNN creates a new fsimage, it spawns a modified NN process (an NN started with some new command-line arguments) that just loads the fsimage and does nothing else.
      2. If that process fails, the running SNN either a) backs up the fsimage and edit logs, or b) stops checkpointing. Either way, it needs to somehow raise a flag to the user that the fsimage is corrupt.

      In step 2, option a requires a new NN->JN API to back up the edit logs; option b changes the SNN's behavior, which is arguably an incompatible change.
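
      The two steps above can be sketched roughly as follows. This is only an illustration of the control flow (verify in a child process, then preserve evidence and raise a flag on failure), not Hadoop code. The function name verify_checkpoint, the verify_cmd argument (standing in for the proposed "modified NN" invocation, which does not exist yet), the evidence file list, and the CORRUPT_FSIMAGE marker file are all assumptions made for this sketch.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def verify_checkpoint(
    verify_cmd: Sequence[str],
    fsimage: Path,
    evidence: Sequence[Path],
    backup_dir: Path,
    timeout_s: float = 3600.0,
) -> bool:
    """Return True if the freshly written fsimage loads cleanly.

    verify_cmd is a hypothetical command that only loads an fsimage and
    exits non-zero on failure; the fsimage path is appended as its last
    argument. evidence lists the files to preserve on failure (for
    example the fsimage and edit log segments); choosing them is left
    to the caller.
    """
    # Step 1: load the image in a separate process so that a crash or
    # hang in the loader cannot take down the checkpointing process.
    try:
        result = subprocess.run(
            [*verify_cmd, str(fsimage)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        ok = result.returncode == 0
        detail = result.stderr.strip()
    except subprocess.TimeoutExpired:
        ok = False
        detail = "verifier timed out"

    if ok:
        return True

    # Step 2, option (a): copy the evidence aside before normal
    # retention can delete it.
    backup_dir.mkdir(parents=True, exist_ok=True)
    for path in evidence:
        shutil.copy2(path, backup_dir / path.name)

    # Raise a flag for the operator (here: a marker file, an assumption).
    marker = backup_dir / "CORRUPT_FSIMAGE"
    marker.write_text(f"{fsimage.name}: {detail}\n")
    return False
```

      A caller implementing option (b) instead would skip the copy and simply stop scheduling checkpoints once verify_checkpoint returns False. Backing up edit logs that live on JournalNodes is the part this sketch cannot cover, since it assumes all evidence files are local paths.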

            People

              Assignee: Adam Antal (adam.antal)
              Reporter: Yongjun Zhang (yzhangal)
              Votes: 0
              Watchers: 7