Hadoop HDFS / HDFS-7587

Edit log corruption can happen if append fails with a quota violation

Details

    • Reviewed

    Description

      We have seen a standby NameNode crash due to edit log corruption. It complained that OP_CLOSE could not be applied because the file was not under construction.

      When a client tried to append to the file, the remaining space quota was very small. This caused a failure in prepareFileForWrite(), but only after the inode had already been converted for writing and a lease had been added. Since neither change was undone when the quota violation was detected, the file was left under construction with an active lease, and no OP_ADD was written to the edit log.
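
      The ordering problem described above can be illustrated with a toy model. This is only a sketch: AppendQuotaSketch and every field and method in it are hypothetical names invented for illustration, not HDFS classes, and the check-first variant is one possible ordering rather than a description of what the attached patches do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these names are real HDFS classes or methods. */
public class AppendQuotaSketch {
    static class QuotaExceeded extends Exception {
        QuotaExceeded(String msg) { super(msg); }
    }

    final Set<String> underConstruction = new HashSet<>();
    final Set<String> leases = new HashSet<>();
    final List<String> editLog = new ArrayList<>();
    final long remainingQuota;

    AppendQuotaSketch(long remainingQuota) {
        this.remainingQuota = remainingQuota;
    }

    void checkQuota(long needed) throws QuotaExceeded {
        if (needed > remainingQuota) {
            throw new QuotaExceeded("need " + needed + ", have " + remainingQuota);
        }
    }

    /** Problematic ordering: in-memory state changes before the quota check. */
    void appendMutateFirst(String path, long needed) throws QuotaExceeded {
        underConstruction.add(path);
        leases.add(path);
        checkQuota(needed);             // may throw; the two changes above stay in place
        editLog.add("OP_ADD " + path);  // never reached on a quota violation
    }

    /** Alternative ordering: nothing is changed unless the quota check passes. */
    void appendCheckFirst(String path, long needed) throws QuotaExceeded {
        checkQuota(needed);
        underConstruction.add(path);
        leases.add(path);
        editLog.add("OP_ADD " + path);
    }

    public static void main(String[] args) {
        AppendQuotaSketch buggy = new AppendQuotaSketch(1);
        try {
            buggy.appendMutateFirst("/f", 100);
        } catch (QuotaExceeded e) {
            System.out.println("append failed: " + e.getMessage());
        }
        System.out.println("mutate-first: under construction=" + buggy.underConstruction
                + ", leases=" + buggy.leases + ", edit log=" + buggy.editLog);

        AppendQuotaSketch safe = new AppendQuotaSketch(1);
        try {
            safe.appendCheckFirst("/f", 100);
        } catch (QuotaExceeded e) {
            System.out.println("append failed: " + e.getMessage());
        }
        System.out.println("check-first: under construction=" + safe.underConstruction
                + ", leases=" + safe.leases + ", edit log=" + safe.editLog);
    }
}
```

      In the mutate-first variant the failed append leaves the file marked under construction with a lease but with an empty edit log, which is the inconsistent state the description refers to. Undoing the changes in a catch block would be another way to reach the same end state.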

      A subsequent append() eventually triggered lease recovery after the soft limit period expired. This led to commitBlockSynchronization(), which closed the file and logged OP_CLOSE. Since there was no corresponding OP_ADD in the edit log, edit log replay could not apply the OP_CLOSE.
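
      Why replay fails can be shown with a simplified replayer. This is a sketch under stated assumptions: EditReplaySketch, its replay() method, and the "OP path" string encoding of ops are hypothetical and far simpler than the real edit log format and loader.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy replayer only: not the HDFS edit log loader. */
public class EditReplaySketch {
    /**
     * Replays ops written as "OP_NAME path" and returns the set of paths
     * that are still under construction afterwards.
     */
    static Set<String> replay(List<String> ops) {
        Set<String> underConstruction = new HashSet<>();
        for (String op : ops) {
            String[] parts = op.split(" ", 2);
            String code = parts[0];
            String path = parts[1];
            switch (code) {
                case "OP_ADD":
                    underConstruction.add(path);
                    break;
                case "OP_CLOSE":
                    // Closing is only valid for a file that an earlier op opened.
                    if (!underConstruction.remove(path)) {
                        throw new IllegalStateException(
                                "OP_CLOSE for " + path + ": file is not under construction");
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + code);
            }
        }
        return underConstruction;
    }

    public static void main(String[] args) {
        // A well-formed log: the close has a matching add.
        System.out.println(replay(Arrays.asList("OP_ADD /f", "OP_CLOSE /f")));

        // The log produced by the failure scenario: a close with no add.
        try {
            replay(Arrays.asList("OP_CLOSE /f"));
        } catch (IllegalStateException e) {
            System.out.println("replay failed: " + e.getMessage());
        }
    }
}
```

      The active side can log the close because its in-memory state still marks the file as under construction. A node that rebuilds state only from the log never sees that state, so it rejects the op.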

      Attachments

        1. HDFS-7587-branch-2.6.patch
          13 kB
          Ming Ma
        2. HDFS-7587.003.patch
          10 kB
          Jing Zhao
        3. HDFS-7587.002.patch
          9 kB
          Jing Zhao
        4. HDFS-7587.001.patch
          6 kB
          Jing Zhao
        5. HDFS-7587.patch
          5 kB
          Kihwal Lee

            People

              jingzhao Jing Zhao
              kihwal Kihwal Lee