Hadoop HDFS / HDFS-6618

FSNamesystem#delete drops the FSN lock between removing INodes from the tree and deleting them from the inode map


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None

    Description

      After HDFS-6527, we did not see the edit log corruption for weeks on multiple clusters, until yesterday. Before HDFS-6527, it would appear within 30 minutes on a cluster.

      But the same condition was reproduced even with HDFS-6527 applied. The only explanation is that the RPC handler thread serving addBlock() was reading a stale parent value. Although parent is nulled out while holding the FSNamesystem and FSDirectory write locks, there is no memory barrier, because no "synchronized" block is involved in the process.

      I suggest making parent volatile.
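
      The suggestion above can be illustrated with a minimal sketch. It is not the Hadoop code: Node, VolatileParentDemo and readerSeesDetach are hypothetical names, and the real INode class is considerably more involved. The sketch only shows the Java memory model property the suggestion relies on, namely that a write to a volatile field is guaranteed to become visible to other threads that subsequently read that field.

```java
// Hypothetical stand-in for an inode; this is NOT the real Hadoop INode class.
final class Node {
    private final String name;

    // volatile: a write to this field happens-before every subsequent read
    // of it by any other thread, so readers cannot keep seeing a stale value.
    private volatile Node parent;

    Node(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    Node getParent() {
        return parent;
    }

    void setParent(Node newParent) {
        this.parent = newParent;
    }

    // Detach from the tree, analogous to nulling out the parent on delete.
    void clearParent() {
        this.parent = null;
    }

    boolean isDetached() {
        return parent == null;
    }
}

final class VolatileParentDemo {
    // Returns true if a reader thread observed the detachment before the timeout.
    static boolean readerSeesDetach(long timeoutMillis) {
        Node root = new Node("root");
        Node child = new Node("child");
        child.setParent(root);

        // The reader spins without taking any lock; because parent is volatile,
        // the read cannot be hoisted out of the loop and the write must be seen.
        Thread reader = new Thread(() -> {
            while (!child.isDetached()) {
                // busy-wait until the writer's update becomes visible
            }
        });
        reader.setDaemon(true);
        reader.start();

        child.clearParent(); // the write performed by the "deleting" thread

        try {
            reader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("reader saw detach: " + readerSeesDetach(5000));
    }
}
```

      Without the volatile modifier, the Java memory model would allow the reader loop in this sketch to keep observing the old parent indefinitely, since nothing else synchronizes the two threads.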

      Attachments

        1. HDFS-6618.AbstractList.patch
          4 kB
          Kihwal Lee
        2. HDFS-6618.inodeRemover.patch
          52 kB
          Kihwal Lee
        3. HDFS-6618.inodeRemover.v2.patch
          52 kB
          Kihwal Lee
        4. HDFS-6618.patch
          2 kB
          Kihwal Lee
        5. HDFS-6618.simpler.patch
          3 kB
          Kihwal Lee


          People

            Assignee: Kihwal Lee (kihwal)
            Reporter: Kihwal Lee (kihwal)
            Votes: 0
            Watchers: 17
