  1. Hadoop HDFS
  2. HDFS-16774

Improve async delete replica on datanode

Details

    • Reviewed

    Description

      In our online cluster, a large number of ReplicaNotFoundExceptions occur when clients read data.
      After tracing the root cause, we found that the asynchronous replica deletion had accumulated a large backlog of pending deletions, and this backlog caused the ReplicaNotFoundExceptions.
      Currently, the asynchronous replica deletion process is as follows:
      1. Remove the replica from the ReplicaMap
      2. Delete the replica file on the disk [blocked in thread pool]
      3. Notify the NameNode through an incremental block report (IBR) [blocked in thread pool]
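
The ordering above can be illustrated with a minimal sketch. This is a simplified model, not the actual DataNode code: the class `CurrentDeleteFlow`, its fields, and its methods are hypothetical names chosen for illustration, and in-memory sets stand in for the disk and for the IBR. It shows that step 1 takes effect immediately while steps 2 and 3 wait behind whatever is already queued in the thread pool.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of the current flow; not the actual DataNode code. */
class CurrentDeleteFlow {
    // Stand-in for the ReplicaMap: block ID -> replica file path.
    final Map<Long, String> replicaMap = new ConcurrentHashMap<>();
    // Stand-in for the replica files present on disk.
    final Set<Long> onDisk = ConcurrentHashMap.newKeySet();
    // Stand-in for deletions already reported to the NameNode via IBR.
    final Set<Long> reportedDeleted = ConcurrentHashMap.newKeySet();
    // Single daemon worker, so queued deletions run one at a time.
    final ExecutorService deleter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-replica-deleter");
        t.setDaemon(true);
        return t;
    });

    void addReplica(long blockId, String path) {
        replicaMap.put(blockId, path);
        onDisk.add(blockId);
    }

    // A read is modeled as a lookup in the replica map.
    boolean isReadable(long blockId) {
        return replicaMap.containsKey(blockId);
    }

    CompletableFuture<Void> invalidate(long blockId) {
        // Step 1 runs synchronously on the caller's thread.
        replicaMap.remove(blockId);
        // Steps 2 and 3 are queued and may wait behind earlier deletions.
        return CompletableFuture.runAsync(() -> {
            onDisk.remove(blockId);        // step 2: delete the file
            reportedDeleted.add(blockId);  // step 3: notify the NameNode
        }, deleter);
    }
}
```

In this model, while the worker is busy, a block can be missing from the map even though its file still exists and the NameNode has not yet been told about the deletion.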

      To avoid this kind of problem as much as possible, consider optimizing the execution flow:
      removing the replica from the ReplicaMap, deleting the replica file from disk, and notifying the NameNode through IBR should all be processed in the same asynchronous thread.
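
A minimal sketch of the proposed ordering, under the same simplifying assumptions: `ProposedDeleteFlow` and its members are hypothetical names, in-memory sets stand in for the disk and the IBR, and this is not the actual patch. All three steps run inside one asynchronous task, so the map entry, the file, and the report to the NameNode change together when the task executes.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of the proposed flow; not the actual patch. */
class ProposedDeleteFlow {
    // Stand-in for the ReplicaMap: block ID -> replica file path.
    final Map<Long, String> replicaMap = new ConcurrentHashMap<>();
    // Stand-in for the replica files present on disk.
    final Set<Long> onDisk = ConcurrentHashMap.newKeySet();
    // Stand-in for deletions already reported to the NameNode via IBR.
    final Set<Long> reportedDeleted = ConcurrentHashMap.newKeySet();
    // Single daemon worker, so queued deletions run one at a time.
    final ExecutorService deleter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-replica-deleter");
        t.setDaemon(true);
        return t;
    });

    void addReplica(long blockId, String path) {
        replicaMap.put(blockId, path);
        onDisk.add(blockId);
    }

    // A read is modeled as a lookup in the replica map.
    boolean isReadable(long blockId) {
        return replicaMap.containsKey(blockId);
    }

    CompletableFuture<Void> invalidate(long blockId) {
        // Nothing changes on the caller's thread; all steps are queued together.
        return CompletableFuture.runAsync(() -> {
            replicaMap.remove(blockId);    // step 1: remove from the map
            onDisk.remove(blockId);        // step 2: delete the file
            reportedDeleted.add(blockId);  // step 3: notify the NameNode
        }, deleter);
    }
}
```

In this model, a backlog in the thread pool only delays the deletion as a whole: until the task runs, the replica is still in the map and still on disk.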

            People

              haiyang Hu Haiyang Hu
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: