Hadoop HDFS / HDFS-5399

Revisit SafeModeException and corresponding retry policies



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed


      Currently, NN SafeMode is handled by the following retry policies:

      1. In a non-HA setup, for certain API calls ("create"), the client retries if the NN is in SafeMode. Specifically, when retry is enabled, the client-side RPC applies the MultipleLinearRandomRetry policy to a wrapped SafeModeException.
      2. In an HA setup, the client retries if the NN is Active and in SafeMode. Specifically, the server side wraps the SafeModeException in a RetriableException, and the client-side RPC uses the FailoverOnNetworkExceptionRetry policy, which recognizes RetriableException (see HDFS-5291).
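
The two rules above can be summarized as a small decision function. This is only an illustrative sketch in plain Java, not code from Hadoop: the class `CurrentSafeModeRetry`, the method `clientRetries`, and its parameters are hypothetical names, and the NN is assumed to already be in SafeMode when the call arrives.

```java
// Illustrative model of the retry behavior described in the issue text.
// All names here are hypothetical; none of this is a Hadoop API.
class CurrentSafeModeRetry {

    /**
     * Returns true if, per the description above, the client retries a call
     * that was rejected because the NN is in SafeMode.
     *
     * @param haSetup      whether the cluster runs in an HA setup
     * @param nnActive     whether the contacted NN is in the Active state
     * @param retryEnabled whether the client-side retry option is enabled
     * @param apiCall      name of the RPC method, e.g. "create"
     */
    static boolean clientRetries(boolean haSetup, boolean nnActive,
                                 boolean retryEnabled, String apiCall) {
        if (haSetup) {
            // HA: an Active NN wraps the SafeModeException in a
            // RetriableException, which the failover policy recognizes.
            return nnActive;
        }
        // Non-HA: only "create" is named in the issue text, and only when
        // client-side retry is enabled.
        return retryEnabled && "create".equals(apiCall);
    }
}
```

The sketch makes the asymmetry visible: in the non-HA branch the outcome depends on the API call and a client option, while in the HA branch it depends only on the NN state.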

      There are several possible issues in the current implementation:

      1. The NN SafeMode can be a "Manual" SafeMode (i.e., started by an administrator through the CLI), and clients may not want to retry on this type of SafeMode.
      2. Clients may want to retry on other API calls in a non-HA setup.
      3. We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setups. A possible straightforward solution is to always wrap the SafeModeException in a RetriableException to indicate that clients should retry.
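
One way to picture the generic strategy in item 3, combined with the manual-SafeMode concern in item 1, is sketched below. It is a self-contained illustration under stated assumptions, not the patch attached to this issue: `SafeModeError` and `RetriableError` are hypothetical stand-ins for the Hadoop exception classes, and the choice to leave manual SafeMode unwrapped is an assumption drawn from item 1 rather than a confirmed design.

```java
// Hypothetical sketch of a single server-side wrapping rule plus a single
// client-side retry rule. None of these types are the real Hadoop classes.
class SafeModeRetrySketch {

    /** Stand-in for the NN's SafeModeException; records how SafeMode was entered. */
    static class SafeModeError extends Exception {
        final boolean manual;

        SafeModeError(String message, boolean manual) {
            super(message);
            this.manual = manual;
        }
    }

    /** Stand-in for a wrapper that tells the client the call may be retried. */
    static class RetriableError extends Exception {
        RetriableError(Exception cause) {
            super(cause.getMessage(), cause);
        }
    }

    /** Server side: choose the exception that is sent back to the client. */
    static Exception toClientException(SafeModeError e) {
        // Assumption: manual SafeMode is returned unwrapped so clients fail
        // fast; any other SafeMode is wrapped to signal "retry later".
        return e.manual ? e : new RetriableError(e);
    }

    /** Client side: the same rule applies to HA and non-HA setups. */
    static boolean shouldRetry(Exception fromServer) {
        return fromServer instanceof RetriableError;
    }
}
```

With this shape the client no longer needs per-API or per-setup rules; whether a retry happens is decided entirely by how the server wraps the exception.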


        1. HDFS-5399.003.patch
          14 kB
          Jing Zhao
        2. hdfs-5399.002.patch
          14 kB
          Todd Lipcon
        3. HDFS-5399.001.patch
          14 kB
          Jing Zhao
        4. HDFS-5399.000.patch
          11 kB
          Jing Zhao



            Assignee: Jing Zhao (jingzhao)
            Reporter: Jing Zhao (jingzhao)