Hadoop HDFS / HDFS-8955

Support 'hedged' write in DFSClient


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      If a write to a block is slow, start up another parallel, 'hedged' write against a different set of replicas. We need to get a different set of replicas (a new data pipeline) from the NameNode, and we then take the result of whichever write returns first (the outstanding write is cancelled). This 'hedged' write feature will help rein in the outliers: the odd write that takes a long time because it hit a bad patch on the disk, etc.

      This feature is off by default. To enable it, set <code>dfs.client.hedged.write.threadpool.size</code> to a positive number. The threadpool size is how many threads to dedicate to running these 'hedged', concurrent writes in your client.

      Then set <code>dfs.client.hedged.write.threshold.millis</code> to the number of milliseconds to wait before starting a 'hedged' write. For example, if you set this property to 10, then if a write has not returned within 10 milliseconds, we start a new write against a different set of replicas.

      This feature emits new metrics:

      + hedgedWriteOps
      + hedgedWriteOpsWin -- how many times the hedged write 'beat' the original write
      + hedgedWriteOpsInCurThread -- how many times we went to do a hedged write but had to run it in the current thread because dfs.client.hedged.write.threadpool.size was at its maximum
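      A minimal sketch of how a client could opt in if this lands, assuming the property names proposed above. The feature is still unresolved, so these keys are not recognized by current releases, and the file path is only illustrative:

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HedgedWriteExample {
          public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Proposed keys from this issue; they would only take effect once hedged writes exist.
            conf.setInt("dfs.client.hedged.write.threadpool.size", 10);    // > 0 enables hedged writes
            conf.setLong("dfs.client.hedged.write.threshold.millis", 10L); // hedge after 10 ms without a response

            FileSystem fs = FileSystem.get(conf);
            // Ordinary write path; any hedging would happen inside the DFS client, not in user code.
            try (FSDataOutputStream out = fs.create(new Path("/tmp/hedged-write-demo.txt"))) {
              out.writeBytes("hedged write demo\n");
            }
          }
        }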

Description

      We already have hedged reads, which provide redundancy on read failures due to bad sectors/patches on disk. We need a similar feature for HDFS writes. This feature may come at a cost, but it is a must-have for use cases that need to guarantee write success regardless of degraded disk health. The definition of a degraded disk is highly debatable, but this is how I would define it: "a degraded disk is a disk that fails reads and writes intermittently."
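
      For reference, the existing hedged read feature (which this request mirrors) is enabled through the analogous client-side properties. A minimal sketch, with a hypothetical file path:

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HedgedReadExample {
          public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Existing hedged read knobs; the write-side feature proposed here would mirror them.
            conf.setInt("dfs.client.hedged.read.threadpool.size", 10);     // > 0 enables hedged reads
            conf.setLong("dfs.client.hedged.read.threshold.millis", 500L); // hedge after 500 ms

            FileSystem fs = FileSystem.get(conf);
            byte[] buf = new byte[4096];
            try (FSDataInputStream in = fs.open(new Path("/tmp/some-existing-file"))) {
              // A read that stalls past the threshold triggers a parallel read against another replica.
              int n = in.read(buf);
              System.out.println("read " + n + " bytes");
            }
          }
        }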

Attachments

Issue Links

Activity

People

      Assignee: Unassigned
      Reporter: bijaya
      Votes: 0
      Watchers: 14

Dates

      Created:
      Updated: