  1. Hadoop HDFS
  2. HDFS-8707 Implement an async pure c++ HDFS client
  3. HDFS-11028

libhdfs++: FileSystem needs to be able to cancel pending connections

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      Cancel support is now reasonably robust, except when a FileHandle operation ends up causing the RpcEngine to try to create a new RpcConnection. In HA configs it's common to have something like 10-20 failovers and a 20 second failover delay (no exponential backoff just yet), which works out to roughly 3 to 7 minutes of retrying. This means that all of the functions with synchronous interfaces can still block for many minutes after an operation has been canceled, and often the cause is something trivial like a bad config file.
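
      To make the "many minutes" figure concrete, here is a minimal sketch of the arithmetic. WorstCaseBlock is a hypothetical helper, not part of libhdfs++, and it assumes a fixed delay between failovers with no backoff, as described above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not a libhdfs++ function): upper bound on how long a
// synchronous call can stay blocked while the client keeps failing over,
// assuming a fixed delay between attempts and no exponential backoff.
constexpr std::chrono::seconds WorstCaseBlock(int max_failovers,
                                              std::chrono::seconds failover_delay) {
  return max_failovers * failover_delay;
}

// 20 failovers at 20 seconds each is 400 seconds, a bit under 7 minutes.
static_assert(WorstCaseBlock(20, std::chrono::seconds(20)) ==
                  std::chrono::seconds(400),
              "20 failovers x 20 s should be 400 s");
```

      With the low end of the range (10 failovers) the same calculation gives 200 seconds, so a canceled call can still hold its caller for several minutes either way.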

      The current design makes this sort of thing tricky to do because the FileHandles need to be individually cancelable via CancelOperations, but they share the RpcEngine that does the async magic.

      Updated design:
      The original design would end up forcing lots of reconnects. That's not a huge issue on an unauthenticated cluster, but on a kerberized cluster it's a recipe for Kerberos thinking we're attempting a replay attack.

      User-visible cancellation and internal resource cleanup are separable issues. The former can be implemented by atomically swapping the callback of the operation to be canceled with a no-op callback. The original callback is then posted to the IoService with an OperationCanceled status, and the user is no longer blocked. For RPC cancels this is sufficient: it's not expensive to keep a request around a little bit longer, and when it's eventually invoked or timed out it invokes the no-op callback and is ignored (other than a trace-level log notification). Connect cancels push a flag down into the RPC engine to kill the connection and make sure it doesn't attempt to reconnect.
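
      The callback swap can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: Status, Callback, IoService and PendingOperation here are simplified, hypothetical stand-ins for the libhdfs++ types, the toy IoService just queues tasks until Run() is called, and the "atomic" swap is done under a mutex because std::function cannot be held in a std::atomic.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-ins for illustration only; not the libhdfs++ types.
enum class Status { kOk, kOperationCanceled };
using Callback = std::function<void(Status)>;

// Toy IoService: queues tasks and runs them when Run() is called.
class IoService {
 public:
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push(std::move(task));
  }
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so tasks may Post() more work
    }
  }
 private:
  std::mutex mutex_;
  std::queue<std::function<void()>> tasks_;
};

class PendingOperation {
 public:
  PendingOperation(IoService& io, Callback cb)
      : io_(io), callback_(std::move(cb)) {}

  // User-visible cancel: take the real callback, leave a no-op behind, and
  // post the real one with a canceled status so the caller is unblocked.
  void Cancel() {
    Callback original = TakeCallback();
    io_.Post([original]() { original(Status::kOperationCanceled); });
  }

  // Called when the request finally completes or times out. After a cancel
  // this only reaches the no-op, so the late result is ignored.
  void Complete(Status status) { TakeCallback()(status); }

 private:
  // Swap the stored callback with a no-op under the lock and return whatever
  // was stored before. A real client might log at trace level in the no-op.
  Callback TakeCallback() {
    Callback taken = [](Status) {};
    std::lock_guard<std::mutex> lock(mutex_);
    std::swap(taken, callback_);
    return taken;
  }

  IoService& io_;
  std::mutex mutex_;
  Callback callback_;
};
```

      Because both Cancel() and Complete() take the callback through the same swap, the user's callback runs at most once no matter which of the two gets there first. The connect-cancel flag is a separate mechanism and is not shown here.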

      Attachments

        1. HDFS-11028.HDFS-8707.000.patch
          147 kB
          James Clampffer
        2. HDFS-11028.HDFS-8707.001.patch
          21 kB
          James Clampffer
        3. HDFS-11028.HDFS-8707.002.patch
          34 kB
          James Clampffer
        4. HDFS-11028.HDFS-8707.003.patch
          45 kB
          James Clampffer
        5. HDFS-11028.HDFS-8707.004.patch
          47 kB
          James Clampffer

          People

            James Clampffer
            Votes: 0
            Watchers: 3