HBASE-6339

Bulkload call to RS should begin holding write lock only after the file has been transferred

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.90.0
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels:
      None

      Description

      I noticed that right now, in a bulkLoadHFiles call to an RS, we grab the HRegion write lock as soon as we determine that we'll be attempting a multi-family bulk load. The file copy from the caller's source FS happens afterwards, while the lock is held.

      This doesn't seem right. For instance, we had a recent use case where the cluster running the bulk load used a separate HDFS instance/cluster from the one that runs HBase, and transfers between these FSes can be slower than an intra-cluster transfer. Hence I think we should take the write lock only after the requested file has been successfully copied to the destination FS, thereby allowing more write throughput in the meantime.

      Does this sound reasonable to do?
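
To make the two orderings concrete, here is a minimal sketch using a toy region class. It is not HBase's HRegion code: `BulkLoadLockOrder`, `copyToDestinationFs`, `commit`, and the `/dest/` path are hypothetical names chosen for illustration, and the slow cross-cluster copy is reduced to a method that only records whether the write lock was held when it ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a region. All names are hypothetical; this is not HBase's HRegion. */
public class BulkLoadLockOrder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> events = new ArrayList<>();

    /** Stand-in for the slow copy from the source FS; records the lock state. */
    private String copyToDestinationFs(String srcPath) {
        events.add("copy " + srcPath + " writeLocked=" + lock.isWriteLockedByCurrentThread());
        return "/dest/" + srcPath; // hypothetical destination path
    }

    /** Stand-in for making the copied file visible to the region. */
    private void commit(String destPath) {
        events.add("commit " + destPath + " writeLocked=" + lock.isWriteLockedByCurrentThread());
    }

    /** Ordering described in the issue: take the write lock, then copy. */
    public void loadCopyUnderLock(String srcPath) {
        lock.writeLock().lock();
        try {
            commit(copyToDestinationFs(srcPath));
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Proposed ordering: copy first, hold the write lock only for the commit. */
    public void loadCopyBeforeLock(String srcPath) {
        String destPath = copyToDestinationFs(srcPath);
        lock.writeLock().lock();
        try {
            commit(destPath);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        BulkLoadLockOrder region = new BulkLoadLockOrder();
        region.loadCopyUnderLock("a.hfile");
        region.loadCopyBeforeLock("b.hfile");
        region.events().forEach(System.out::println);
    }
}
```

In the second ordering the copy runs with no lock held, so other writers are blocked only for the duration of the commit. The sketch deliberately omits any verification step, so it says nothing about what may change in the region while the copy is in flight.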

        Activity

        Harsh J created issue -
        Harsh J made changes -
        Field: Description
        Change: "the write lock" became "the HRegion write lock"; the rest of the description is unchanged.
        Harsh J added a comment -

        I've since realized this shouldn't be done: given the current logic, a region split could occur while the bulk load file is being pulled, after the load has already been verified. It is fine as-is for the moment.

        Harsh J made changes -
        Status: Open -> Resolved
        Resolution: Invalid
        Ted Yu added a comment -

        In many production systems, region splitting is effectively disabled.
        Looks like there is a gem in your initial proposal, if we can reliably detect that there is no region splitting.

        Harsh J added a comment -

        Thanks for the comments Ted.

        Region splitting being disabled isn't a simple toggle value, so it's kind of tricky to determine whether it is indeed disabled. Besides that, there's still a chance of a manual split operation.

        Granted, we could duplicate the checks: once before the file pull (take the lock for the check, then release it), and once again right after (take the lock here and release it only at the end, as normal). But I think that adds unnecessary complication. For the moment, if Ops had HBASE-6350, I think that should be satisfactory enough. It isn't often that I see separate FS clusters loading between them.

        Thoughts? Is it worth the extra check and complexity addition?
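
For a sense of the extra complexity, the duplicated check could look roughly like the sketch below. This is a toy model under stated assumptions, not HBase code: `RecheckedBulkLoad`, `splitAt`, `fits`, and the string key range are hypothetical, the file pull is a `Runnable` supplied by the caller, and a split is simulated by shrinking the region's end key.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy region with a key range. Hypothetical names; not HBase's HRegion API. */
public class RecheckedBulkLoad {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String startKey; // inclusive
    private String endKey;         // exclusive
    private int loaded = 0;

    public RecheckedBulkLoad(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    /** Simulates a split (automatic or manual) that shrinks this region's range. */
    public void splitAt(String newEndKey) {
        lock.writeLock().lock();
        try {
            endKey = newEndKey;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Callers must hold the write lock. */
    private boolean fits(String firstKey, String lastKey) {
        return firstKey.compareTo(startKey) >= 0 && lastKey.compareTo(endKey) < 0;
    }

    /**
     * @param slowCopy stand-in for the file pull; runs with no lock held
     * @return true if the file was committed, false if either check failed
     */
    public boolean bulkLoad(String firstKey, String lastKey, Runnable slowCopy) {
        // Check 1: lock, verify, release.
        lock.writeLock().lock();
        try {
            if (!fits(firstKey, lastKey)) {
                return false;
            }
        } finally {
            lock.writeLock().unlock();
        }

        slowCopy.run(); // no lock held here, so a split can slip in

        // Check 2: lock again, re-verify, and hold the lock until the commit is done.
        lock.writeLock().lock();
        try {
            if (!fits(firstKey, lastKey)) {
                return false; // a real implementation would also clean up the copied file
            }
            loaded++;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Unsynchronized read; adequate for this single-threaded demo only. */
    public int loadedCount() {
        return loaded;
    }

    public static void main(String[] args) {
        RecheckedBulkLoad region = new RecheckedBulkLoad("a", "z");
        System.out.println(region.bulkLoad("b", "c", () -> { }));                 // no split
        System.out.println(region.bulkLoad("m", "p", () -> region.splitAt("k"))); // split during copy
    }
}
```

The second call shows the failure mode: the first check passes, a split happens during the copy, and the second check rejects the load, leaving an already-copied file to deal with. That cleanup path and the second lock acquisition are the added complexity in question.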

        Ted Yu added a comment -

        @Harsh:
        Your argument makes sense.


          People

          • Assignee:
            Harsh J
            Reporter:
            Harsh J
          • Votes:
            0
            Watchers:
            3