Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: None
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      As described in this comment, write performance on the HA branch is significantly lower than on trunk. We need to dig in, identify whatever is hurting us, and optimize it so that the HA branch gets back to the same performance numbers as trunk.

          Activity

          Todd Lipcon added a comment -

          With the recent patches committed to the HA branch, performance is now comparable.

          Todd Lipcon added a comment -

          Filed HDFS-3023 to optimize the size of the edit log entries for persistBlocks()

          Todd Lipcon added a comment -

          There are three possible components to the perf issue, I think:

          1) DN now sends RBW replicas to both NNs as soon as a block starts to be created. This adds 3 RPCs to each block creation (though they don't write to the edit logs)
          2) When blocks are allocated, we now log the full block list of that file. This creates a much bigger edit log, so of course takes more time.
          3) When HA is enabled, these new edit log entries are fsynced, which makes it even slower.

          I'm hoping to set up a cluster to test each of these in isolation by commenting out the related code from the HA branch and measuring a write benchmark. Once we identify which is the worst issue we can tackle it.
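
The cost of components 2 and 3 can be illustrated outside HDFS with a rough sketch. This is not HDFS code: `logged_block_records`, `timed_append`, and `run_benchmark` are hypothetical names, "block records" is a simplified stand-in for the real edit log encoding, and the entry sizes are arbitrary. It assumes that logging the full block list on each allocation re-writes every block allocated so far, and it times appends to a local file with and without a per-entry fsync.

```python
import os
import tempfile
import time


def logged_block_records(num_blocks: int, full_list: bool) -> int:
    """Count block records logged while a file grows to num_blocks blocks."""
    if full_list:
        # Each allocation re-logs every block so far: 1 + 2 + ... + n.
        return num_blocks * (num_blocks + 1) // 2
    # Logging only the newly allocated block: one record per allocation.
    return num_blocks


def timed_append(path: str, entry_size: int, count: int, sync: bool) -> float:
    """Append `count` entries of `entry_size` bytes, optionally fsyncing each."""
    payload = b"x" * entry_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(payload)
            if sync:
                # Push Python's buffer to the OS, then force it to disk.
                f.flush()
                os.fsync(f.fileno())
    return time.perf_counter() - start


def run_benchmark(count: int = 200) -> dict:
    """Time small vs. large entries, with and without fsync (illustrative sizes)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for size in (64, 4096):
            for sync in (False, True):
                path = os.path.join(tmp, f"edits_{size}_{sync}")
                results[(size, sync)] = timed_append(path, size, count, sync)
    return results


if __name__ == "__main__":
    print(logged_block_records(10, full_list=True))   # 55
    print(logged_block_records(10, full_list=False))  # 10
    for key, secs in run_benchmark().items():
        print(key, f"{secs:.4f}s")
```

Under that assumption the number of logged block records grows quadratically with the number of blocks in a file. The timings depend heavily on the disk and filesystem, so they only indicate relative cost and say nothing about what an actual cluster benchmark would show.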


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              5
