Hadoop Common / HADOOP-2149

Pure name-node benchmarks.



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None


      Pure name-node benchmark.

      This patch starts a series of name-node benchmarks.
      The intention is to have a separate benchmark for every important name-node operation.
      The purpose of the benchmarks is

      1. to measure the throughput for each name-node operation, and
      2. to evaluate changes in the name-node performance (gain or degradation) when optimization
        or new functionality patches are introduced.

      The benchmarks measure name-node throughput (ops per second) and the average execution time.
      The benchmark does not involve any Hadoop components other than the name-node.
      The name-node server is real; the other components are simulated.
      There is no RPC overhead: each operation is executed by directly calling the respective name-node method.
      The benchmark is multi-threaded, that is, one can start multiple threads competing for
      name-node resources by concurrently executing the same operation with different data.
      See the javadoc for more details.
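A minimal sketch of such a driver loop, with hypothetical names (the Runnable stands in for a direct name-node method call such as a file create; this is not the patch's actual code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: N threads share a fixed budget of operations, and each
// thread invokes the target operation by a direct method call, so no RPC
// overhead is measured.
public class ThroughputDriver {
  // Returns throughput in ops/sec; assumes totalOps divides evenly by threads.
  public static double measure(int numThreads, int totalOps, Runnable op)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    CountDownLatch done = new CountDownLatch(numThreads);
    int opsPerThread = totalOps / numThreads;  // equal distribution
    long start = System.currentTimeMillis();
    for (int t = 0; t < numThreads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < opsPerThread; i++) op.run();
        done.countDown();
      });
    }
    done.await();
    long elapsedMs = System.currentTimeMillis() - start;
    pool.shutdown();
    return totalOps * 1000.0 / Math.max(elapsedMs, 1);
  }
}
```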

      The patch contains implementations of two name-node operations: file creates and block reports.
      Implementation of other operations will follow.

      File creation benchmark.

      I ran two series of the file create benchmark on the name-node with different numbers of threads.
      The first series was run on the regular name-node, which performs an edits log transaction on every create.
      The transaction includes a synch to disk.
      In the second series the name-node was modified so that the synchs are turned off.
      Each run of the benchmark performs the same number of creates, 10,000, equally distributed between the
      running threads. I used a 4-core 2.8GHz machine.
      The following two tables summarize the results. Time is in milliseconds.

      Table 1: with synch

      threads   time (msec)   ops/sec
            1         13074       764
            2          8883      1125
            4          7319      1366
           10          7094      1409
           20          6785      1473
           40          6776      1475
          100          6899      1449
          200          7131      1402
          400          7084      1411
         1000          7181      1392
      Table 2: no synch

      threads   time (msec)   ops/sec
            1          4559      2193
            2          4979      2008
            4          5617      1780
           10          5679      1760
           20          5550      1801
           40          5804      1722
          100          5871      1703
          200          6037      1656
          400          5855      1707
         1000          6069      1647
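The ops/sec column in both tables is simply the fixed operation count divided by the elapsed time; the single-thread rows, for example, work out as:

```java
public class Throughput {
  // ops/sec as reported in the tables: op count over elapsed milliseconds
  public static long opsPerSec(long ops, long timeMs) {
    return ops * 1000 / timeMs;
  }

  public static void main(String[] args) {
    System.out.println(opsPerSec(10000, 13074)); // 764  (Table 1, 1 thread)
    System.out.println(opsPerSec(10000, 4559));  // 2193 (Table 2, 1 thread)
  }
}
```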

      The results show:

      1. (Table 1) The new synchronization mechanism that batches synch calls from different threads works well.
        For a single thread every synch causes a real disk IO, making it slow. The more threads are used, the more
        synchs are batched, resulting in better performance. Throughput grows up to a certain point and then
        stabilizes at about 1450 ops/sec.
      2. (Table 2) Operations that do not require disk IO are constrained by memory locks.
        Without synchs the single-threaded execution is the fastest, because there are no waits.
        More threads start to interfere with each other and have to wait.
        Again, throughput stabilizes at about 1700 ops/sec and does not degrade further.
      3. Our default of 10 handlers per name-node is not the best choice for either the IO-bound or the pure
        in-memory operations. We should increase the default to 20 handlers; on big clusters 100 handlers
        or more can be used without loss of performance. In fact, with more handlers more operations can be handled
        simultaneously, which prevents the name-node from dropping calls that are close to timeout.
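The batching idea from observation 1 can be sketched as follows. This is an illustrative toy, not the actual FSEditLog implementation: each thread logs a transaction, but a single disk synch can cover all transactions logged so far, so concurrent threads piggyback on each other's synchs instead of each paying for a disk IO.

```java
// Illustrative sketch of batched synching (hypothetical class, not FSEditLog).
public class BatchedSyncLog {
  private long nextTxid = 0;    // id assigned to the most recent transaction
  private long syncedTxid = 0;  // highest id known to be flushed to disk
  private boolean syncing = false;

  public void logAndSync() throws InterruptedException {
    long myTxid;
    synchronized (this) {
      myTxid = ++nextTxid;      // the edit itself would be buffered here
    }
    synchronized (this) {
      // wait while a concurrent synch is in flight
      while (syncing && syncedTxid < myTxid) {
        wait();
      }
      if (syncedTxid >= myTxid) {
        return;                 // a batched synch already covered this txid
      }
      syncing = true;
    }
    long flushUpTo;
    synchronized (this) {
      flushUpTo = nextTxid;     // everything logged so far joins this batch
    }
    // ... the real disk synch would happen here, outside the lock ...
    synchronized (this) {
      syncedTxid = flushUpTo;
      syncing = false;
      notifyAll();              // wake threads whose txids are now covered
    }
  }

  public synchronized long synced() {
    return syncedTxid;
  }
}
```

With one thread every call performs its own synch; with many threads most calls return from the "already covered" branch, which is the behavior Table 1 shows.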

      Block report benchmark.

      In this benchmark each thread pretends it is a data-node and calls blockReport() with the same blocks.
      All blocks are real, that is, they were previously allocated by the name-node and assigned to data-nodes.
      Some reports can contain fake blocks, and some can have missing blocks.
      Each block report consists of 10,000 blocks. The total number of reports sent is 1000.
      The reports are equally divided between the data-nodes so that each of them sends an equal number of reports.
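The setup above can be sketched like this, with hypothetical names (Report.send stands in for a direct blockReport() call with a pre-built 10,000-block list; this is not the patch's actual code):

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: the total reports are split evenly across simulated
// data-nodes, and each "data-node" thread repeatedly submits its report.
public class BlockReportDriver {
  interface Report {
    void send(int datanodeId);
  }

  // Returns elapsed milliseconds; assumes totalReports divides evenly.
  public static long run(int dataNodes, int totalReports, Report report)
      throws InterruptedException {
    int reportsPerNode = totalReports / dataNodes;  // equal division
    CountDownLatch done = new CountDownLatch(dataNodes);
    long start = System.currentTimeMillis();
    for (int d = 0; d < dataNodes; d++) {
      final int id = d;
      new Thread(() -> {
        for (int r = 0; r < reportsPerNode; r++) report.send(id);
        done.countDown();
      }).start();
    }
    done.await();
    return System.currentTimeMillis() - start;
  }
}
```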

      Here is the table with the results.

      data-nodes   time (msec)   ops/sec
               1         42234        24
               2          9412       106
               4         11465        87
              10         15632        64
              20         17623        57
              40         19563        51
             100         24315        41
             200         29789        34
             400         23636        42
             600         39682        26

      I have not had time to analyze these results yet, so comments are welcome.


        Attachments

        1. NNThroughput.patch (28 kB) by Konstantin Shvachko
        2. NNThroughput.patch (29 kB) by Konstantin Shvachko


              Assignee: Konstantin Shvachko (shv)
              Reporter: Konstantin Shvachko (shv)