Hadoop HDFS
HDFS-2984

S-live: Rate operation count for delete is worse than 0.20.204 by 28.8%

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.23.1
    • Fix Version/s: None
    • Component/s: benchmarks
    • Labels: None

      Description

      Rate operation count for delete is worse than 0.20.204.xx by 28.8%

      Attachments

      1. slive.tar.gz (202 kB, Ravi Prakash)

          Activity

          Harsh J added a comment -

          What is "S-live" and what does it do?

          I believe the same was asked about another report that used the term "scan", but I am still in the dark about both.

          Todd Lipcon added a comment -

          Are you on trunk, the tip of branch-0.23, or the 0.23.1 RC? We just fixed a bug in HDFS that could be a culprit.

          Amol Kekre added a comment -

          0.23.1

          Konstantin Shvachko added a comment -

          Harsh> See HDFS-708 on Slive, and org.apache.hadoop.fs.slive.SliveTest.
          Todd> Which bug?

          Todd Lipcon added a comment -

          HADOOP-6502 should help IPC performance overall (though it is not specific to delete), and HDFS-2938 fixed a locking error in delete.

          Amol Kekre added a comment -

          We will take a look and post new numbers.

          Amol Kekre added a comment -

          Here are the new numbers. Looking good:
          Rate operations count for write: 6.25% worse.
          Rate operations count for read: 69.2% better.
          Rate operations count for rename: 59.5% better.
          Rate operations count for mkdir: 4.9% better.
          Rate operations count for list: 5.3% better.
          Rate operations count for delete: 28.8% worse.

          Todd Lipcon added a comment -

          Hi Amol. Can you include the git hash or svn branch/revision you're testing on when you report benchmark numbers? Also are you comparing to the 0.20.204 release or something off SVN?

          The 28.8% worse delete is still interesting, though not terrifying.

          Ravi Prakash added a comment -

          Ok! I've been slacking on this bug for way too long. But here are my experiments and the data.

          WHAT ARE THE FILES IN THIS TARBALL?
          ====================================
          The file "patch" is the diff of two minor optimizations I made in hadoop-23.

          I then ran Slive on clean HDFS installations for 0.23 and 0.204. These are the commands I ran. First, create 200000 files (hopefully that's what it does, though it's not important if it doesn't):
          bin/hadoop org.apache.hadoop.fs.slive.SliveTest -duration 50 -dirSize 1225 -files 200000 -maps 4 -readSize 104850,104850 -writeSize 104850,104850 -appendSize 104850,104850 -replication 1,1 -reduces 1 -blockSize 1024,1024 -mkdir 0,uniform -rename 0,uniform -append 0,uniform -delete 0,uniform -ls 0,uniform -read 0,uniform -create 100,uniform
          and then delete 50000 files (again, hopefully that's what it does):
          bin/hadoop org.apache.hadoop.fs.slive.SliveTest -duration 50 -dirSize 1225 -files 50000 -maps 4 -readSize 104850,104850 -writeSize 104850,104850 -appendSize 104850,104850 -replication 1,1 -reduces 1 -blockSize 1024,1024 -mkdir 0,uniform -rename 0,uniform -append 0,uniform -delete 100,uniform -ls 0,uniform -read 0,uniform -create 0,uniform
          I do this 3 times, hence the 6 files:
          <branch>.C200 <- create 200k files
          <branch>.C200D50 <- delete 50k files

          In the last run, I delete 500000 files, and use jvisualvm to create snapshots
          while I am profiling. The two snapshot*.npm files can be loaded into jvisualvm.

          OBSERVATIONS
          =============

          Create seems to be twice as fast in 0.23. So I'm not too worried about that.

          Delete, on the other hand, is a lot slower. I've tried optimizing, but I don't
          know if there's much else that can be done. A huge reason is probably this:
          http://blog.rapleaf.com/dev/2011/06/16/java-performance-synchronized-vs-lock/
          In 0.20 we were using synchronized, which, although it is 2-7.5x faster
          (as reported in the blog), is unfair. In 0.23 we are using a fair
          ReentrantReadWriteLock. This is obviously going to be slower, and since
          writeLock() is what's taking the most time (ref. the jvisualvm profile),
          I am led to believe that we must incur the performance hit in order
          to be fair.
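
          The fairness cost can be seen in isolation with a minimal sketch (not HDFS code; the class name, thread count, and iteration count below are arbitrary illustration values) that times write-lock acquisition on a non-fair versus a fair ReentrantReadWriteLock:

          import java.util.concurrent.locks.ReentrantReadWriteLock;

          // Minimal sketch: time write-lock acquisition on a non-fair vs. a fair
          // ReentrantReadWriteLock under contention from several threads.
          public class FairLockSketch {

            static long run(final ReentrantReadWriteLock lock, int threads, final int iters)
                throws InterruptedException {
              Thread[] ts = new Thread[threads];
              long start = System.nanoTime();
              for (int i = 0; i < threads; i++) {
                ts[i] = new Thread(new Runnable() {
                  public void run() {
                    for (int j = 0; j < iters; j++) {
                      lock.writeLock().lock();   // fair mode hands the lock out in FIFO order
                      try { /* stand-in for a short delete done under the write lock */ }
                      finally { lock.writeLock().unlock(); }
                    }
                  }
                });
                ts[i].start();
              }
              for (Thread t : ts) t.join();
              return System.nanoTime() - start;
            }

            public static void main(String[] args) throws InterruptedException {
              long unfair = run(new ReentrantReadWriteLock(false), 8, 100000); // non-fair
              long fair = run(new ReentrantReadWriteLock(true), 8, 100000);    // fair
              System.out.println("non-fair: " + (unfair / 1000000)
                  + " ms, fair: " + (fair / 1000000) + " ms");
            }
          }

          Under write-lock contention the fair variant generally comes out slower, which matches the direction (if not the magnitude) of the blog's numbers.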

          Comments are welcome. Please let me know your thoughts.

          @Todd: These are on the latest branch-23 74fd5cb929adc926a13eb062df7869894c0cc013

          Ravi Prakash added a comment -

          @Todd: And yes, these are against branch-204, 43d9fd86c87514bd0e0e9ea19c84c2a0109bf77e.

          Ravi Prakash added a comment -

          Hmm... I ran

          bin/hadoop org.apache.hadoop.fs.slive.SliveTest -duration 9999999 -dirSize 100 -files 10000 -maps 4 -seed 1 -readSize 20480,20480 -writeSize 20480,20480 -appendSize 20480,20480 -replication 1,1 -blockSize 10240,10240 -delete 0,uniform -create 100,uniform -mkdir 0,uniform -rename 0,uniform -append 0,uniform -ls 0,uniform -read 0,uniform 

          and then

           bin/hadoop org.apache.hadoop.fs.slive.SliveTest -duration 9999999 -dirSize 100 -files 10000 -maps 4 -seed 1 -readSize 20480,20480 -writeSize 20480,20480 -appendSize 20480,20480 -replication 1,1 -blockSize 10240,10240 -delete 100,uniform -create 0,uniform -mkdir 0,uniform -rename 0,uniform -append 0,uniform -ls 0,uniform -read 0,uniform 

          0.23 seems to be much faster. For 0.20:
          Basic report for operation type DeleteOp
          -------------
          Measurement "failures" = 3212
          Measurement "milliseconds_taken" = 11211
          Measurement "op_count" = 4000
          Measurement "successes" = 788
          Rate for measurement "op_count" = 356.792 operations/sec
          Rate for measurement "successes" = 70.288 successes/sec
          -------------
          Basic report for operation type SliveMapper
          -------------
          Measurement "milliseconds_taken" = 16554
          Measurement "op_count" = 4000
          Rate for measurement "op_count" = 241.633 operations/sec

          whereas for 0.23:
          Basic report for operation type DeleteOp
          -------------
          Measurement "failures" = 3221
          Measurement "milliseconds_taken" = 6604
          Measurement "op_count" = 4000
          Measurement "successes" = 779
          Rate for measurement "op_count" = 605.694 operations/sec
          Rate for measurement "successes" = 117.959 successes/sec
          -------------
          Basic report for operation type SliveMapper
          -------------
          Measurement "milliseconds_taken" = 2456
          Measurement "op_count" = 4000
          Rate for measurement "op_count" = 1628.664 operations/sec

          The command I ran has allowed a much higher percentage of successful operations.
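
          For reference, and assuming Slive derives the rate as op_count divided by elapsed seconds, the two reports above are internally consistent: 4000 ops / 11.211 s ≈ 356.8 ops/sec for 0.20 versus 4000 ops / 6.604 s ≈ 605.7 ops/sec for 0.23 on the DeleteOp line.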


            People

            • Assignee: Ravi Prakash
            • Reporter: Vinay Kumar Thota
            • Votes: 0
            • Watchers: 6
