HBASE-252

Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Opening a mapfile on HDFS and then doing 100k random reads inside the open file takes 3 times longer to complete on TRUNK than the same test does on 0.15.x.

      Random read performance is important to hbase.

      Serial reads and writes perform about the same in TRUNK and 0.15.x.
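
      The random-read phase amounts to a loop like the sketch below: open the MapFile once, then time 100k gets against randomly chosen keys. This is only an illustrative sketch, not the actual PerformanceEvaluation code; the key format, key/value types, and file path are assumptions.

      {code}
      import java.util.Random;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.BytesWritable;
      import org.apache.hadoop.io.MapFile;
      import org.apache.hadoop.io.Text;

      /** Minimal sketch of a MapFile random-read benchmark (not the real test). */
      public class MapFileRandomReadSketch {
        private static final int ROWS = 100000;

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          // Hypothetical location; the real test writes under the user's home dir on HDFS.
          Path dir = new Path("performanceevaluation.mapfile");

          MapFile.Reader reader = new MapFile.Reader(fs, dir.toString(), conf);
          Random rand = new Random();
          Text key = new Text();
          BytesWritable value = new BytesWritable();

          long start = System.currentTimeMillis();
          for (int i = 0; i < ROWS; i++) {
            // Pick a random existing row; keys are assumed to be zero-padded row numbers.
            key.set(String.format("%010d", rand.nextInt(ROWS)));
            reader.get(key, value); // look up the key in the index, then seek + read the record
          }
          long elapsed = System.currentTimeMillis() - start;
          reader.close();

          System.out.println("Reading " + ROWS + " random records took " + elapsed + "ms");
        }
      }
      {code}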

      Below are three runs of the mapfile test (included in hbase) against the 0.15 branch, followed by two runs against TRUNK:

      0.15 branch

      [stack@aa0-000-12 hbase]$ for i in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
      07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
      07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23644ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ....
      07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1832ms
      07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
      07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24188ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ...
      07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 127879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1787ms
      07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
      07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ..
      07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
      07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
      07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ...
      07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
      

      TRUNK

      [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
      07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
      07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ....
      07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
      
      [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
      07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
      07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
      07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
      ...
      07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
      07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
      07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
      

      The above was run on a small HDFS cluster of 4 machines, each hosting a DFS node.

        Activity

        Jim Kellerman made changes -
        Status: Resolved → Closed
        Owen O'Malley made changes -
        Fix Version/s: 0.16.0
        Project: Hadoop Core → Hadoop HBase
        Key: HADOOP-2488 → HBASE-252
        Assignee: stack
        Raghu Angadi made changes -
        Resolution: Fixed
        Status: Patch Available → Resolved
        Raghu Angadi made changes -
        Assignee: stack
        stack made changes -
        Status: Open → Patch Available
        Fix Version/s: 0.16.0
        Raghu Angadi made changes -
        Attachment: HADOOP-2488.patch
        stack made changes -
        Summary: "Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x" → "Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x"
        stack made changes -
        Description: (markup fix; text otherwise as shown in the Description above)
        stack created issue -

          People

          • Assignee: Unassigned
          • Reporter: stack
          • Votes: 0
          • Watchers: 0
