HBase / HBASE-9428

Regex filters are at least an order of magnitude slower since 0.94.3

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.94.12, 0.96.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I found this issue while debugging a performance problem on an OpenTSDB cluster that was basically unusable after an upgrade from 0.94.2 to 0.94.6. It was caused by HBASE-7279 (ping [~lhofhansl]).

      The easiest way to see it is to run a simple one-client PerformanceEvaluation (PE):

      $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
      

      Then, in the shell, run a filter scan (flush the table first and make sure it fits in your block cache if you want stable numbers).

      Pre HBASE-7279:

      hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
      ROW                                                 COLUMN+CELL                                                                                                                                         
       0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)                                                                                                                                    
      1 row(s) in 1.2780 seconds
      

      Post HBASE-7279:

      hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
      ROW                                                 COLUMN+CELL                                                                                                                                         
       0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)                                                                                                                                      
      1 row(s) in 24.2940 seconds
      

      I tried a bunch of 0.94 releases, up to 0.94.11, and the tip of 0.96. They are all slow like this.

      It seems that since that JIRA went in we do a lot more row matching, and each of those matches runs the regex, which gets very expensive.
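To make the "more row matching" suspicion concrete, here is a minimal standalone Java sketch. It is not HBase code and not the fix. It only counts how many times a regex runs when the same row key is checked once versus several times per row. The class and method names (`RegexEvalCount`, `CountingRowMatcher`, `scan`) are hypothetical, the keys are synthetic 10-digit strings like the PE row keys above, and the factor of 20 is an arbitrary illustration, not a measured number from the scan path.

```java
import java.util.regex.Pattern;

public class RegexEvalCount {

    /** Hypothetical helper: wraps a compiled regex and counts how often it is run. */
    static final class CountingRowMatcher {
        private final Pattern pattern;
        long calls;

        CountingRowMatcher(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(String rowKey) {
            calls++;
            // Partial match via find(); whether the real comparator behaves
            // exactly like this is an assumption of the sketch.
            return pattern.matcher(rowKey).find();
        }
    }

    /**
     * Walks `rows` synthetic zero-padded keys and tests each key
     * `checksPerRow` times. Returns the number of rows that matched.
     */
    static int scan(int rows, int checksPerRow, CountingRowMatcher matcher) {
        int matchedRows = 0;
        for (int r = 0; r < rows; r++) {
            String key = String.format("%010d", r);
            boolean matched = false;
            for (int i = 0; i < checksPerRow; i++) {
                matched = matcher.matches(key);
            }
            if (matched) {
                matchedRows++;
            }
        }
        return matchedRows;
    }

    public static void main(String[] args) {
        // 1 check per row vs. an arbitrary 20 checks per row.
        for (int checks : new int[] {1, 20}) {
            CountingRowMatcher m = new CountingRowMatcher("0000055872");
            long start = System.nanoTime();
            int hits = scan(100_000, checks, m);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(checks + " check(s)/row: " + hits + " row(s), "
                    + m.calls + " regex calls, " + ms + " ms");
        }
    }
}
```

The result set is the same in both runs, but the number of regex evaluations grows linearly with the number of checks per row, so any extra row matching in the scan path is paid for in full when the filter is a regex.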

        Attachments

        1. 9428-trunk.txt
          1 kB
          Lars Hofhansl
        2. 9428-0.94.txt
          1 kB
          Lars Hofhansl

            People

            • Assignee: Lars Hofhansl (larsh)
            • Reporter: Jean-Daniel Cryans (jdcryans)
            • Votes: 0
            • Watchers: 7
