HBASE-27219: Change JONI encoding in RegexStringComparator

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0, 3.0.0-alpha-4, 2.4.14
    • Component/s: Filters
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: In RegexStringComparator, an infinite loop can occur if invalid UTF-8 input is encountered. We now use joni's NonStrictUTF8Encoding instead of UTF8Encoding to avoid the issue.

    Description

      I changed the engine of RegexStringComparator to JONI.
      After that, when I sent a regex filter request, the RegionServer's heap memory usage spiked and the RegionServer stopped working due to GC pauses.
       

      (RegionServer Heap Memory Usage)
       

      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1435ms
      GC pool 'ParNew' had collection(s): count=1 time=1550ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1073ms
      GC pool 'ParNew' had collection(s): count=1 time=1534ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1456ms
      GC pool 'ParNew' had collection(s): count=1 time=1574ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1297ms
      GC pool 'ParNew' had collection(s): count=1 time=1415ms 

      (RegionServer Log)
       
      I looked into the cause: reportedly, when joni uses UTF8Encoding, an infinite loop can occur if the input contains invalid UTF-8.
      Trino avoids this by using NonStrictUTF8Encoding instead of UTF8Encoding.
      (https://github.com/trinodb/trino/commit/ea66e8cb27b098a5cea184106fe245064351b567)
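
      To illustrate what "invalid UTF-8" means here, the following is a minimal sketch that uses only the JDK; it is not HBase or joni code. The class name Utf8Check and the method isValidUtf8 are hypothetical. It assumes the value being matched is an arbitrary byte[] and uses a strict CharsetDecoder to report whether those bytes are well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase or joni.
final class Utf8Check {
    private Utf8Check() {}

    /** Returns true only if the given bytes decode as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] value) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode(ByteBuffer) treats the buffer as the complete input,
            // so a truncated multi-byte sequence at the end is also rejected.
            decoder.decode(ByteBuffer.wrap(value));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

      For example, Utf8Check.isValidUtf8(new byte[] {(byte) 0xE3, (byte) 0x81}) returns false because the three-byte sequence is missing its last byte. A check like this could help confirm whether the values being scanned contain such bytes.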

      After changing the encoding of JoniRegexEngine to NonStrictUTF8Encoding in RegexStringComparator, I confirmed that the heap memory usage spike was gone.

      Attachments

        1. rs-heap.png
          241 kB
          Minwoo Kang

            People

              Assignee: Minwoo Kang (minwoo.kang)
              Reporter: Minwoo Kang (minwoo.kang)