HBase / HBASE-27219

Change JONI encoding in RegexStringComparator

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0, 3.0.0-alpha-4, 2.4.14
    • Component/s: Filters
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: In RegexStringComparator, an infinite loop can occur if invalid UTF-8 input is encountered. We now use joni's NonStrictUTF8Encoding instead of UTF8Encoding to avoid the issue.

    Description

      I changed the engine of RegexStringComparator to JONI.
      When I then sent a regex filter request, the RegionServer's heap memory usage spiked and the RegionServer stopped working because of GC.
       

      (RegionServer heap memory usage; see attachment rs-heap.png)
       

      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1435ms
      GC pool 'ParNew' had collection(s): count=1 time=1550ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1073ms
      GC pool 'ParNew' had collection(s): count=1 time=1534ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1456ms
      GC pool 'ParNew' had collection(s): count=1 time=1574ms
      INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1297ms
      GC pool 'ParNew' had collection(s): count=1 time=1415ms 

      (RegionServer Log)
       
      I looked into the cause. Reportedly, when UTF8Encoding is used, an infinite loop can occur if the input contains invalid UTF-8.
      Trino uses NonStrictUTF8Encoding instead of UTF8Encoding for this case.
      (https://github.com/trinodb/trino/commit/ea66e8cb27b098a5cea184106fe245064351b567)
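
      To make "invalid UTF-8" concrete, here is a minimal sketch of the difference between strict and lenient handling of malformed bytes. It uses only the JDK's own decoder as an analogy; it is not joni or jcodings code, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustration only: JDK decoder behaviour, not joni/jcodings internals.
class Utf8StrictnessDemo {

    // Strict check: returns false if the bytes are not well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decode: malformed sequences are replaced with U+FFFD
    // instead of causing a failure.
    static String decodeLeniently(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "row-1".getBytes(StandardCharsets.UTF_8);
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = {'r', 'o', 'w', (byte) 0xFF, '1'};

        System.out.println("valid bytes accepted:   " + isValidUtf8(valid));
        System.out.println("invalid bytes accepted: " + isValidUtf8(invalid));
        System.out.println("lenient decode length:  " + decodeLeniently(invalid).length());
    }
}
```

      Row keys and cell values in HBase are arbitrary byte arrays, so a regex comparator can see byte sequences like the invalid one above.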

      After I changed the encoding of JoniRegexEngine in RegexStringComparator to NonStrictUTF8Encoding, the heap memory usage spike no longer occurred.
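
      The issue does not describe the mechanism behind the loop. One plausible reading, offered purely as an assumption, is that a scan loop advancing by a per-character length can stall when that length is not positive for malformed bytes. The toy model below is hypothetical and is not the joni or jcodings implementation; it only shows that a length derived from the lead byte alone is always at least 1, so the scan must terminate.

```java
// Hypothetical toy model; not the actual joni or jcodings implementation.
class LenientScanDemo {

    // Character length guessed from the lead byte only.
    // Always returns a value >= 1, even for malformed input.
    static int lenientLength(byte[] bytes, int pos) {
        int b = bytes[pos] & 0xFF;
        int len;
        if (b < 0x80) {
            len = 1;                      // ASCII
        } else if (b >= 0xC0 && b < 0xE0) {
            len = 2;                      // lead byte of a 2-byte sequence
        } else if (b >= 0xE0 && b < 0xF0) {
            len = 3;                      // lead byte of a 3-byte sequence
        } else if (b >= 0xF0 && b < 0xF8) {
            len = 4;                      // lead byte of a 4-byte sequence
        } else {
            len = 1;                      // stray continuation byte or invalid lead byte
        }
        // Never step past the end of the array (handles truncated sequences).
        return Math.min(len, bytes.length - pos);
    }

    // Counts scan steps. The loop terminates because every step
    // advances the position by at least one byte.
    static int countSteps(byte[] bytes) {
        int steps = 0;
        for (int pos = 0; pos < bytes.length; pos += lenientLength(bytes, pos)) {
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        byte[] malformed = {'a', (byte) 0xFF, (byte) 0x80, 'b'};
        System.out.println("steps over malformed input: " + countSteps(malformed));
    }
}
```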

      Attachments

        1. rs-heap.png
          241 kB
          Minwoo Kang

          People

            Assignee: Minwoo Kang (minwoo.kang)
            Reporter: Minwoo Kang (minwoo.kang)
            Votes: 0
            Watchers: 4
