Harmony
  1. Harmony
  2. HARMONY-1682

Jitrino.OPT performs incorrect GC enumeration in nested loop with array accesses

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DRLVM
    • Labels:
      None
    • Environment:
      Linux/IA32
    • Estimated Complexity:
      Moderate

      Description

      the problem in brief:
      (see more details in the attached test)
      for (int i = 0; i < N; i++) {
      something = A[i].Something();
      for (int j = 0; j < A[i].getArray().length; j++) {
      // perform some GC intensive action here
      if (A[i].getArray()[j].getName().equals("xxx"))

      { // use A[i] here break; }

      }
      }

      "memopt" optimization pass eliminates the innermost memory read accesses to A[i] preserving a temporary reference to the object in the innermost loop. When a GC intensive action is performed, the array (A) is moved to another location, but the temporary reference is not updated with GC (which leads to crash). So, I suspect a problem in GC enumeration or, maybe, some other aspects of JIT<->GC interface.

      some facts:
      On Jitrino.JET the test passes.
      On Jitrino.OPT the test leads to crash:
      $ $JAVA -Xem:opt ClTest
      SIGSEGV in VM code.
      Stack trace:
      1: ?? (??:-1)
      <end of stack trace>
      Segmentation fault
      If "memopt" is turned OFF in opt.emconf, the test passes on Jitrino.OPT (but memopt performs correct transformations, which is visible from log files)
      If a larger Java heap is specified (which postpones GC actions), the test takes longer to run, but crashes, anyway:
      $ $JAVA -Xms1024m -Xmx1024m -Xem:opt ClTest
      1_20
      2_20
      3_20
      4_20
      5_20
      6_20
      7_20
      8_20
      9_20
      SIGSEGV in VM code.
      Stack trace:
      1: ?? (??:-1)
      <end of stack trace>
      Segmentation fault

      The test is reduced from the kernel test ClassLoaderTest which fails with exactly the same symptoms.

      1. ClTest.java
        2 kB
        Egor Pasko
      2. interior_assert.diff
        0.9 kB
        Ivan Volosyuk
      3. jit_gc.diff
        8 kB
        Mikhail Fursov
      4. retrieve_root_set.diff
        0.5 kB
        Ivan Volosyuk

        Activity

        Hide
        Egor Pasko added a comment -

        the test to reproduce
        crashes on the latest DRLVM with -Xem:opt

        Show
        Egor Pasko added a comment - the test to reproduce crashes on the latest DRLVM with -Xem:opt
        Hide
        Egor Pasko added a comment -

        one more fact: if method "test" is compiled by JET, while others are compiled by OPT, the test passes

        Show
        Egor Pasko added a comment - one more fact: if method "test" is compiled by JET, while others are compiled by OPT, the test passes
        Hide
        Ivan Volosyuk added a comment -

        I have recently found problem with enumeration of interior pointers produced by jitrino.opt. It looked for me like miscommunication between GC and jitrino. Offset reported for interior pointer was a huge number. Interior pointers on enumeratuin apears quite rare. A few for a number of GCs. This can be a cause of the problem.

        Show
        Ivan Volosyuk added a comment - I have recently found problem with enumeration of interior pointers produced by jitrino.opt. It looked for me like miscommunication between GC and jitrino. Offset reported for interior pointer was a huge number. Interior pointers on enumeratuin apears quite rare. A few for a number of GCs. This can be a cause of the problem.
        Hide
        Egor Pasko added a comment -

        Ivan, do you have a small 100% reproducer for the interior pointer problem? would be great

        Show
        Egor Pasko added a comment - Ivan, do you have a small 100% reproducer for the interior pointer problem? would be great
        Hide
        Mikhail Fursov added a comment -

        Ivan, I checked the test you sent
        The test fails and GC runs only once before the failure.
        I moved all methods except the problem one to Jitrino.JET and Jitrino.OPT reports only 2 items: one object and one [mptr, object] pair with a small offset.

        According to the IR I have this is OK.
        The GC enumeration point is:
        I36: (AD:t54(EAX):cls:ClTest$Package[]) =CALL t53(-390:m:ClTest$MyObject.getPackages):int32 (AU:t43(EAX):cls:ClTest$MyObject)

        I can help you to add more logging into Jitrino.OPT GCMap algorithms. Can you imagine the logging that will help?

        Show
        Mikhail Fursov added a comment - Ivan, I checked the test you sent The test fails and GC runs only once before the failure. I moved all methods except the problem one to Jitrino.JET and Jitrino.OPT reports only 2 items: one object and one [mptr, object] pair with a small offset. According to the IR I have this is OK. The GC enumeration point is: I36: (AD:t54(EAX):cls:ClTest$Package[]) =CALL t53(-390:m:ClTest$MyObject.getPackages):int32 (AU:t43(EAX):cls:ClTest$MyObject) I can help you to add more logging into Jitrino.OPT GCMap algorithms. Can you imagine the logging that will help?
        Hide
        Ivan Volosyuk added a comment -

        Here is patch which detects wrong interior pointers reported.
        It looks like we have serious miscommunication between GC and JIT.
        Assertion fails on the attached test.

        Show
        Ivan Volosyuk added a comment - Here is patch which detects wrong interior pointers reported. It looks like we have serious miscommunication between GC and JIT. Assertion fails on the attached test.
        Hide
        Mikhail Fursov added a comment -

        I'm agree that we have serious miscommunication between GC and JIT.
        JIT does not expect that GC moves objects during the enumeration.

        Ivan, does GCv4.1move objects during the enumeration process?

        Show
        Mikhail Fursov added a comment - I'm agree that we have serious miscommunication between GC and JIT. JIT does not expect that GC moves objects during the enumeration. Ivan, does GCv4.1move objects during the enumeration process?
        Hide
        weldon washburn added a comment -

        What are the results when this test is tried with GCV4.0?

        Also, it is conceivable that allowing a GC to move object during enumeration could allow for more concurrency. If indeed objects are being moved during enumeration, it would be good to provide a cost/benefit analysis. Adding concurrency to the JIT/GC enumeration interface introduces a bunch of new synchronization issues.

        Show
        weldon washburn added a comment - What are the results when this test is tried with GCV4.0? Also, it is conceivable that allowing a GC to move object during enumeration could allow for more concurrency. If indeed objects are being moved during enumeration, it would be good to provide a cost/benefit analysis. Adding concurrency to the JIT/GC enumeration interface introduces a bunch of new synchronization issues.
        Hide
        Mikhail Fursov added a comment -

        It's not a problem (neither performance nor implementation) to cache all enumeration results in JIT before reporting. I can prototype it tomorrow.

        But I do not understand the following n this scheme: if JIT has multiple mptrs for the same base to report, will this base relocated when mptr is reported? In this case when the next mptr for the same base is reported we have an error again.

        Show
        Mikhail Fursov added a comment - It's not a problem (neither performance nor implementation) to cache all enumeration results in JIT before reporting. I can prototype it tomorrow. But I do not understand the following n this scheme: if JIT has multiple mptrs for the same base to report, will this base relocated when mptr is reported? In this case when the next mptr for the same base is reported we have an error again.
        Hide
        weldon washburn added a comment -

        Mikhail said:
        > If JIT has multiple mptrs for the same base to report, will this base relocated when mptr is reported?

        The short answer is, I think it best that the GC does not move anything during enumeration. The JIT should plan on this for the foreseeable future.

        Maybe much later on we will need move objects and update stack/register locations while enumeration unwind is under way. This might be needed for multicore hardware. And an excellent approach would be analyze relevant workloads on real multicore HW using existing (boring) enumeration protocol. Once hot spots are identified, we can discuss changes to the existing JIT/GC interface. But for now, let's learn to crawl before learning to walk.

        Show
        weldon washburn added a comment - Mikhail said: > If JIT has multiple mptrs for the same base to report, will this base relocated when mptr is reported? The short answer is, I think it best that the GC does not move anything during enumeration. The JIT should plan on this for the foreseeable future. Maybe much later on we will need move objects and update stack/register locations while enumeration unwind is under way. This might be needed for multicore hardware. And an excellent approach would be analyze relevant workloads on real multicore HW using existing (boring) enumeration protocol. Once hot spots are identified, we can discuss changes to the existing JIT/GC interface. But for now, let's learn to crawl before learning to walk.
        Hide
        Geir Magnusson Jr added a comment -

        I assume we should wait for a fix before applying anything?

        Show
        Geir Magnusson Jr added a comment - I assume we should wait for a fix before applying anything?
        Hide
        weldon washburn added a comment -

        Geir wrote
        > I assume we should wait for a fix before applying anything?

        Actually, its even more basic than waiting for a fix. We first need a common understanding of what is and is not allowed in the JIT/GC interface. Once the JIT/GC interface semantics are agreed on, the next thing to determine is which (if any) subsystem actually has a bug. If there is a bug, open a bug report. And finally, if there is a bug report, wait for a bug fix.

        Show
        weldon washburn added a comment - Geir wrote > I assume we should wait for a fix before applying anything? Actually, its even more basic than waiting for a fix. We first need a common understanding of what is and is not allowed in the JIT/GC interface. Once the JIT/GC interface semantics are agreed on, the next thing to determine is which (if any) subsystem actually has a bug. If there is a bug, open a bug report. And finally, if there is a bug report, wait for a bug fix.
        Hide
        Mikhail Fursov added a comment -

        I'm agree that GC should not move objects during the enumeration.
        The example: if JIT has 2 references to the same object JIT will report both of them and expect GC updates both references. But if the object is moved after it's reported the first time, the second report to the old memory location will fail.

        Show
        Mikhail Fursov added a comment - I'm agree that GC should not move objects during the enumeration. The example: if JIT has 2 references to the same object JIT will report both of them and expect GC updates both references. But if the object is moved after it's reported the first time, the second report to the old memory location will fail.
        Hide
        Ivan Volosyuk added a comment -

        Quick fix:
        To retrieve all root set before modifications to heap use: -Dgc.remember_root_set=true
        To do this the by default. Use patch: retrieve_root_set.diff

        Show
        Ivan Volosyuk added a comment - Quick fix: To retrieve all root set before modifications to heap use: -Dgc.remember_root_set=true To do this the by default. Use patch: retrieve_root_set.diff
        Hide
        Mikhail Fursov added a comment -

        The fix for JIT enumeration.
        With this fix JIT precaches all offsets for managed pointers in a method before reporting.
        The initial test passes with this fix.

        Show
        Mikhail Fursov added a comment - The fix for JIT enumeration. With this fix JIT precaches all offsets for managed pointers in a method before reporting. The initial test passes with this fix.
        Hide
        Egor Pasko added a comment -

        test passes for me if jit_gc.diff (from Mikhail) is applied (Linux/ia32 gcc 3.3.3), need to fix line endings in the patch to apply on Linux
        BTW, I like the patch and feel it is a good fix for the bug. No performance degradation is expected.

        Show
        Egor Pasko added a comment - test passes for me if jit_gc.diff (from Mikhail) is applied (Linux/ia32 gcc 3.3.3), need to fix line endings in the patch to apply on Linux BTW, I like the patch and feel it is a good fix for the bug. No performance degradation is expected.
        Hide
        Ivan Volosyuk added a comment -

        Note for commiters. The only patch to be applied is: jit_gc.diff

        Show
        Ivan Volosyuk added a comment - Note for commiters. The only patch to be applied is: jit_gc.diff
        Hide
        Geir Magnusson Jr added a comment -

        Sorry weldon, I didn't notice that it was assigned to you when I assigned to me... was just in "get patches in" mode...

        Ivan : thanks for being so clear on what patch was relevant - it really helps

        Show
        Geir Magnusson Jr added a comment - Sorry weldon, I didn't notice that it was assigned to you when I assigned to me... was just in "get patches in" mode... Ivan : thanks for being so clear on what patch was relevant - it really helps
        Hide
        Geir Magnusson Jr added a comment -

        r463954

        Ubuntu 6 - smoke, c-unit, ~kernel

        Show
        Geir Magnusson Jr added a comment - r463954 Ubuntu 6 - smoke, c-unit, ~kernel

          People

          • Assignee:
            Geir Magnusson Jr
            Reporter:
            Egor Pasko
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development