Lucene - Core
LUCENE-3867

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

      A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...

      While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods. It's not perfect, and there's some room for improvement I'm sure; here it is:

      	/**
      	 * Computes the approximate size of a String object. Note that if this object
      	 * is also referenced by another object, you should add
      	 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
      	 * method.
      	 */
      	public static int sizeOf(String str) {
      		return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      				+ 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      				+ RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      				+ RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
      	}
      

      If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
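For concreteness, the corrected arithmetic can be sketched like this. The class and constant names below are hypothetical stand-ins mirroring RamUsageEstimator's, and the 12-byte object header matches the 64-bit-with-compressed-oops layout the quoted page describes:

```java
public class ArrayHeaderDemo {
    // Hypothetical stand-ins for RamUsageEstimator's constants
    // (64-bit HotSpot with compressed oops, per the quoted javamex page).
    static final int NUM_BYTES_OBJECT_HEADER = 12;
    static final int NUM_BYTES_INT = 4;

    // Corrected: an array header is just the object header plus the
    // 4-byte length field -- no NUM_BYTES_OBJECT_REF term.
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    public static void main(String[] args) {
        System.out.println(NUM_BYTES_ARRAY_HEADER); // prints 16
    }
}
```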

      1. LUCENE-3867-compressedOops.patch
        4 kB
        Uwe Schindler
      2. LUCENE-3867.patch
        6 kB
        Shai Erera
      3. LUCENE-3867.patch
        7 kB
        Uwe Schindler
      4. LUCENE-3867.patch
        7 kB
        Uwe Schindler
      5. LUCENE-3867.patch
        10 kB
        Shai Erera
      6. LUCENE-3867.patch
        10 kB
        Shai Erera
      7. LUCENE-3867.patch
        13 kB
        Uwe Schindler
      8. LUCENE-3867.patch
        17 kB
        Uwe Schindler
      9. LUCENE-3867.patch
        25 kB
        Uwe Schindler
      10. LUCENE-3867.patch
        27 kB
        Uwe Schindler
      11. LUCENE-3867.patch
        32 kB
        Uwe Schindler
      12. LUCENE-3867.patch
        32 kB
        Uwe Schindler
      13. LUCENE-3867.patch
        35 kB
        Uwe Schindler
      14. LUCENE-3867.patch
        35 kB
        Uwe Schindler
      15. LUCENE-3867.patch
        49 kB
        Dawid Weiss
      16. LUCENE-3867.patch
        49 kB
        Uwe Schindler
      17. LUCENE-3867.patch
        49 kB
        Uwe Schindler
      18. LUCENE-3867.patch
        49 kB
        Uwe Schindler
      19. LUCENE-3867-3.x.patch
        53 kB
        Uwe Schindler
      20. LUCENE-3867.patch
        314 kB
        Dawid Weiss
      21. LUCENE-3867.patch
        39 kB
        Dawid Weiss

        Activity

        Dawid Weiss added a comment -

        One can obtain the exact object allocation size (including alignment) by running with an agent (via Instrumentation). This is shown here, for example:

        http://www.javaspecialists.eu/archive/Issue142.html

        I don't think it makes sense to be "perfect" here because there is a tradeoff between being accurate and being fast. One thing to possibly improve would be handling of the reference size (4 vs. 8 bytes; in particular with compact references when running under 64-bit JVMs).

        Dawid Weiss added a comment -

        Oh, one thing that I had in the back of my mind was to run a side-by-side comparison of Lucene's memory estimator and "exact" memory occupation via agent and see what the real difference is (on various vms and with compact vs. non-compact refs).

        This would be a 2 hour effort I guess, fun, but I don't have the time for it.

        Uwe Schindler added a comment -

        I was talking with Shai already about the OBJECT_REF size of 8, in RamUsageEstimator it is:

        public final static int NUM_BYTES_OBJECT_REF = Constants.JRE_IS_64BIT ? 8 : 4;
        

        ...which does not take the CompressedOops into account. Can we detect those oops, so we can change the above ternary to return 4 on newer JVMs with compressed oops enabled?

        Dawid Weiss added a comment -

        If you're running with an agent then it will tell you how many bytes a reference is, so this would fix the issue. I don't think you can test this from within the Java VM itself, but this is an interesting question. What you could do is spawn a child VM process with identical arguments (and an agent) and check it there, but this is quite awful...

        I'll ask on hotspot mailing list, maybe they know how to do this.

        Shai Erera added a comment -

        I don't think it makes sense to be "perfect" here because there is a tradeoff between being accurate and being fast.

        I agree. We should be fast, and "as accurate as we can get while preserving speed".

        I will fix the constant's value, as it's wrong. The helper methods are just that - helpers. Someone can use other techniques to compute the size of objects.

        Will post a patch shortly.

        Michael McCandless added a comment -

        Nice catch on the overcounting of array's RAM usage!

        And +1 for additional sizeOf(...) methods.

        Uwe Schindler added a comment -

        Hi Mike,

        Dawid and I have already contacted the Hotspot list. There is an easy way to get the compressedOops setting from inside the JVM using MXBeans from the ManagementFactory. I think we will provide a patch later! I think by that we could also optimize the check for 64 bit, because that one should also be reported by the MXBean without looking into strange sysprops (see the TODO in the code for JRE_IS_64BIT).

        Uwe

        Dawid Weiss added a comment -

        Sysprops should be a fallback though, because (to be verified) they're supported by other vendors whereas the MX bean may not be.

        It needs to be verified by running under j9, jrockit, etc.

        Michael McCandless added a comment -

        Consulting the MXBean sounds great!

        Sysprops should be a fallback though

        +1

        Uwe Schindler added a comment -

        Here is the patch for detecting compressedOops in Sun JVMs. For other JVMs it will simply use false, so object refs will be assumed to be 64 bits, which is fine as an upper memory limit.

        The code only uses public Java APIs and falls back to false if anything fails.
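A sketch of that detection using only public JMX APIs (the class and method names here are illustrative, not the patch's). The MBean name `com.sun.management:type=HotSpotDiagnostic` is HotSpot-specific, so everything falls back to false elsewhere, matching the behavior described above:

```java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class CompressedOopsCheck {
    /** Best-effort check for -XX:+UseCompressedOops; false on non-HotSpot JVMs. */
    public static boolean useCompressedOops() {
        try {
            // HotSpot exposes VM flags through its diagnostic MBean.
            ObjectName hotspot = new ObjectName("com.sun.management:type=HotSpotDiagnostic");
            Object option = ManagementFactory.getPlatformMBeanServer().invoke(
                hotspot, "getVMOption",
                new Object[] { "UseCompressedOops" },
                new String[] { "java.lang.String" });
            // The VMOption return value maps to CompositeData over the MBeanServer.
            return Boolean.parseBoolean(((CompositeData) option).get("value").toString());
        } catch (Exception e) {
            // Unknown JVM or flag: assume uncompressed (8-byte) refs, a safe upper bound.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("CompressedOops: " + useCompressedOops());
    }
}
```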

        Shai Erera added a comment -

        Patch adds RUE.sizeOf(String) and various sizeOf(arr[]) methods. Also fixes the ARRAY_HEADER.

        Uwe, I merged with your patch, with one difference – the System.out prints in the test are printed only if VERBOSE.

        Uwe Schindler added a comment -

        Shai: Thanks! I am in a train at the moment, so my internet is slow/not working. I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops (which may have been modified by user code, so not really safe to rely on...).

        I left the non-verbose printlns in it, so people reviewing the patch can quickly see by running that test what happens on their JVM. It would be interesting to see what your jRockit does...

        Shai Erera added a comment -

        I tried IBM and Oracle 1.6 JVMs, and both printed the same:

            [junit] ------------- Standard Output ---------------
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: This JVM uses CompressedOops: false
            [junit] ------------- ---------------- ---------------
        

        So no CompressedOops for me.

        I will later find out what MXBeans we can use to detect 64bit without looking at strange sysprops

        Ok. If you'll make it, we can add these changes to that patch, otherwise we can also do them in a separate issue.

        Uwe Schindler added a comment - edited

        Hm, for me (1.6.0_31, 7u3) it prints true. What JVMs are you using and what settings?

        Uwe Schindler added a comment -

        Here my results:

        *****************************************************
        JAVA_HOME = C:\Program Files\Java\jdk1.7.0_03
        java version "1.7.0_03"
        Java(TM) SE Runtime Environment (build 1.7.0_03-b05)
        Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
        *****************************************************
        
        C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam*
        [junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,561 sec
        [junit]
        [junit] ------------- Standard Output ---------------
        [junit] NOTE: This JVM is 64bit: true
        [junit] NOTE: This JVM uses CompressedOops: true
        [junit] ------------- ---------------- ---------------
        
        C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
        [junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,5 sec
        [junit]
        [junit] ------------- Standard Output ---------------
        [junit] NOTE: This JVM is 64bit: true
        [junit] NOTE: This JVM uses CompressedOops: false
        [junit] ------------- ---------------- ---------------
        
        *****************************************************
        JAVA_HOME = C:\Program Files\Java\jdk1.6.0_31
        java version "1.6.0_31"
        Java(TM) SE Runtime Environment (build 1.6.0_31-b05)
        Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
        *****************************************************
        
        C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam*
        [junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,453 sec
        [junit]
        [junit] ------------- Standard Output ---------------
        [junit] NOTE: This JVM is 64bit: true
        [junit] NOTE: This JVM uses CompressedOops: true
        [junit] ------------- ---------------- ---------------
        
        C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:-UseCompressedOops
        [junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,421 sec
        [junit]
        [junit] ------------- Standard Output ---------------
        [junit] NOTE: This JVM is 64bit: true
        [junit] NOTE: This JVM uses CompressedOops: false
        [junit] ------------- ---------------- ---------------
        
        C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\core>ant test -Dtestcase=TestRam* -Dargs=-XX:+UseCompressedOops
        [junit] Testsuite: org.apache.lucene.util.TestRamUsageEstimator
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0,422 sec
        [junit]
        [junit] ------------- Standard Output ---------------
        [junit] NOTE: This JVM is 64bit: true
        [junit] NOTE: This JVM uses CompressedOops: true
        [junit] ------------- ---------------- ---------------
        
        Shai Erera added a comment -

        Oracle:

        java version "1.6.0_21"
        Java(TM) SE Runtime Environment (build 1.6.0_21-b07)
        Java HotSpot(TM) 64-Bit Server VM (build 17.0-b17, mixed mode)
        

        IBM:

        java version "1.6.0"
        Java(TM) SE Runtime Environment (build pwa6460sr9fp3-20111122_05(SR9 FP3))
        IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Windows 7 amd64-64 jvmwa6460sr9-20111111_94827 (JIT enabled, AOT enabled)
        J9VM - 20111111_094827
        JIT  - r9_20101028_17488ifx45
        GC   - 20101027_AA)
        JCL  - 20110727_07
        
        Shai Erera added a comment -

        I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and with the Oracle JVM I get "Compressed Oops: true" but with IBM JVM I still get 'false'.

        Uwe Schindler added a comment -

        OK, that is expected. 1.6.0_21 does not enable compressedOops by default, so false is correct. If you enable it manually, it reports true.

        jRockit is jRockit and not Sun/Oracle, so the result is somewhat expected. It seems to not have that MXBean. But the code does not produce strange exceptions, so at least in the Sun VM we can detect compressed oops and guess the reference size better. 8 is still not bad as it gives an upper limit.

        Uwe Schindler added a comment -

        By the way, here is the code from a hotspot mailing list member (my code is based on it); it also shows the outputs for different JVMs:

        https://gist.github.com/1333043

        (I just removed the com.sun.* imports and replaced them with reflection)

        Shai Erera added a comment -

        8 is still not bad as it gives an upper limit.

        I agree. Better to over-estimate here, than under-estimate.

        Would appreciate if someone can take a look at the sizeOf() impls before I commit.

        Uwe Schindler added a comment - edited

        On Hotspot Mailing list some people also seem to have an idea about jRockit and IBM J9:

        From: Krystal Mok
        Sent: Wednesday, March 14, 2012 3:46 PM
        To: Uwe Schindler
        Cc: Dawid Weiss; hotspot compiler
        Subject: Re: How to detect if the VM is running with compact refs from within the VM (no agent)?

        Hi,

        Just in case you'd care, the same MXBean could be used to detect compressed references on JRockit, too. It's probably available starting from JRockit R28.

        Instead of "UseCompressedOops", use "CompressedRefs" as the VM option name on JRockit.

        Don't know how to extract this information for J9 without another whole bunch of hackeries...well, you could try this, on a "best-effort" basis for platform detection:
        IBM J9's VM version string contains the compressed reference information. Example:

        $ export JAVA_OPTS='-Xcompressedrefs'
        $ groovysh
        Groovy Shell (1.7.7, JVM: 1.7.0)
        Type 'help' or '\h' for help.
        ----------------------------------------------------------------------------------------------------------------------------
        groovy:000> System.getProperty 'java.vm.info'
        ===> JRE 1.7.0 Linux amd64-64 Compressed References 20110810_88604 (JIT enabled, AOT enabled)
        J9VM - R26_Java726_GA_20110810_1208_B88592
        JIT - r11_20110810_20466
        GC - R26_Java726_GA_20110810_1208_B88592_CMPRSS
        J9CL - 20110810_88604
        groovy:000> quit

        So grepping for "Compressed References" in the "java.vm.info" system property gives you the clue.

        • Kris
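The J9 heuristic Kris describes boils down to a one-line property check. A sketch (the class and method names are illustrative; on non-J9 JVMs the marker is simply absent from java.vm.info, yielding false):

```java
public class J9CompressedRefsCheck {
    /** Best-effort J9 heuristic: the java.vm.info string advertises compressed refs. */
    public static boolean j9CompressedRefs() {
        // Example J9 value: "JRE 1.7.0 Linux amd64-64 Compressed References 20110810_88604 ..."
        return System.getProperty("java.vm.info", "").contains("Compressed References");
    }

    public static void main(String[] args) {
        System.out.println("J9 compressed refs: " + j9CompressedRefs());
    }
}
```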
        Michael McCandless added a comment -

        Patch looks good!

        Maybe just explain in sizeOf(String) javadoc that this method assumes the String is "standalone" (ie, does not reference a larger char[] than itself)?

        Because... if you call String.substring, the returned string references a slice of the original one's char[]... and so technically the RAM it's tying up could be (much) larger than expected. (At least, this used to be the case... not sure if it's changed...).

        Shai Erera added a comment -

        Good point. I clarified the jdocs with this:

          /**
           * Returns the approximate size of a String object. This computation relies on
           * {@link String#length()} to compute the number of bytes held by the char[].
           * However, if the String object passed to this method is the result of e.g.
           * {@link String#substring}, the computation may be entirely inaccurate
           * (depending on the difference between length() and the actual char[]
           * length).
           */
        

        If there are no objections, I'd like to commit this.

        Dawid Weiss added a comment -

        I would opt for sizeOf to return the actual size of the object, including underlying string buffers... We can take into account interning buffers but other than that I wouldn't skew the result because it can be misleading.

        Dawid Weiss added a comment -

        I don't like this special handling of Strings, to be honest. Why do we need/do it?

        Shai Erera added a comment -

        I don't like this special handling of Strings, to be honest. Why do we need/do it?

        Because I wrote it, and it seemed useful to me, so why not? We know what Strings look like, at least in their worst case. If there is a better implementation, we can fix it in RUE, rather than having many impls try to do it on their own?

        Michael McCandless added a comment -

        I don't like this special handling of Strings, to be honest.

        I'm confused: what special handling of Strings are we talking about...?

        You mean that sizeOf(String) doesn't return the correct answer if the string came from a previous .substring (.split too) call...?

        If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

        Dawid Weiss added a comment -
        +  /** Returns the size in bytes of the String[] object. */
        +  public static int sizeOf(String[] arr) {
        +    int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
        +    for (String s : arr) {
        +      size += sizeOf(s);
        +    }
        +    return size;
        +  }
        +
        +  /** Returns the approximate size of a String object. */
        +  public static int sizeOf(String str) {
        +    // String's char[] size
        +    int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * str.length());
        +
        +    // String's raw object size
        +    int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
        +        + 3 * NUM_BYTES_INT /* String holds 3 integers */
        +        + NUM_BYTES_OBJECT_HEADER /* String object header */);
        +    
        +    return objectSize + arraySize;
        +  }
        

        What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings. If they point to a single char[], this should simply count the object overhead once, not count every character N times as it does now. This isn't sizeOf(); it's sum(string lengths * 2) + epsilon to me.

        I'd keep RamUsageEstimator exactly what the name says – an estimation of the actual memory taken by a given object. A string can point to a char[] and if so this should be traversed as an object and counted once.

        Dawid Weiss added a comment -

        If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

        Same as with other objects – traverse its fields and count them (once, building an identity set for all objects reachable from the root)?
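        The traversal Dawid describes might be sketched roughly as below. This is a hypothetical, simplified illustration (not the actual RamUsageEstimator code): the header/reference constants are rough 64-bit guesses, and on Java 9+ reading fields of JDK-internal classes such as String additionally requires --add-opens, so the sketch simply skips fields it cannot read.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;

// Sketch of a reflection-based deep size estimate: every reachable object
// is counted exactly once, using an identity set of visited objects.
public class DeepSizeSketch {
  // Rough 64-bit constants; real detection would query the JVM.
  static final int OBJECT_HEADER = 8, ARRAY_HEADER = 16, OBJECT_REF = 8;

  public static long deepSizeOf(Object root) {
    return measure(root, new IdentityHashMap<Object, Object>());
  }

  private static long measure(Object obj, IdentityHashMap<Object, Object> seen) {
    if (obj == null || seen.containsKey(obj)) return 0; // count each object once
    seen.put(obj, obj);
    Class<?> clazz = obj.getClass();
    if (clazz.isArray()) {
      long size = ARRAY_HEADER;
      int len = Array.getLength(obj);
      if (clazz.getComponentType().isPrimitive()) {
        size += (long) len * primitiveSize(clazz.getComponentType());
      } else {
        size += (long) len * OBJECT_REF;
        for (int i = 0; i < len; i++) size += measure(Array.get(obj, i), seen);
      }
      return align(size);
    }
    long size = OBJECT_HEADER;
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers())) continue;
        if (f.getType().isPrimitive()) {
          size += primitiveSize(f.getType());
        } else {
          size += OBJECT_REF;
          try {
            f.setAccessible(true); // may be denied for JDK classes on Java 9+
            size += measure(f.get(obj), seen);
          } catch (ReflectiveOperationException | RuntimeException e) {
            // field not readable: skip the referenced object
          }
        }
      }
    }
    return align(size);
  }

  private static int primitiveSize(Class<?> t) {
    if (t == long.class || t == double.class) return 8;
    if (t == int.class || t == float.class) return 4;
    if (t == short.class || t == char.class) return 2;
    return 1; // byte, boolean
  }

  private static long align(long size) { return (size + 7) & ~7L; }
}
```

        Because of the identity set, two references to the same char[] contribute its bytes only once, which is exactly the substring-sharing case discussed above.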

        Shai Erera added a comment -

        What I mean is that without looking at the code I would expect sizeOf(String[] N) to return the actual memory taken by an array of strings.

        So you mean you'd want sizeOf(String[]) to be just that?

        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
        

        I don't mind. I just thought that since we know how to compute sizeOf(String), we can use that. It's an extreme case, I think, that someone will want to compute the size of a String[] whose elements share the same char[] instance ... but if it bothers you that much, I don't mind simplifying it and documenting that it computes the raw size of the String[].

        But I don't think that we should change sizeOf(String) to not count the char[] size. It's part of the object, and really it's String, not like we're trying to compute the size of a general object.

        Same as with other objects – traverse its fields and count them

        RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it may over-estimate if the String is actually a sub-string of one original larger string. In the worst case, we'll just be over-estimating.

        Uwe Schindler added a comment -

        Hi Shai,

        can you try this patch with J9 or maybe JRockit (Robert)? If you use one of those JVMs you may have to explicitly enable compressed Oops/refs!

        Dawid Weiss added a comment -

        RUE already has .estimateRamUsage(Object) which does that through reflection. I think that sizeOf(String) can remain fast as it is now, with the comment that it my over-estimate if the String is actually a sub-string of one original larger string. In the worse case, we'll just be over-estimating.

        Yeah, that's exactly what I didn't like. All the primitive/ primitive array methods are fine, but why make things inconsistent with sizeOf(String)? I'd rather have the reflection-based method estimate the size of a String/String[]. Like we mentioned it's always a matter of speed/accuracy but here I'd opt for accuracy because the output can be off by a lot if you make substrings along the way (not to mention it assumes details about String internal implementation which may or may not be true, depending on the vendor).

        Do you have a need for this method, Shai? If you don't then why not wait (with this part) until such a need arises?

        Shai Erera added a comment -

        Do you have a need for this method, Shai?

        I actually started this issue because of this method. I wrote the method for my own code, then spotted the bug in ARRAY_HEADER, and along the way thought it would be good if RUE offered it so other people can benefit from it. Because from my experience, after I put code in Lucene, very smart people improve and optimize it, and I benefit from it in new releases.

        So while I could keep sizeOf(String) in my own code, I know that Uwe/Robert/Mike/You will make it more efficient when Java 7/8/9 come out, while I'll totally forget about it!

        Dawid Weiss added a comment -

        Yeah... well... I'm flattered, but I'm still -1 on adding this particular method, because I don't like being surprised at how a method works, and this is surprising behavior to me, especially in this class (even if it's documented in the javadoc, but who reads it anyway, right?).

        If others don't share my opinion then can we at least rename this method to sizeOfBlah(..) where Blah is something that would indicate it's not actually taking into account char buffer sharing or sub-slicing (suggestions for Blah welcome)?

        Mark Miller added a comment -

        estimateSizeOf(..)
        guessSizeOf(..)
        wildGuessSizeOf(..)
        incorrectSizeOf(..)
        sizeOfWeiss(..)
        weissSize(..)
        sizeOfButWithoutTakingIntoAccountCharBufferSharingOrSubSlicingSeeJavaDoc(..)

        Michael McCandless added a comment -

        If so, how can we actually fix that? Is there some way to ask a string for the true length of its char[]?

        Same as with other objects – traverse its fields and count them (once, building an identity set for all objects reachable from the root)?

        Aha, cool! I hadn't realized RUE can crawl into the private char[] inside string and count up the RAM usage correctly. That's nice.

        Maybe lowerBoundSizeOf(...)?

        Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead...? Hmm or maybe we do add the methods, but implement them under-the-hood w/ that?

        Dawid Weiss added a comment -

        sizeOfWeiss(..)

        We're talking some serious dimensions here, beware of buffer overflows!

        Or maybe we don't add the new string methods (sizeOf(String), sizeOf(String[])) and somewhere document that you should do new RUE().size(String/String[]) instead..

        This is something I would go for – it's consistent with what I would consider this class's logic. I would even change it to sizeOf(Object) – this would be a static shortcut to just measure an object's size, no strings attached?

        Kabutz's code also distinguishes interned strings/ cached boxed integers and enums. This could be a switch much like it is now with interned Strings. Then this would really be either an upper (why lower, Mike?) bound or something that would try to be close to the exact memory consumption.

        A fun way to determine if we're right would be to run a benchmark with -Xmx20mb and test how close we can get to the main memory pool's maximum value before OOM is thrown.

        Michael McCandless added a comment -

        (why lower, Mike?)

        Oh I just meant the sizeOf(String) impl in the current patch is a lower bound (since it "guesses" the private char[] length by calling String.length(), which is a lower bound on the actual char[] length).

        Dawid Weiss added a comment -

        John Rose just replied to my question – there are fields in Unsafe that allow array scaling (1.7). Check these out:

                ARRAY_BOOLEAN_INDEX_SCALE = theUnsafe.arrayIndexScale([Z);
                ARRAY_BYTE_INDEX_SCALE = theUnsafe.arrayIndexScale([B);
                ARRAY_SHORT_INDEX_SCALE = theUnsafe.arrayIndexScale([S);
                ARRAY_CHAR_INDEX_SCALE = theUnsafe.arrayIndexScale([C);
                ARRAY_INT_INDEX_SCALE = theUnsafe.arrayIndexScale([I);
                ARRAY_LONG_INDEX_SCALE = theUnsafe.arrayIndexScale([J);
                ARRAY_FLOAT_INDEX_SCALE = theUnsafe.arrayIndexScale([F);
                ARRAY_DOUBLE_INDEX_SCALE = theUnsafe.arrayIndexScale([D);
                ARRAY_OBJECT_INDEX_SCALE = theUnsafe.arrayIndexScale([Ljava/lang/Object;);
                ADDRESS_SIZE = theUnsafe.addressSize();
        

        So... there is a (theoretical?) possibility that, say, byte[] is machine word-aligned. I bet any RAM estimator written so far will be screwed if this happens.
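        The fields John Rose mentions can be read reflectively on JVMs where sun.misc.Unsafe and its theUnsafe field are accessible (HotSpot and most mainstream JVMs). A minimal sketch, under that assumption:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch: query per-element array scaling and the native address size
// via sun.misc.Unsafe, obtained reflectively from its theUnsafe field.
public class UnsafeScales {
  private static final Unsafe UNSAFE;
  static {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new Error("sun.misc.Unsafe not accessible on this JVM", e);
    }
  }

  // Bytes occupied by one element of the given array type on this JVM.
  public static int scaleOf(Class<?> arrayClass) {
    return UNSAFE.arrayIndexScale(arrayClass);
  }

  // Native pointer size: 4 on 32-bit JVMs, 8 on 64-bit JVMs.
  public static int addressSize() {
    return UNSAFE.addressSize();
  }

  public static void main(String[] args) {
    System.out.println("char[] element scale: " + scaleOf(char[].class));
    System.out.println("Object[] element scale: " + scaleOf(Object[].class));
    System.out.println("address size: " + addressSize());
  }
}
```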

        Uwe Schindler added a comment -

        So the whole Oops MBean magic is obsolete... ADDRESS_SIZE = theUnsafe.addressSize(); woooah, so simple - works on more platforms for guessing!

        I will check this out with the usual reflection magic

        Uwe Schindler added a comment -

        Hi,
        here's a new patch using Unsafe to get the bitness (with the well-known fallback) and for compressedOops detection. Looks much cleaner.
        I also like that the addressSize is now detected natively rather than from sysprops.

        The constants mentioned by Dawid are only available in Java 7, so I reflected the underlying methods from theUnsafe. I also changed the boolean JRE_USES_COMPRESSED_OOPS to an integer JRE_REFERENCE_SIZE that is used by RamUsageEstimator. We might do the same for all other native types... (this is just a start).

        Shai: Can you test with your JVMs and also enable/disable compressed oops/refs?

        Shai Erera added a comment -

        Thanks Uwe !

        I ran the test, and now with both J9 (IBM) and Oracle, I get this print (without enabling any flag):

            [junit] NOTE: running test testReferenceSize
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        • I modified the test name to testReferenceSize (was testCompressedOops).

        I wrote this small test to print the differences between sizeOf(String) and estimateRamUsage(String):

          public void testSizeOfString() throws Exception {
            String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
            String sub = s.substring(0, 4);
            System.out.println("original=" + RamUsageEstimator.sizeOf(s));
            System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
            System.out.println("checkInterned=true(orig): " + new RamUsageEstimator().estimateRamUsage(s));
            System.out.println("checkInterned=false(orig): " + new RamUsageEstimator(false).estimateRamUsage(s));
            System.out.println("checkInterned=false(sub): " + new RamUsageEstimator(false).estimateRamUsage(sub));
          }
        

        It prints:

        original=104
        sub=56
        checkInterned=true(orig): 0
        checkInterned=false(orig): 98
        checkInterned=false(sub): 98
        

        So clearly estimateRamUsage factors in the sub-string's larger char[]. The difference in sizes of 'orig' stems from AverageGuessMemoryModel, which computes the reference size to be 4 (hardcoded) and the array size to be 16 (hardcoded). I modified AverageGuess to use constants from RUE (they are best guesses themselves). Still the test prints a difference, but now I think it's because sizeOf(String) aligns the size to mod 8, while estimateRamUsage doesn't. I fixed that in size(Object), and now the prints are the same.

        • I also fixed sizeOfArray – if the array.length == 0, it returned 0, but it should return its header, and aligned to mod 8 as well.
        • I modified sizeOf(String[]) to sizeOf(Object[]) and compute its raw size only. I started to add sizeOf(String), fastSizeOf(String) and deepSizeOf(String[]), but reverted to avoid the hassle – the documentation confuses even me .
        • Changed all sizeOf() to return long, and align() to take and return long.

        I think this is ready to commit, though I'd appreciate a second look on the MemoryModel and size(Obj) changes.

        Also, how about renaming MemoryModel methods to: arrayHeaderSize(), classHeaderSize(), objReferenceSize() to make them more clear and accurate? For instance, getArraySize does not return the size of an array, but its object header ...
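        The mod-8 alignment mentioned above amounts to simple bit arithmetic. A sketch, assuming HotSpot's common default of 8-byte object granularity (-XX:ObjectAlignmentInBytes=8); the class name is made up for illustration:

```java
// Sketch of 8-byte object alignment: round a raw size up to the next
// multiple of 8, the granularity at which the JVM allocates objects.
public class AlignSketch {
  public static long alignObjectSize(long size) {
    // Add 7 and clear the low three bits: 16 -> 16, 17..24 -> 24, etc.
    return (size + 7L) & ~7L;
  }

  public static void main(String[] args) {
    // An empty array still costs its (aligned) header, per the bullet above.
    System.out.println(alignObjectSize(16)); // 16
    System.out.println(alignObjectSize(20)); // 24
  }
}
```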

        Dawid Weiss added a comment - - edited

        -1 to mixing shallow and deep sizeofs – sizeOf(Object[] arr) is shallow and just feels wrong to me. All the other methods yield the deep total, why make an exception? If anything, make it explicit and then do it for any type of object –

        shallowSizeOf(Object t);
        sizeOf(Object t);
        

        I'm not complaining just because my sense of taste is feeling bad. I am actually using this class in my own projects and I would hate to look into the JavaDoc every time to make sure what a given method does (especially with multiple overloads). In other words, I would hate to see this:

        Object [] o1 = new Object [] {1, 2, 3};
        Object o2 = o1;
        if (sizeOf(o1) != sizeOf(o2)) throw new WtfException();
        
        Uwe Schindler added a comment -

        I ran the test, and now with both J9 (IBM) and Oracle, I get this print (without enabling any flag):

            [junit] NOTE: running test testReferenceSize
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        

        I hope with compressedOops explicitly enabled (or however they call them), you get a reference size of 4 in J9 and pre-1.6.0_23 Oracle?

        Shai Erera added a comment -

        Ok removed sizeOf(Object[]). One can compute it by using RUE.estimateRamSize to do a deep calculation.

        Geez Dawid, you took away all the reasons I originally opened the issue for.

        But at least AvgGuessMemoryModel and RUE.size() are more accurate now. And we have some useful utility methods.

        Shai Erera added a comment -

        I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:-UseCompressedOops" and get 8 and 4 (with CompressedOops).

        Mark Miller added a comment -

        Oh, bummer - looks like we lost the whole history of this class...such a bummer. I really wanted to take a look at how this class had evolved since I last looked at it. I've missed the conversations around the history loss - is that gone, gone, gone, or is there still some way to find it?

        Mark Miller added a comment -

        Scratch that - I was trying to look back from the apache git clone using git - assumed its history matched svn - but I get a clean full history using svn.

        Uwe Schindler added a comment -

        Die, GIT, die! (as usual)

        Uwe Schindler added a comment -

        I ran "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:+UseCompressedOops" and "ant test-core -Dtestcase=TestRam* -Dtests.verbose=true -Dargs=-XX:-UseCompressedOops" and get 8 and 4 (with CompressedOops).

        OK, thanks. So it seems to work at least with Oracle/Sun and IBM J9. I have no other updates to this detection code.

        Dawid Weiss added a comment -

        Geez Dawid, you took away all the reasons I originally opened the issue for

        This is by no means wasted time. I think the improvements are clear?

        Die, GIT, die!

        I disagree here – git is a great tool, even if the learning curve may be steep at first. git-svn is a whole different story (it's a great hack but just a hack).

        Uwe Schindler added a comment -

        I disagree here

        Calm down, was just my well-known standard answer

        Dawid Weiss added a comment -

        Oh, I am calm, I just know people do hate git (and I used to as well, until I started using it frequently). Robert has a strong opinion about git, for example.

        Besides, there's nothing wrong in having a strong opinion – it's great people can choose what they like and still collaborate via patches (and this seems to be the common ground between all vcs's).

        Shai Erera added a comment -

        This is by no means wasted time. I think the improvements are clear?

        Yes, yes. It was a joke.

        Ok so can I proceed with the commit, or does someone intend to review the patch later?

        Uwe Schindler added a comment -

        With Unsafe we can also get all that information we have hardcoded, like the size of the array header. Should we not try to get it the same way I did for bitness and reference size, using Unsafe.theUnsafe.arrayBaseOffset(), and fall back to our hardcoded defaults?
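Uwe's suggestion could be sketched roughly like this. This is a hypothetical helper (the class and method names are invented here, not the actual patch): it asks sun.misc.Unsafe for the base offset of element 0 of a byte[] via reflection, which equals the array header size, and falls back to a hardcoded default when Unsafe is unavailable.

```java
// Hypothetical sketch, not the actual Lucene patch: probe the array header
// size via sun.misc.Unsafe (reflectively, so it degrades gracefully) and
// fall back to a hardcoded default if anything goes wrong.
import java.lang.reflect.Field;

public final class ArrayHeaderProbe {
    public static int arrayHeaderSize(int fallback) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            // arrayBaseOffset(Class) returns the offset of element 0,
            // which equals the array header size.
            return (Integer) unsafeClass
                .getMethod("arrayBaseOffset", Class.class)
                .invoke(unsafe, byte[].class);
        } catch (Exception e) {
            return fallback; // Unsafe unavailable on this VM
        }
    }
}
```

The reported value depends on the JVM (8, 12, 16 or 24 in the configurations discussed in this thread).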

        Dawid Weiss added a comment -

        using Unsafe.theUnsafe.arrayBaseOffset()? And fallback to our hardcoded defaults?

        +1.

        I will also try on OpenJDK with various jits but I'll do it in the evening.

        Yes, yes. It was a joke.

        Joke or no joke the truth is I did complain a lot.

        Dawid Weiss added a comment -

        I just peeked at OpenJDK sources and addressSize() is defined as this:

        // See comment at file start about UNSAFE_LEAF
        //UNSAFE_LEAF(jint, Unsafe_AddressSize())
        UNSAFE_ENTRY(jint, Unsafe_AddressSize(JNIEnv *env, jobject unsafe))
          UnsafeWrapper("Unsafe_AddressSize");
          return sizeof(void*);
        UNSAFE_END
        

        In this light this switch:

        switch (addressSize) {
          case 4:
            is64Bit = Boolean.FALSE;
            break;
          case 8:
            is64Bit = Boolean.TRUE;
            break;
        }
        

        Becomes interesting. Do you know of any architecture with pointers different than 4 or 8 bytes?

        Dawid Weiss added a comment -

        A few more exotic JITs from OpenJDK (all seem to be using an explicit 8-byte reference size on 64-bit):

        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-jamvm"
            [junit] JVM: OpenJDK Runtime Environment, JamVM, Robert Lougher, 1.6.0-devel, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_23, Sun Microsystems Inc., null,
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-jamvm -XX:+UseCompressedOops"
            [junit] JVM: OpenJDK Runtime Environment, JamVM, Robert Lougher, 1.6.0-devel, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_23, Sun Microsystems Inc., null,
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-cacao"
            [junit] JVM: OpenJDK Runtime Environment, CACAO, CACAOVM - Verein zur Foerderung der freien virtuellen Maschine CACAO, 1.1.0pre2, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_23, Sun Microsystems Inc., null,
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-server"
            [junit] JVM: OpenJDK Runtime Environment, OpenJDK 64-Bit Server VM, Sun Microsystems Inc., 20.0-b11, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_23, Sun Microsystems Inc., null,
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 4
        
        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-server -XX:-UseCompressedOops"
            [junit] JVM: OpenJDK Runtime Environment, OpenJDK 64-Bit Server VM, Sun Microsystems Inc., 20.0-b11, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_23, Sun Microsystems Inc., null,
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        Dawid Weiss added a comment -

        Mac:

        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true
            [junit] JVM: Java(TM) SE Runtime Environment, Java HotSpot(TM) 64-Bit Server VM, Apple Inc., 20.4-b02-402, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_29, Apple Inc., null, 
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 4
        
        > ant test-core -Dtestcase=TestRam* -Dtests.verbose=true "-Dargs=-server -XX:-UseCompressedOops"
            [junit] JVM: Java(TM) SE Runtime Environment, Java HotSpot(TM) 64-Bit Server VM, Apple Inc., 20.4-b02-402, Java Virtual Machine Specification, Sun Microsystems Inc., 1.6.0_29, Apple Inc., null, 
            [junit] NOTE: This JVM is 64bit: true
            [junit] NOTE: Reference size in this JVM: 8
        
        Mark Miller added a comment -

        Nooo!!! My eyes!!!! I'm pretty sure my liver has just been virally licensed!

        Dawid Weiss added a comment -

        Ok, right, sorry, let me scramble for intellectual property protection reasons:

        // See cemnmot at flie sratt abuot U_ANEESAFLF 
        /
        / ULAAFEN_SEF (jnit, UfdAsnerS_zsiaedse ())
        UEATERSNFN_Y (jint, UnidsdserSAasfe_ze (JNnEIv * env, jcbjeot unfsae))
        UesWrpfapaner (" UdenfsSseAazs_drie "); 
        rreutn seiozf (void *
        ;)
        UNEF_SNEAD
        
        Uwe Schindler added a comment -

        Becomes interesting. Do you know of any architecture with pointers different than 4 or 8 bytes?

        When I was writing that code, I thought for a very long time: hm, should I add a "default" case saying:

        default:
          throw new Error("Lucene does not like architectures with pointer size " + addressSize)
        

        But then I decided: if there is an architecture with a pointer size of 6, does this really break Lucene? Hm, maybe I should have added a comment there:

        default:
          // this is the philosophical case of Lucene reaching an architecture returning something different here
        
        Uwe Schindler added a comment -

        Maybe this for @UweSays:

        default:
          throw new Error("Your processor(*) hit me with his " + addressSize + " inch dick");
          // (*)Dawid
        
        Dawid Weiss added a comment -

        I would throw an exception just so that we can hear about those architectures nobody has ever heard of

        Dawid Weiss added a comment -

        fyi. http://en.wikipedia.org/wiki/48-bit

        Uwe Schindler added a comment -

        http://en.wikipedia.org/wiki/Quadruple_precision_floating-point_format
        Dawid Weiss added a comment -

        Yep, but I'm talking about address registers and addressing in general. 48-bit address alignment would be inconvenient if you take into account that any index-scaling addressing mode would have to do a shift and an addition (*3) instead of just a shift. Interesting stuff.

        Uwe Schindler added a comment -

        I agree, it was just a joke. The comment before was more about suddenly appearing 128-bit architectures. Those would have an addressSize of 16, still a power of 2.

        I will now look into the unsafe array offsets...

        Dawid Weiss added a comment - - edited

        Nice. All of a sudden you could enumerate all the atoms in the universe. I love Wolfram Alpha...
        http://www.wolframalpha.com/input/?i=is+number+of+atoms+in+the+universe+greater+than+2%5E128%3F

        Uwe Schindler added a comment -

        I played around: Unsafe.arrayBaseOffset always returns 16 on my 64-bit JVM, so it seems that NUM_BYTES_ARRAY_HEADER is wrong in our case (we have it as 12). It seems that the JVM aligns the array data to a multiple of 8 bytes on 64-bit machines?

        For normal objects, is there a way with unsafe to get the NUM_BYTES_OBJECT_HEADER?

        Dawid Weiss added a comment -

        For normal objects, is there a way with unsafe to get the NUM_BYTES_OBJECT_HEADER?

        I don't know, and I don't know if it varies between vendors. As for alignment – I bet this holds for anything, not only arrays. So fields of an object will be reordered and packed on their own boundaries, but entire objects will be aligned on machine-word boundaries for efficiency. Did you try running with Instrumentation (an agent)? What does it say about object/array sizes?

        Uwe Schindler added a comment -

        Interestingly, the array header seems to be much bigger on 64-bit platforms without compact refs, so I have the feeling that there is still some space needed for an object ref, so the original definition of the size was more correct? https://gist.github.com/2038305

        Using the original definition:

        public final static int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF;
        

        This looks much more like the above size, aligned to 8 bytes.

        Did you try running with Instrumentation (an agent)? What does it say about object/ array sizes?

        Have to try out and set this up first.

        Uwe Schindler added a comment -

        This is also in line with this instrumentation page:
        http://www.javaspecialists.eu/archive/Issue142.html

        Which prints:

        measureSize(new byte[1000]);
        byte[], shallow=1016, deep=1016
        measureSize(new boolean[1000]);
        boolean[], shallow=1016, deep=1016
        
        Dawid Weiss added a comment -

        Array header is still 12 bytes but it is aligned to the next multiple-8 boundary? Looks like it.

        Uwe Schindler added a comment -

        But how does that explain that with non-compact refs the arrayBaseOffset is 24?

        Dawid Weiss added a comment -

        Can you check what size does Object[] report vs. for example Integer[]? I think the difference may be because typed arrays need to know the type of their component.

        Dawid Weiss added a comment -

        We peeked at the forbidden a bit again. The difference 12 vs. 16 bytes is a result of how ordinary object pointers (OOPs) are defined – they are a combination of object header information (oopMark) and class pointer. The class pointer is a compile time union of either a regular pointer or a compact pointer. oopMark is either 4 bytes (32 bit jvms) or 8 bytes (64 bit jvms). So:

        64 bit jvm, full oops: 8 + 8 = 16
        64 bit jvm, compact oops: 8 + 4 = 12
        32 bit jvm: 4 + 4 = 8
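The breakdown above can be captured in a tiny model (illustrative code, not anything from Lucene): the header is a mark word whose width follows the machine word size, plus a class pointer that shrinks to 4 bytes with compressed oops.

```java
// Tiny model of the OOP header composition described above (names invented).
public final class OopHeaderModel {
    public static int objectHeaderSize(boolean is64bit, boolean compressedOops) {
        int markWord = is64bit ? 8 : 4;                     // oopMark
        int classPointer = (is64bit && !compressedOops) ? 8 : 4;
        return markWord + classPointer;
    }
}
```

This reproduces exactly the three cases listed: 16, 12 and 8 bytes.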

        Uwe Schindler added a comment -

        With the help of Dawid (inspecting forbidden C code *g*), we checked the actual sizes and how they are calculated. Based on that, I changed the defaults depending on bitness (the object header is 16 on 64-bit without compact refs, the array header is 24 on 64-bit).

        The attached patch will use the above defaults, but tries to update them using sun.misc.Unsafe. The trick to get the object header from Unsafe is to declare a dummy class extending Object with one single field. We then use Unsafe to get the fieldOffset of that field. As Dawid pointed out, the return value is identical to his investigations (8 bytes on 32-bit archs, 16 bytes on 64-bit archs, and 12 bytes on compact-ref 64-bit archs). So RamUsageEstimator was completely wrong in the past for 64-bit architectures.

        I also changed the funny switch statement ("the " + addressSize + " inch dick") to assume a 64-bit architecture if addressSize >= 8.
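The dummy-class trick described above might look roughly like this (a sketch with invented names, not the actual patch): the offset of the only field of a one-field class equals the object header size, since instance fields are laid out right after the header.

```java
// Sketch of the dummy-class trick (hypothetical names, not the real patch):
// Unsafe.objectFieldOffset of the single field of a one-field class equals
// the object header size.
import java.lang.reflect.Field;

public final class ObjectHeaderProbe {
    @SuppressWarnings("unused")
    private static final class DummyOneFieldObject {
        byte base; // its offset == size of the object header
    }

    public static int objectHeaderSize(int fallback) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Field base = DummyOneFieldObject.class.getDeclaredField("base");
            long offset = (Long) unsafeClass
                .getMethod("objectFieldOffset", Field.class)
                .invoke(unsafe, base);
            return (int) offset;
        } catch (Exception e) {
            return fallback; // Unsafe unavailable on this VM
        }
    }
}
```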

        Uwe Schindler added a comment -

        I would like to remove the AverageBlabla memory model. The code inside it is simply no longer useful. RamUsageEstimator simply uses the sizes returned by the JVM.

        Uwe Schindler added a comment -

        Updated patch with the abstract and now useless MemoryModel removed.

        Uwe Schindler added a comment -

        Robert reminded me that there is also a heavily broken custom memory estimator in MemoryIndex. I will look into that, too.

        Uwe Schindler added a comment -

        Attached is a patch fixing several bugs and more:

        • Removed the MemoryIndex VM class and the completely outdated and incorrect estimation there.
        • Used Shai's newly added methods also in Lucene's PackedInt classes.
        • Fixed overflows in Shai's new methods, as they could overflow for arrays greater than 2 GB (casts to long were missing).
        • Fixed the rounding up to multiples of 8 to work with longs.

        What's the reason this rounding up to 8 bytes was added? I assume this information comes from somewhere, but it was added by Shai without any explanation. Is whether it's 8 or 4 not also dependent on 64-bitness?

        Otherwise the patch is ready.
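The overflow and rounding fixes mentioned above can be illustrated with a minimal sketch (constant and method names here are assumptions, not necessarily what the patch uses): doing the arithmetic in long avoids the int overflow for arrays over 2 GB, and the round-up uses a long mask.

```java
// Minimal sketch of overflow-safe size math (invented names).
public final class SizeMath {
    public static final int NUM_BYTES_OBJECT_ALIGNMENT = 8;

    /** Rounds a shallow size up to the next multiple of 8, in long arithmetic. */
    public static long alignObjectSize(long size) {
        long mask = NUM_BYTES_OBJECT_ALIGNMENT - 1;
        return (size + mask) & ~mask;
    }

    /** Shallow size of a char[]; 2L forces long math so huge arrays don't overflow. */
    public static long sizeOfCharArray(long numChars, int arrayHeader) {
        return alignObjectSize(arrayHeader + 2L * numChars);
    }
}
```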

        Robert Muir added a comment -

        Removed the MemoryIndex VM class and the completely outdated and incorrect estimation there.

        thank you!!!

        Dawid Weiss added a comment -

        Awesome job, Uwe. I think I wasn't right about that alignment of arrays – sizeof(int) shouldn't come up to 8. I will look into this again in the evening; it got me interested. I'll also check the alignments, so if this patch can wait until tomorrow we'll be more confident we get the estimates right.

        Dawid Weiss added a comment -

        This is very interesting indeed.

        So, I used the agent hook into a running VM to dump some of the internal diagnostics, including OOP sizes, heap word alignments, etc. Here's a sample of the results (with the sizes reported on the Java side shown on the right):

        # 1.7, 64 bit, OOPS compressed            (client)
        getOopSize: 8                             ref size = 4         
        Address size: 8                           array header = 16    
        Bytes per long: 8                         object header = 12   
        CPU: amd64
        HeapOopSize: 4
        HeapWordSize: 8
        IntSize: 4
        getMinObjAlignmentInBytes: 8
        getObjectAlignmentInBytes: 8
        isCompressedOopsEnabled: true
        isLP64: true
        
        
        # 1.7, 64 bit, full
        getOopSize: 8                             ref size = 8     
        Address size: 8                           array header = 24
        Bytes per long: 8                         object header = 16
        CPU: amd64
        HeapOopSize: 8
        HeapWordSize: 8
        IntSize: 4
        getMinObjAlignmentInBytes: 8
        getObjectAlignmentInBytes: 8
        isCompressedOopsEnabled: false
        isLP64: true
        
        # 1.7, 32 bit  
        getOopSize: 4                             ref size = 4     
        Address size: 4                           array header = 12
        Bytes per long: 8                         object header = 8
        CPU: x86
        HeapOopSize: 4
        HeapWordSize: 4
        IntSize: 4
        getMinObjAlignmentInBytes: 8
        getObjectAlignmentInBytes: 8
        isCompressedOopsEnabled: false
        isLP64: false
        

        The question Uwe and I asked ourselves is why an empty array takes 24 bytes without OOP compression (that's the object overhead plus an int length, so it should be 16 + 4 = 20). The answer seems to be in how base offsets are calculated for arrays – they seem to be enforced on a HeapWordSize boundary, and this is 8 even with compressed OOPs:

          // Returns the offset of the first element.
          static int base_offset_in_bytes(BasicType type) {
            return header_size(type) * HeapWordSize;
          }
        

        I'll spare you the detailed code, but the rounding to the next HeapWordSize multiple seems evident in all cases. What's even more interesting, this "wasted" space is not (and cannot be) used for data, so even a single integer pushes the array size to the next available bound:

        int[0] = 24
        int[1] = 32   (*)
        int[2] = 32
        int[3] = 40
        

        Finally, I could not resist mentioning that object alignments... are adjustable, at least to 2^n boundaries. So you can also do this:

        > java  -XX:-UseCompressedOops -XX:ObjectAlignmentInBytes=32 ...
        Object = 32
        int[0] = 32
        int[1] = 32
        int[2] = 32
        int[3] = 64
        

        Nice, huh? I don't think the JVM has been tested heavily for this possibility though because the code hung on me a few times if executed in that mode.
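The int[] sizes quoted above fit a simple model (a sketch, using the header values quoted in this thread rather than querying the VM): shallow array size is the array base offset (header) plus the element data, rounded up to the 8-byte object alignment.

```java
// Model of the observed array sizes: header + data, aligned to 8 bytes.
// Header values (24 for full oops, 16 for compressed oops) come from this
// thread, not from the VM.
public final class ArraySizeModel {
    public static long arraySize(int arrayHeader, int bytesPerElement, long length) {
        long size = arrayHeader + bytesPerElement * length;
        return (size + 7) & ~7L; // round up to the 8-byte boundary
    }
}
```

With a 24-byte header this reproduces the int[0..3] sizes 24, 32, 32, 40 listed above, and with a 16-byte header the byte[1000] = 1016 figure from the instrumentation page.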

        Dawid Weiss added a comment -

        He, he, he... this is fun, haven't been playing with Unsafe for a while and forgot how enjoyable this can be.

            Unsafe us = ...;
            byte[] dummy  = {0x11, 0x22, 0x33, 0x44};
            int[]  dummy2 = {0}; // match length above.
            // Change the class of dummy to int[] by copying the class word
            // (offset 8 here assumes a 64-bit JVM with compressed oops).
            int klazz = us.getInt(dummy2, 8);
                        us.putInt(dummy, 8, klazz);
            // this will be ok.
            dummy2 = (int[]) (Object) dummy;
            // and we can run native int accessors on a byte[] now...
            System.out.println("> " + Integer.toHexString(dummy2[0]));
        
        Dawid Weiss added a comment -

        I think Yonik once mentioned he wanted a fast hash over byte[] – this could be it (temporarily cast to a long[] and then revert after computations are over). Go for it, Yonik

        Uwe Schindler added a comment -

        Thanks for the investigation. The 8-byte object size multiplier is fixed, so the round-up method is fine.

        I have been thinking about alignment things. Summing up the field sizes is a good way to get the object size, but it can be done even better.

        If Unsafe is available and usable, we can simply get the object size (including all headers) by finding Math.max(field offset + field type length). So the object size is the offset of the last field (the one with the biggest offset) plus its size. This value is finally rounded up to a multiple of 8.

        The attached patch does this.
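        The scheme can be illustrated without Unsafe: the shallow size is the largest (field offset + field width), rounded up to the alignment. The offsets and header sizes below are hypothetical stand-ins for what Unsafe.objectFieldOffset would return, so only the arithmetic is shown:

```java
public class ShallowSizeSketch {
    // Shallow object size = max over fields of (offset + width), rounded up to
    // the object alignment. In the real patch the offsets come from
    // sun.misc.Unsafe.objectFieldOffset(Field); here they are passed in
    // directly so the arithmetic stands on its own.
    static long shallowSize(long[] offsets, int[] widths, long headerSize, long alignment) {
        long size = headerSize; // an object with no fields is still header-sized
        for (int i = 0; i < offsets.length; i++) {
            size = Math.max(size, offsets[i] + widths[i]);
        }
        long rem = size % alignment;
        return rem == 0 ? size : size + alignment - rem;
    }

    public static void main(String[] args) {
        // Hypothetical compressed-oops layout: 12-byte header, an int at offset
        // 12 and a long at offset 16 -> max is 16 + 8 = 24, already 8-aligned.
        System.out.println(shallowSize(new long[] {12, 16}, new int[] {4, 8}, 12, 8)); // 24
        // Add a byte at offset 24 -> 25, rounded up to 32.
        System.out.println(shallowSize(new long[] {12, 16, 24}, new int[] {4, 8, 1}, 12, 8)); // 32
    }
}
```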

        Dawid Weiss added a comment -

        The 8 byte object size multiplier is fixed, so the round-up method is fine.

        I don't think it's "fixed" – see the -XX:ObjectAlignmentInBytes=32 above. But the defaults seem to be the same on all systems.

        Uwe Schindler added a comment -

        I don't think it's "fixed" – see the -XX:ObjectAlignmentInBytes=32 above. But the defaults seem to be the same on all systems.

        I would like the rounding to be dynamic as well, but this cannot be determined via Unsafe; at least I have no idea how.

        Dawid Weiss added a comment -

        at least for this I have no idea

        The management factory trick mentioned by Kris works for object alignment as well:

        package spikes;
        
        import java.io.IOException;
        import java.lang.management.ManagementFactory;
        import java.lang.reflect.Method;
        import java.util.List;
        
        import com.sun.management.HotSpotDiagnosticMXBean;
        import com.sun.management.VMOption;
        
        public class ObAlignment
        {
            private static final String HOTSPOT_BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";
            private static HotSpotDiagnosticMXBean hotspotMBean;
            
            private static HotSpotDiagnosticMXBean getHotSpotMBean() {
              if (hotspotMBean == null) {
                try {
                  hotspotMBean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    HOTSPOT_BEAN_NAME,
                    HotSpotDiagnosticMXBean.class);
                } catch (IOException e) {
                  e.printStackTrace();
                }
              }
              return hotspotMBean;
            }
        
            public static void main(String [] args)
                throws Exception
            {
                // Just the object alignment.
                System.out.println(getHotSpotMBean().getVMOption("ObjectAlignmentInBytes"));
        
                // Everything.
                Class<?> fc = Class.forName("sun.management.Flag");
                System.out.println(fc);
                Method m = fc.getDeclaredMethod("getAllFlags");
                m.setAccessible(true);
                List<Object> flags = (List<Object>) m.invoke(null);
                for (Object f : flags) {
                    Method dm = f.getClass().getDeclaredMethod("getVMOption");
                    dm.setAccessible(true);
                    VMOption option = (VMOption) dm.invoke(f);
                    System.out.println(option);
                }
            }
        }
        

        I don't think it is of much practical use for now (object alignment seems to be constant everywhere), but we could just as well probe it – if it's available, why not use it.

        I'd also like to add a shallow size method (which wouldn't follow the fields, just return the aligned object size). I'll be able to work on it in the evening though, not sooner.

        Uwe Schindler added a comment -

        Too funny,
        I had the same idea at breakfast and started to implement it while you were writing your comment.

        I will post patch soon (also with other improvements)!

        Uwe Schindler added a comment -

        I will add a shallow parameter to the estimate method; we just don't dig into the referenced objects, so it's a simple if check.

        Uwe Schindler added a comment -

        New patch:

        • retrieve object alignment (default 8, e.g. 32bit JVMs don't report it)
        • add shallow object size measurement
        • add some security checks to possibly handle the "cookie" warning in Unsafe.objectFieldOffset() (the offsets may be "scaled"). Current JVMs never do this, but the documentation explicitly states that the offsets may not be byte-aligned.
        Uwe Schindler added a comment -

        Minor improvements.

        Dawid Weiss added a comment -

        The patch looks good. I don't know if decorating IdentityHashMap to be a set adds any overhead... I was also thinking about doing a custom set implementation for this, so that we know how much memory we allocate during the checking itself, but it seems very specific to what I need, so no worries.

        One thing:

        (size % NUM_BYTES_OBJECT_ALIGNMENT);
        

        Byte alignment will be a power of 2 (the option to change it even enforces this when you start the JVM), so you can use a bitmask instead of modulo – should be slightly faster.
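        The two rounding forms can be compared directly; the bitmask variant is only valid when the alignment is a power of two (a sketch, not the committed code):

```java
public class AlignSketch {
    // Round up with modulo: works for any positive alignment.
    static long alignUpModulo(long size, long alignment) {
        long rem = size % alignment;
        return rem == 0 ? size : size + alignment - rem;
    }

    // Round up with a bitmask: valid only when alignment is a power of two,
    // because then (alignment - 1) is a mask of the low bits to clear.
    static long alignUpMask(long size, long alignment) {
        long mask = alignment - 1;
        return (size + mask) & ~mask;
    }

    public static void main(String[] args) {
        // Both forms agree for power-of-two alignments.
        for (long size = 0; size <= 64; size++) {
            for (long align : new long[] {8, 16, 32}) {
                if (alignUpModulo(size, align) != alignUpMask(size, align)) {
                    throw new AssertionError(size + " / " + align);
                }
            }
        }
        System.out.println("modulo and mask agree");
    }
}
```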

        Uwe Schindler added a comment -

        I separated the shallow object inspection into a static method, which is cheaper (no RamUsageEstimator instance is needed). The static method now only takes a Class<?> parameter and returns the size (an instance is not even needed).

        I also added a diagnostic boolean, so you can query RamUsageEstimator whether the JVM in use is supported (supports HotSpot diagnostics, sun.misc.Unsafe). If that is not the case, our testcase will print a warning so users can report back (if they run the tests).

        I think this is ready to commit.

        Uwe Schindler added a comment -

        One thing:

        (size % NUM_BYTES_OBJECT_ALIGNMENT);
        

        Byte alignment will be a power of 2 (that option to change it even enforces it when you start the JVM) so you can do a bitmask instead of modulo - should be slighly faster.

        I don't think that's really needed here. Speed is limited by reflection in most cases, and this one calculation should not matter. Also, the number is not reported back as a power of 2, so I would have to calc the log2 first (ok, ntz & co.), but I don't think we should actually limit this to powers of 2. Maybe another vendor has the ultimate answer of 42 for his objects?

        Uwe Schindler added a comment -

        I don't know if decorating IdentityHashMap to be a set adds any overhead

        The whole problem is more that the IdentityHashMap might take horrible amounts of memory while inspecting (think of a boxed-numbers array like Byte[50000]). Speed is not important; reflection is slow.

        I have no better idea for detecting duplicates, unfortunately. The old trick from Arrays.deepEquals() – stop when the parameter itself is seen again – is not enough here.
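        The duplicate detection being discussed can be sketched with an identity set: membership is decided by reference identity (==), not equals(), which is what a cycle-detection set for an object graph needs. A minimal sketch, not the committed code:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class IdentitySetSketch {
    // Decorate an IdentityHashMap as a Set, as the patch under discussion does.
    static Set<Object> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    }

    public static void main(String[] args) {
        Set<Object> seen = newIdentitySet();
        Object o = new Object();
        System.out.println(seen.add(o));  // true: first visit
        System.out.println(seen.add(o));  // false: same reference already seen
        // Two equal but distinct strings are both admitted, because identity,
        // not equals(), decides membership.
        System.out.println(seen.add(new String("x")));  // true
        System.out.println(seen.add(new String("x")));  // true
    }
}
```

        The memory concern above applies here too: the set grows by one entry per distinct object visited.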

        Uwe Schindler added a comment -

        One more improvement:
        The shallow class inspection can ignore superclasses if Unsafe is in use. As additional fields are always appended at the end (otherwise casting of classes and later field access would not work inside the JVM), we don't need to go to superclasses to find the maximum field offset.

        I want to commit and backport this to 3.x during the weekend.

        Dawid Weiss added a comment -

        Reflection doesn't need to cost you much if you build a cache along the way. Retrieving an object's class is virtually zero cost, so this would make it very efficient, and the number of classes in the system is much smaller than the number of objects, so it shouldn't be a problem.
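        A sketch of such a per-class field cache (hypothetical names, not the AttributeSource implementation): reflection runs once per class, and every later object of that class reuses the cached field list.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class FieldCacheSketch {
    // Cache the instance fields per class so reflection runs once per class,
    // not once per inspected object.
    private static final Map<Class<?>, Field[]> CACHE = new IdentityHashMap<>();

    static synchronized Field[] fieldsOf(Class<?> clazz) {
        Field[] cached = CACHE.get(clazz);
        if (cached == null) {
            List<Field> all = new ArrayList<>();
            // Walk up the hierarchy, collecting non-static fields.
            for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (!Modifier.isStatic(f.getModifiers())) {
                        all.add(f);
                    }
                }
            }
            cached = all.toArray(new Field[0]);
            CACHE.put(clazz, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        // The second lookup returns the cached array instance.
        System.out.println(fieldsOf(String.class) == fieldsOf(String.class)); // true
    }
}
```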

        to find the maximum field offset we don't need to go to superclasses.

        I can't imagine a situation where this wouldn't be the case, although an assertion here would be nice just to make sure we're not assuming something that isn't true.

        I will take a closer look at the patch again this evening and do some testing/ API flexibility based on what I have in my project. Will report on the results.

        Uwe Schindler added a comment -

        Reflection doesn't need to cost you much if you build a cache along the way. Retrieving an object's class is virtually zero cost, so this would make it very efficient, and the number of classes in the system is much smaller than the number of objects, so it shouldn't be a problem.

        That would be like the reflection cache in AttributeSource. But yes, I was also thinking about a second IdentityHashMap<Class<?>,Long> along the way.

        I can't imagine a situation where this wouldn't be the case although an assertion here would be nice just to make sure we're not assuming something that isn't true.

        That's already checked in the test, which has 2 subclasses: one with no additional fields (size must be identical) and one with 2 more fields (size should be >=).

        Shai Erera added a comment -

        Wow, what awesome improvements you guys have added!

        Uwe, +1 to commit. I unassigned myself - you and Dawid definitely deserve the credit!

        Dawid Weiss added a comment -

        Modified method naming convention: any sizeOf is "deep", shallowSizeOf* is "shallow". Methods in RUE are now static; didn't hide the constructor though (maybe we should?).

        More comments in a minute.

        Uwe Schindler added a comment -

        Thanks for cleanup!

        didn't hide the constructor though (maybe we should?).

        We must. The class is final and has no instance methods -> it's useless to have a ctor. Also, as previous versions in 3.x allowed instances, we should prevent this to fix incorrect usage.

        Dawid Weiss added a comment -

        I've played with the code a bit, trying to figure out empirically "how far off" the estimation is from real-life usage. It's not easy because RUE itself allocates memory (and not small quantities in the case of complex object graphs!). I left these experiments in StressRamUsageEstimator; it is a test case – maybe we should add @Ignore and rename it to Test*, I don't know.

        Anyway, the allocation seems to be measured pretty accurately. With TLABs disabled, this is the result of allocating small byte arrays, for example:

         committed           max        estimated(allocation)
              2 MB	   48.4 MB	  16 bytes
            1.7 MB	   48.4 MB	  262.4 KB
              2 MB	   48.4 MB	  524.6 KB
            2.2 MB	   48.4 MB	    787 KB
            2.5 MB	   48.4 MB	      1 MB
            2.7 MB	   48.4 MB	    1.3 MB
              3 MB	   48.4 MB	    1.5 MB
            3.3 MB	   48.4 MB	    1.8 MB
        ....
           46.9 MB	   48.4 MB	   45.6 MB
           47.1 MB	   48.4 MB	   45.9 MB
           47.4 MB	   48.4 MB	   46.1 MB
           47.6 MB	   48.4 MB	   46.4 MB
           47.9 MB	   48.4 MB	   46.6 MB
           48.1 MB	   48.4 MB	   46.9 MB
        

        So it's fairly ideal (committed memory is all committed memory so I assume additional data structures, classes, etc. also count in).

        Unfortunately it's not always so smooth; for example, JRockit's MX beans seem not to return the actual memory allocation state (and if they do, I don't understand it):

         committed           max        estimated(allocation)
           29.4 MB	     50 MB	  16 bytes
           29.8 MB	     50 MB	  262.5 KB
           30.2 MB	     50 MB	  524.9 KB
           30.4 MB	     50 MB	  787.3 KB
           30.8 MB	     50 MB	      1 MB
           31.1 MB	     50 MB	    1.3 MB
           31.4 MB	     50 MB	    1.5 MB
           31.7 MB	     50 MB	    1.8 MB
             32 MB	     50 MB	      2 MB
           32.4 MB	     50 MB	    2.3 MB
           32.7 MB	     50 MB	    2.6 MB
           33.1 MB	     50 MB	    2.8 MB
           33.5 MB	     50 MB	    3.1 MB
           33.8 MB	     50 MB	    3.3 MB
           34.2 MB	     50 MB	    3.6 MB
           34.5 MB	     50 MB	    3.8 MB
           34.8 MB	     50 MB	    4.1 MB
           35.2 MB	     50 MB	    4.4 MB
           35.5 MB	     50 MB	    4.6 MB
           35.7 MB	     50 MB	    4.9 MB
           36.2 MB	     50 MB	    5.1 MB
           36.4 MB	     50 MB	    5.4 MB
        ...
           49.6 MB	     50 MB	   47.6 MB
             50 MB	     50 MB	   47.9 MB
           49.6 MB	     50 MB	   48.2 MB
           49.9 MB	     50 MB	   48.4 MB
        

        A snapshot from 32 bit HotSpot:

        ...
           25.5 MB	   48.4 MB	   24.7 MB
           25.7 MB	   48.4 MB	   24.9 MB
           25.9 MB	   48.4 MB	   25.1 MB
           26.1 MB	   48.4 MB	   25.3 MB
           26.3 MB	   48.4 MB	   25.5 MB
           26.5 MB	   48.4 MB	   25.7 MB
           26.7 MB	   48.4 MB	   25.9 MB
           26.8 MB	   48.4 MB	   26.1 MB
             27 MB	   48.4 MB	   26.4 MB
           27.2 MB	   48.4 MB	   26.6 MB
           27.4 MB	   48.4 MB	   26.8 MB
           27.7 MB	   48.4 MB	     27 MB
        ...
        

        I see three problems that remain, but I don't think they're urgent enough to be addressed now:

        • the stack easily overflows if the graph of objects has long chains. This is demonstrated in the test case (uncomment ignore annotation).
        • there is a fair amount of memory allocation going on in the RUE itself. If one knows the graph of an object's dependencies is a tree then the memory cost could be decreased to zero (because we wouldn't need to remember which objects we've seen so far).
        • we could make RUE an object again (resign from static methods) and have a cache of classes and class-fields to avoid reflective accesses over and over. If one performed estimations over and over then such a RUE instance would have an initial cost, but then would be running smoother.

        Having said that, I'm +1 for committing this if you agree with the changes I've made (I will be a pain in the arse about that naming convention discriminating between shallow vs. deep sizeOf, though).
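        The stack-overflow problem on long object chains goes away once the recursion is replaced by an explicit work deque; a toy sketch over a hypothetical Node chain (counting reachable objects stands in for summing their sizes):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class IterativeWalkSketch {
    static final class Node {
        Node next;
    }

    // Count reachable nodes without recursion: a chain of a million nodes
    // would overflow the call stack, but not an explicit deque.
    static int countReachable(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!seen.add(n)) continue;           // already visited (handles cycles)
            if (n.next != null) pending.push(n.next); // ArrayDeque rejects nulls
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Node head = new Node();
        Node tail = head;
        for (int i = 1; i < 1_000_000; i++) {
            tail.next = new Node();
            tail = tail.next;
        }
        System.out.println(countReachable(head)); // 1000000
    }
}
```

        The identity set still costs memory per visited object, which is the second bullet; the tree-shaped special case mentioned there could drop the set entirely.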

        Dawid Weiss added a comment -

        Oh, one thing that springs to my mind is that we could have an automatically generated class with nested static classes with a random arrangement of all sorts of fields (in all sorts of configurations) and use a similar empirical benchmark to the one I did on small byte arrays but on these objects. This would show if we're estimating object field offsets and sizes correctly.

        I wouldn't go into deep object structures though – I've tried this and it's hard to tell what is allocation and what is measurement overhead/noise.

        Uwe Schindler added a comment -

        Hi,

        I am fine with the patch for now, changed in this patch:

        • Hidden ctor
        • Cleaned up the test to use static imports consistently

        The stress test is a test case, but it is not executed automatically (you have to request it explicitly with -Dtestcase=...). I think that's wanted, right? Otherwise we should rename it, but it's also noisy and slow.

        Uwe Schindler added a comment -

        Final patch: I removed some code duplication and improved exception handling for the reflection while iterating the class tree. Simply suppressing exceptions is a bad idea, as the resulting size would be underestimated.

        I will commit this later this evening and then backport to 3.x.

        Shai Erera added a comment -

        Thanks Uwe !

        Uwe Schindler added a comment -

        Javadocs fixes.

        Dawid Weiss added a comment -

        Looks good to me, thanks Uwe.

        Uwe Schindler added a comment -

        Committed trunk revision: 1302133

        I will now backport with deprecations and add CHANGES.txt later!

        Uwe Schindler added a comment -

        Patch for 3.x including a backwards layer (deprecated RUE instances + String interning support). MemoryModels were nuked completely (will add a comment to the backwards changes).

        Uwe Schindler added a comment -

        Committed 3.x revision: 1302152

        CHANGES.txt committed in revisions: 1302155 (3.x), 1302156 (trunk)

        Thanks Dawid and Shai!

        Dawid and I will now look into donating this masterpiece to maybe Apache Commons Lang or similar, as it's of general use.

        Dawid Weiss added a comment -

        I've been experimenting a bit with the new code. Field offsets for three classes in a hierarchy with unalignable fields (byte/long combinations at all levels). Note the unaligned reordering of the byte field in JRockit – nice.

        JVM: [JVM: HotSpot, Sun Microsystems Inc., 1.6.0_31] (compressed OOPs)
        @12  4 Super.superByte
        @16  8 Super.subLong
        @24  8 Sub.subLong
        @32  4 Sub.subByte
        @36  4 SubSub.subSubByte
        @40  8 SubSub.subSubLong
        @48    sizeOf(SubSub.class instance)
        
        JVM: [JVM: HotSpot, Sun Microsystems Inc., 1.6.0_31] (normal OOPs)
        @16  8 Super.subLong
        @24  8 Super.superByte
        @32  8 Sub.subLong
        @40  8 Sub.subByte
        @48  8 SubSub.subSubLong
        @56  8 SubSub.subSubByte
        @64    sizeOf(SubSub.class instance)
        
        
        JVM: [JVM: J9, IBM Corporation, 1.6.0]
        @24  8 Super.subLong
        @32  4 Super.superByte
        @36  4 Sub.subByte
        @40  8 Sub.subLong
        @48  8 SubSub.subSubLong
        @56  8 SubSub.subSubByte
        @64    sizeOf(SubSub.class instance)
        
        JVM: [JVM: JRockit, Oracle Corporation, 1.6.0_26] (64-bit JVM!)
        @ 8  8 Super.subLong
        @16  1 Super.superByte
        @17  7 Sub.subByte
        @24  8 Sub.subLong
        @32  8 SubSub.subSubLong
        @40  8 SubSub.subSubByte
        @48    sizeOf(SubSub.class instance)
        
        Uwe Schindler added a comment -

        Thanks for the insight.

        When thinking about the reordering, I am a little bit afraid of the "optimization" in the shallow sizeOf(Class<?>). This optimization does not recurse to superclasses, as it assumes that all field offsets are greater than those of the superclass, so finding the maximum does not need to recurse up (it exits early).

        This is generally true (also in the above printout), but not guaranteed. E.g. JRockit partly does this (it reuses space inside the superclass area to locate the byte from the subclass). In the above example the order of fields is still always Super-Sub-SubSub, but the ordering in the JRockit example could instead look like:

        @ 8  1 Super.superByte
        @ 9  7 Sub.subByte
        @16  8 Super.subLong
        @24  8 Sub.subLong
        @32  8 SubSub.subSubLong
        @40  8 SubSub.subSubByte
        @48    sizeOf(SubSub.class instance)
        

        The only thing the JVM cannot change is field offsets between subclasses (the field offsets of the superclass are inherited), but it can happen that new fields are located between the super's fields (see above - it's unused space). This would still allow casting and so on (it's unused space in the superclass). Unfortunately, with that reordering the maximum field offset in the subclass is no longer guaranteed to be the greatest.

        I would suggest that we remove the "optimization" in the shallow class size method. In my opinion it's too risky to underestimate the size when the maximum offset in the subclass is smaller than the maximum offset in the superclass.

        I hope my explanation was understandable...

        Dawid, what do you think, should we remove the "optimization"? The patch is easy.
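A minimal sketch of what removing that early exit amounts to (hypothetical names, not the actual patch): collect (offset, size) pairs for fields across the whole class hierarchy, take the maximum field end offset, and round up to the JVM object alignment. The 16-byte header and 8-byte alignment are assumptions for a typical 64-bit JVM.

```java
// Sketch with hypothetical names: shallow size from field layout data
// gathered across the WHOLE hierarchy, not just the most-derived class.
public class ShallowSizeSketch {
    static final long NUM_BYTES_OBJECT_ALIGNMENT = 8;  // assumption
    static final long NUM_BYTES_OBJECT_HEADER = 16;    // assumption (64-bit JVM)

    // Round up to the JVM object alignment (typically 8 bytes).
    public static long alignObjectSize(long size) {
        long a = NUM_BYTES_OBJECT_ALIGNMENT;
        return ((size + a - 1) / a) * a;
    }

    // offsets[i]/sizes[i] describe one field, from ALL classes in the hierarchy.
    public static long shallowSize(long[] offsets, long[] sizes) {
        long max = NUM_BYTES_OBJECT_HEADER; // no fields: just the header
        for (int i = 0; i < offsets.length; i++) {
            max = Math.max(max, offsets[i] + sizes[i]);
        }
        return alignObjectSize(max);
    }

    public static void main(String[] args) {
        // Field layout from the hypothetical JRockit reordering above;
        // actual byte sizes (1 for byte, 8 for long), not padded slot sizes.
        long[] offsets = {8, 9, 16, 24, 32, 40};
        long[] sizes   = {1, 1,  8,  8,  8,  1};
        System.out.println(shallowSize(offsets, sizes)); // 48
    }
}
```

Note that the maximum end offset here (40 + 1 = 41) comes from SubSub, but only after scanning all levels; stopping at the most-derived class alone is exactly the short-circuit being discussed.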

        Dawid Weiss added a comment -

        I hope my explanation was understandable...

        Perfectly well. Yes, I agree, it's possible to fill in the "holes" packing them with fields from subclasses. It would be a nice vm-level optimization in fact!

        I'm still experimenting on this code and cleaning/ adding javadocs – I'll patch this and provide a complete patch once I'm done, ok?

        Uwe Schindler added a comment - edited

        OK. All you have to remove is the if (fieldFound && useUnsafe) check and always recurse. fieldFound itself can also be removed.

        Uwe Schindler added a comment -

        JRockit could even compress like this, it would still allow casting as all holes are solely used by one sub-class:

        @ 8  1 Super.superByte
        @ 9  1 Sub.subByte
        @10  6 SubSub.subSubByte
        @16  8 Super.subLong
        @24  8 Sub.subLong
        @32  8 SubSub.subSubLong
        @40    sizeOf(SubSub.class instance)
        
        Dawid Weiss added a comment -

        Maybe it does such things already. I didn't check extensively.

        Uwe Schindler added a comment -

        We have to remove the shallow size optimization in 3.x and trunk.

        Dawid Weiss added a comment -

        I confirmed that this packing indeed takes place. Wrote a pseudo-random test with lots of classes and fields. Here's an offender on J9, for example (names follow the pattern Wild_{inheritance-level}_{field-number}):

        @24  4 Wild_0_92.fld_0_0_92
        @28  4 Wild_0_92.fld_1_0_92
        @32  4 Wild_0_92.fld_2_0_92
        @36  4 Wild_0_92.fld_3_0_92
        @40  4 Wild_0_92.fld_4_0_92
        @44  4 Wild_0_92.fld_5_0_92
        @48  4 Wild_0_92.fld_6_0_92
        @52  4 Wild_2_5.fld_0_2_5
        @56  8 Wild_1_85.fld_0_1_85
        @64  8 Wild_1_85.fld_1_1_85
        @72    sizeOf(Wild_2_5 instance)
        

        HotSpot and JRockit don't seem to do this (at least it didn't fail on the example).

        Uwe Schindler added a comment -

        Thanks, in that case shallowSizeOf(Wild_2_5.class) would incorrectly return 56 because of the short-circuit - so let's fix this.

        Dawid Weiss added a comment -

        Yep, that assumption was wrong – indeed:

        WildClasses.Wild_2_5 wc = new WildClasses.Wild_2_5();
        wc.fld_6_0_92 = 0x1122;
        wc.fld_0_2_5 = Float.intBitsToFloat(0xa1a2a3a4);
        wc.fld_0_1_85 = Double.longBitsToDouble(0xb1b2b3b4b5b6b7L);
        System.out.println(ExpMemoryDumper.dumpObjectMem(wc));
        

        results in:

        0x0000 b0 3d 6f 01 00 00 00 00 0e 80 79 01 00 00 00 00
        0x0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0030 22 11 00 00 a4 a3 a2 a1 b7 b6 b5 b4 b3 b2 b1 00
        0x0040 00 00 00 00 00 00 00 00
        

        And you can see they are reordered and longs are aligned.
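A side note on reading the dump: x86 is little-endian, so the int value 0x1122 appears in memory as "22 11 00 00" (least significant byte first). A quick self-contained sketch to confirm the byte order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Shows why the memory dump prints "22 11 00 00" for the int value 0x1122:
// little-endian machines store the least significant byte at the lowest address.
public class EndianSketch {
    // Render an int's bytes in little-endian order as space-separated hex.
    public static String hexLE(int value) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.array()) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hexLE(0x1122)); // 22 11 00 00
    }
}
```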

        I'll provide a cumulative patch of changes in the evening, there's one more thing I wanted to add (cache of fields) because this affects processing speed.

        Dawid Weiss added a comment -

        Ok, I admit J9 is fascinating... How much memory does this take?

        class Super {
          byte b1 = 0x11;
          byte b2 = 0x22;
        }
        

        Here is the memory layout:

        [JVM: IBM J9 VM, 2.6, IBM Corporation, IBM Corporation, 1.7.0]
        0x0000 00 b8 21 c4 5f 7f 00 00 00 00 00 00 00 00 00 00
        0x0010 11 00 00 00 22 00 00 00
        @16  4 Super.b1
        @20  4 Super.b2
        @24    sizeOf(Super instance)
        

        I don't think I screwed up anything. It really is 4 byte alignment on all fields.
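The 24-byte total is consistent with the dump: a 16-byte object header plus two fields each occupying a 4-byte slot. A trivial back-of-the-envelope check (the header size and per-field slot width are assumptions read off the dump above, not general J9 constants):

```java
// Back-of-the-envelope check of the J9 layout above:
// 16-byte header + 2 fields * 4-byte field slots = 24 bytes.
public class J9AlignmentSketch {
    // header: object header size; fieldAlign: minimum slot per field.
    public static long sizeWithFieldAlignment(long header, int numFields, long fieldAlign) {
        return header + (long) numFields * fieldAlign;
    }

    public static void main(String[] args) {
        System.out.println(sizeWithFieldAlignment(16, 2, 4)); // 24
    }
}
```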

        Dawid Weiss added a comment -

        Don't be scared by the size of this patch – it contains a lot of generated code in WildClasses.

        Improvements:

        • size estimation is no longer recursive (recursion led to stack overflows quite easily on more complex object graphs),
        • decreased memory consumption by using a custom impl. of an identity object set.
        • added a cache of resolved class information (ref. fields, shallow size).
        • removed the optimization of counting only subclass field offsets because fields can be packed (J9).
        • added more verbose information about unsupported JVM features. J9 doesn't have the MX bean for example (and does dump this).

        The above changes also speed up the entire processing.
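As a rough illustration of the first two bullets (hypothetical and simplified: the real estimator walks arbitrary object fields via reflection), an explicit stack plus an identity-based "seen" set replaces recursion, so deep graphs cannot overflow the call stack and cycles terminate:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Iterative object-graph walk sketch: explicit stack + identity "seen" set.
public class IterativeWalkSketch {
    public static final class Node { public Node next; }

    // Counts reachable nodes iteratively; a real estimator would also
    // sum up each visited object's shallow size.
    public static long countReachable(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        long count = 0;
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!seen.add(n)) continue; // already visited (cycles are fine)
            count++;
            if (n.next != null) stack.push(n.next);
        }
        return count;
    }

    public static void main(String[] args) {
        // A chain this long would overflow a naive recursive walk.
        Node head = new Node();
        Node cur = head;
        for (int i = 0; i < 200_000; i++) { cur.next = new Node(); cur = cur.next; }
        cur.next = head; // close a cycle to show the seen-set handles it
        System.out.println(countReachable(head)); // 200001
    }
}
```

Identity semantics matter here: a HashSet would call user-defined equals()/hashCode(), which both distorts the count and can be arbitrarily expensive.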

        Dawid Weiss added a comment -

        Added a test case for the identity hash set, removed constants, removed wild classes.

        Uwe Schindler added a comment -

        I think the patch is now fine! I will commit it later and backport to 3.x.

        Dawid Weiss added a comment -

        Thanks Uwe. I'll be working in the evening again but if you're faster go ahead and commit it in.

        Uwe Schindler added a comment -

        Committed trunk revision: 1304485, 1304513, 1304564
        Committed 3.x revision: 1304565

        Dawid Weiss added a comment - edited

        I've been thinking about how one can assess the estimation quality of the new code. I came up with this:

        • I allocate an Object[] half the size of the estimated maximum available RAM (just to make sure all objects will fit without the need to reallocate),
        • I precompute shallow sizes for instances of all "wild classes" (classes with random fields, including arrays).
        • I then fill the "vault" array above with random instances of wild classes, summing up the estimated size UNTIL I HIT OOM.
        • Once I hit OOM, I know how much we actually allocated vs. how much space we thought we had allocated.

        The results are very accurate on HotSpot if one is using serial GC. For example:

        [JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems Inc., 1.6.0_29]
        Max: 483.4 MB, Used: 698.9 KB, Committed: 123.8 MB
        Expected free: 240.9 MB, Allocated estimation: 240.8 MB, Difference: -0.05% (113.6 KB)
        

        If one runs with a parallel GC things do get out of hand because the GC is not keeping up with allocations (although I'm not sure how I should interpret this because we only allocate; it's not possible to free any space – maybe there are different GC pools or something):

        [JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems Inc., 1.6.0_29]
        Max: 444.5 MB, Used: 655.4 KB, Committed: 122.7 MB
        Expected free: 221.5 MB, Allocated estimation: 174.2 MB, Difference: -21.34% (47.3 MB)
        

        JRockit:

        [JVM: Oracle JRockit(R), R28.1.4-7-144370-1.6.0_26-20110617-2130-windows-x86_64, Oracle Corporation, Oracle Corporation, 1.6.0_26]
        Max: 500 MB, Used: 3.5 MB, Committed: 64 MB
        Expected free: 247.7 MB, Allocated estimation: 249.5 MB, Difference: 0.74% (1.8 MB)
        

        I think we're good. If somebody wishes to experiment, the spike is here:
        https://github.com/dweiss/java-sizeof

        mvn test
        mvn dependency:copy-dependencies
        java -cp target/classes:target/test-classes:target/dependency/junit-4.10.jar \
          com.carrotsearch.sizeof.TestEstimationQuality
        
        Dawid Weiss added a comment -

        For the historical record: the previous implementation of RamUsageEstimator was off by anywhere between 3% (random-size objects, including arrays) and 20% (objects smaller than 80 bytes). Again – these are "perfect scenario" measurements with an empty heap and maximum allocation until OOM, with a serial GC. With concurrent and parallel GCs the memory consumption estimation is still accurate, but it's nearly impossible to tell when an OOM will occur or how the GC will manage the heap space.

        Uwe Schindler added a comment -

        That's true. But you can still get the "unreleasable allocation", i.e. the size of the non-gc-able object graph. If the GC does not free the objects after release fast enough, it will still do so once memory gets low. But the allocated objects with hard refs are not releasable.

        So I think it's fine for memory requirement purposes. If you want real heap allocation, you must use instrumentation.

        Dawid Weiss added a comment -

        I didn't say it's wrong – it is fine and accurate. What I'm saying is that it's not really suitable for predictions; for answering questions like: how many objects of a given type (or types) can I allocate before an OOM hits me? That doesn't really surprise me, but it would be nice. For measuring already allocated stuff it's more than fine, of course.


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Shai Erera