Lucene - Core
LUCENE-7966

build mr-jar and use some java 9 methods if available

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/other, general/build
    • Labels:
    • Lucene Fields:
      New

      Description

      See background: http://openjdk.java.net/jeps/238

      It would be nice to use some of the newer array methods and range-checking methods in Java 9, for example, without waiting for Lucene 10 or something. If we build an MR-JAR, we can start migrating our code to use Java 9 methods right now: it will use the optimized methods from Java 9 when that's available, otherwise fall back to Java 8 code.

      This patch adds:

      Objects.checkIndex(int,int)
      Objects.checkFromToIndex(int,int,int)
      Objects.checkFromIndexSize(int,int,int)
      Arrays.mismatch(byte[],int,int,byte[],int,int)
      Arrays.compareUnsigned(byte[],int,int,byte[],int,int)
      Arrays.equals(byte[],int,int,byte[],int,int)
      // did not add char/int/long/short/etc., but of course it's possible if needed
      

      It sets these up in org.apache.lucene.future as 1-1 mappings to the Java methods. This way, we can simply replace call sites directly with the Java 9 methods once Java 9 is the minimum. Simple 1-1 mappings also mean that we only have to worry about testing that our Java 8 fallback methods work.
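As a rough illustration of such a 1-1 mapping, a Java 8 fallback class might look like the sketch below. The class name mirrors the patch's intent, but the exact bodies and messages here are my assumption, not the patch's code:

```java
// Hypothetical sketch of a Java 8 fallback class; each method is a 1-1
// signature match for its java.util.Objects counterpart, so call sites can
// later be replaced mechanically once Java 9 is the minimum.
public final class FutureObjects {
  private FutureObjects() {}

  /** Java 8 fallback for Objects.checkIndex(int, int). */
  public static int checkIndex(int index, int length) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("Index " + index + " out-of-bounds for length " + length);
    }
    return index;
  }

  /** Java 8 fallback for Objects.checkFromToIndex(int, int, int). */
  public static int checkFromToIndex(int fromIndex, int toIndex, int length) {
    if (fromIndex < 0 || fromIndex > toIndex || toIndex > length) {
      throw new IndexOutOfBoundsException("Range [" + fromIndex + ", " + toIndex + ") out-of-bounds for length " + length);
    }
    return fromIndex;
  }
}
```

On Java 9, the MR-JAR variant of this class would simply delegate to java.util.Objects, so the intrinsified JDK methods get used.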

      I found that many of the current byte array methods today are willy-nilly and very lenient: for example, passing invalid offsets at times and relying on compare methods not throwing exceptions. I fixed all the instances in core/codecs but have not looked at the problems with AnalyzingSuggester. Also, SimpleText still uses a silly method in ArrayUtil in a similarly crazy way; I have not removed that one yet.

      1. LUCENE-7966.patch
        100 kB
        Robert Muir
      2. LUCENE-7966.patch
        99 kB
        Robert Muir
      3. LUCENE-7966.patch
        68 kB
        Robert Muir
      4. LUCENE-7966.patch
        64 kB
        Robert Muir
      5. LUCENE-7966.patch
        64 kB
        Robert Muir

        Issue Links

          Activity

          rcmuir Robert Muir added a comment -

          Here's my current patch. Need to look into the AnalyzingSuggester abuse of comparator, but other than that I think it's OK as a start. See the TODOs in lucene/core/build.xml related to where to go from here.

          As far as the code goes, I did some cleanup and plugged the new stuff in some obvious places, but there is a lot more that could be done.

          Not sure if I have the time to see it through; I didn't realize how much of our byte[] stuff alone was a mess...

          rcmuir Robert Muir added a comment -

          Updated patch. I fixed the fallout from detecting out-of-order comparisons in prefix-coding methods in a better way (don't encode wasteful bytes for the first term; no bytes need to be written). Annoyingly, this was always related to the empty string: it happened with bytesDifference([], [])/sortKeyLength([], []).

          To me it makes sense to detect the malfunction (a -1 return from mismatch) rather than leniently doing something strange for duplicate terms; it may detect bugs.
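The corner case described above can be seen directly with Arrays.mismatch (my own demo, not code from the patch): two equal ranges, including two empty ranges, return -1, which prefix-coding code can treat as "out of order or duplicate term":

```java
import java.util.Arrays;

// Demo of the -1 sentinel: Arrays.mismatch returns the first differing index,
// or -1 when the two ranges are equal, which includes the empty/empty case.
public class EmptyMismatchDemo {
  public static void main(String[] args) {
    byte[] empty = new byte[0];
    byte[] abc = {'a', 'b', 'c'};
    System.out.println(Arrays.mismatch(empty, 0, 0, empty, 0, 0)); // -1: equal (empty) ranges
    System.out.println(Arrays.mismatch(abc, 0, 3, abc, 0, 3));     // -1: a term equal to its predecessor
  }
}
```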

          thetaphi Uwe Schindler added a comment -

          Hi Robert,
          this is wonderful. I am happy that our discussion about this helped. Nice that you used the example on the ANT webpage: https://ant.apache.org/manual/Tasks/jar.html#jep238-example

          The problem is that you currently need Java 9 to compile, too. I'd make it optional. My alternative idea would be: make stub clones of the Objects/Arrays classes, without any code from the JDK, and use them on the bootclasspath. I can try to work on that.

          dweiss Dawid Weiss added a comment -

          + <!-- TODO: on windows does executable need .exe suffix? -->

          Not if you use fork=true, because then the task is executed via cmd, which resolves file extensions automatically.

          dweiss Dawid Weiss added a comment -

          Overall, pretty neat and uncharted territory (I haven't seen anybody use this JEP yet!). One thing that we lose here is the verbosity of exception messages (field names, doc IDs); for example:

          -    if (dimension < 0 || dimension >= type.pointDimensionCount()/2) {
          -      throw new IllegalArgumentException("dimension request (" + dimension +
          -          ") out of bounds for field (name=" + name + " dimensions=" + type.pointDimensionCount()/2 + "). ");
          -    }
          +    FutureObjects.checkIndex(dimension, type.pointDimensionCount()/2);
          

          But since these are exceptional cases, I don't think they'd be particularly useful for diagnostics unless you can reproduce the problem (and then you can add/debug what you like).

          thetaphi Uwe Schindler added a comment -

          One thing that we lose here is the verbosity of exception message (field names, doc ids)

          Yes, but there is a reason to change to the new methods! Robert did not add it to the issue description. I pointed him to the new methods over the weekend during a private chat (which was prompted by my comment in a former issue about a missing check).

          To make it short: those methods are highly optimized by Hotspot. In fact, they replaced the bounds checks in ByteBuffers and other places throughout the JDK to call those methods instead of hardcoded if/then/else statements. I also talked with Rémi Forax about it this summer to get more information. He strongly recommends using the new methods from Objects and Arrays instead of manually crafted if/then/else checks. The reason: those methods are intrinsics and are more or less replaced by highly optimized bounds-check code. In addition, if you have several levels of bounds checks, more magic happens: a high-level method checks bounds and calls a lower-level method, which does bounds checks again (e.g., a ByteBuffer in the JDK), and this lower-level method in turn accesses a Java array (which implicitly does bounds checks, too). Hotspot adds all calls of those Objects index-check methods to its internal AST and can then safely eliminate all of them except one. And if it sees that a variable is unsigned, it will also remove the negative checks, and so on. This was done because they had very hard problems eliminating those checks everywhere when somebody writes an if statement (there are tons of ways to express the same check with if statements!), so the OpenJDK developers argued for an intrinsic replacement method instead of hardcoded bounds checks with hundreds of variants. That way they only have to optimize a single "if statement", replace it with assembly code, and remove duplicates.

          Even shorter: if you use the Objects methods instead of if statements, you can add more safety to your code, as duplicate checks are eliminated. So we could even start to add checks to BytesRef, or remove the stupid try/catch blocks in ByteBufferIndexInput.
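The deduplication described above can be sketched as follows (my own example; the class name is hypothetical): the explicit Objects.checkIndex call performs the same check the array access does implicitly, and because both are the same intrinsic check, Hotspot can keep just one:

```java
import java.util.Objects;

// Layered bounds checks expressed via the intrinsified Objects.checkIndex;
// Hotspot recognizes the check and can eliminate the duplicate that the
// array access itself would perform.
public final class SliceReader {
  private final byte[] data;

  public SliceReader(byte[] data) {
    this.data = data;
  }

  public byte get(int index) {
    Objects.checkIndex(index, data.length); // explicit check, intrinsified
    return data[index];                     // implicit check, deduplicated by Hotspot
  }
}
```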

          dweiss Dawid Weiss added a comment -

          Uwe: I understand the reason for this patch. My point was just that you do lose some verbosity in the exception message, that's all.

          As for performance I wonder what the actual gain will be - I bet you a beer the performance will be within the noise levels on larger tasks.

          thetaphi Uwe Schindler added a comment -

          As for performance I wonder what the actual gain will be - I bet you a beer the performance will be within the noise levels on larger tasks.

          Of course, we have to test how much it gains! We should compare:

          • Java 8 without and with patch
          • Java 9 without and with patch

          But the main reason is to make our code "safer". We omitted bounds checks in many places because the overhead was immense (e.g., BytesRef, ByteBufferIndexInput, ...), and sometimes did not even add asserts. If we make that code safer, it will not be slower on Java 9. So we can say: we still work with Java 8, but you should really use Java 9 for optimal performance.

          Robert just started with the important stuff, but there is room for improvements.

          dweiss Dawid Weiss added a comment -

          I wholeheartedly agree – it's fascinating stuff and thanks to Robert and you for starting this.

          We should compare:

          Those bounds checks will cost something; the fact that they're intrinsic methods doesn't mean they're free. There's another dimension to your list: the CPU architecture Hotspot generates code for. I didn't look at the code, but those intrinsics will vary depending on what the CPU has to offer. So regardless of easier ideal-graph optimizations (which is great!), there will be some assembly injected to make those bounds checks work, especially on CPUs with a less complex instruction set.

          Like I said: very interesting stuff to work on and benchmark!

          rcmuir Robert Muir added a comment -

          Dawid: I didn't really add new checks in the patch: I converted a bunch of existing checks to the new Objects.checkXXX methods. I think any difference in exception messages is not a big deal; actually, I like the consistency and the newer exception messages (see the unit tests for those: I emulated what Java 9 returns in the Java 8 methods).

          The only place where there are new "checks" is where we use Arrays.compare/equals/mismatch. Those methods are defined in the JDK as safe, of course, and as I mentioned here, the existing comparator/etc. methods we have are not very safe. See especially the prefix-coding corner cases I ran into here.
          For those methods, the hope is that the cost is still net/net drowned out, and hopefully more than compensated by the fact that these methods no longer go one byte at a time.

          In all cases, none of this stuff should really be a hotspot for Lucene. IMO we should be encouraging more safety/checks/reliability, and that's the goal here.
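The contrast between the old and new style can be illustrated like this (my example, not Lucene code): Arrays.mismatch validates its range arguments up front and is intrinsified to compare a word at a time, unlike the hand-rolled one-byte-at-a-time loop it replaces:

```java
import java.util.Arrays;

// Hand-rolled loop vs. the intrinsified range-checked Arrays.mismatch.
public class MismatchVsLoop {
  // the old-style one-byte-at-a-time loop, for contrast
  static int firstDiff(byte[] a, byte[] b, int len) {
    for (int i = 0; i < len; i++) {
      if (a[i] != b[i]) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3, 4};
    byte[] b = {1, 2, 9, 4};
    System.out.println(Arrays.mismatch(a, 0, 4, b, 0, 4)); // prints 2, same as firstDiff(a, b, 4)
    try {
      Arrays.mismatch(a, 0, 9, b, 0, 9); // toIndex past the array: throws instead of silently misbehaving
    } catch (IndexOutOfBoundsException expected) {
      System.out.println("range checked");
    }
  }
}
```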

          rcmuir Robert Muir added a comment -

          There's another dimension to your list: the CPU architecture hotspot generates code for... I didn't look at the code, but those intrinsics will vary depending on what the CPU has to offer.

          No, there isn't, actually; the transformation is platform-independent:

          http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/opto/library_call.cpp#l1160

          Please, in the future, let's actually look at the code first before getting too paranoid about such things. Otherwise we make decisions based on FUD instead of facts and end up with sloppy array code like we have today.

          dweiss Dawid Weiss added a comment -

          I thought it was an intrinsic reaching all the way down to inlined assembly (a macro), not just an ideal-graph transformation/optimization. Apologies for not verifying this.

          rcmuir Robert Muir added a comment -

          I opened LUCENE-7968 for the suggester case failing here; this looks seriously buggy since it has a negative array length, just hidden by the fact that BytesRef.compareTo silently returns false today. We should fix it separately.

          rcmuir Robert Muir added a comment -

          Same patch, just folding in the suggester bugfix (LUCENE-7968) so that all tests pass. I think this is a good reason to look at the CharsRef/IntsRef/LongsRef comparators and so on; maybe we can find more bugs.

          rcmuir Robert Muir added a comment -

          I updated the patch with more cleanups: nuked the older ArrayUtil methods, cleaned up PrefixCodedTerms, and cut over CharsRef, IntsRef, and LongsRef.

          So this adds compare()/equals() for char[]/int[]/long[]. It also adds mismatch() for char[] to implement the UTF16-in-UTF8-order comparator.

          Didn't find any new bugs, so this seems like more than enough for now. I will think about how we can do some validation of MR-JAR consistency in the smoke tester/build/something: that's really needed, or we can't ensure the stuff is correct. I will also think about Uwe's idea of the bootclasspath hack, so that maybe folks don't need an actual Java 9 compiler.

          rcmuir Robert Muir added a comment -

          Updated patch implementing LZ4.commonBytes with mismatch.
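The idea behind that change can be sketched as follows. This is a hypothetical reconstruction (the real LZ4.commonBytes signature and invariants in Lucene may differ): count how many bytes starting at o1 match the bytes starting at o2, up to limit, with a single intrinsified mismatch call instead of a byte-by-byte loop:

```java
import java.util.Arrays;

// Sketch: common-prefix length between two regions of the same array via
// Arrays.mismatch. Assumes o1 < o2 <= limit <= b.length, as in LZ4 match
// extension, so both ranges stay in bounds.
final class CommonBytesSketch {
  static int commonBytes(byte[] b, int o1, int o2, int limit) {
    final int len = limit - o2;
    final int mismatch = Arrays.mismatch(b, o1, o1 + len, b, o2, limit);
    return mismatch == -1 ? len : mismatch; // -1 means the whole range matched
  }
}
```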

          rcmuir Robert Muir added a comment -

          Adrien Grand: maybe you can experiment with the LZ4 change if you get bored, since it was your idea.

          rcmuir Robert Muir added a comment -

          Adrien also mentioned that the change to BytesRef.compareTo might impact building the OrdinalMap (LUCENE-7905); that would be an interesting thing to try to benchmark as well.

          jpountz Adrien Grand added a comment -

          I did some tests with the Calgary corpus, which can be found at http://corpus.canterbury.ac.nz/descriptions/ (times to compress; lower is better):

          File      Without patch   With patch   Difference
          bib             971702       904173       -6.9%
          book1          7479794      7073712       -5.4%
          book2          4990347      4574486       -8.3%
          geo            1600972      1574435       -1.7%
          news           3394833      3222113       -5.1%
          obj1            169516       166673       -1.7%
          obj2           1869442      1769302       -5.4%
          paper1          385900       357472       -7.4%
          pic            1528354      1314336       -14%
          progc           279295       261445       -6.4%
          progl           410565       376898       -8.2%
          progp           245654       222230       -9.5%
          trans           517571       470134       -9.2%

          As expected, the improvement is better on files that have long repetitions, like source code and the bitmap picture. The speedup is consistently reproducible.

          rcmuir Robert Muir added a comment -

          Adrien, thanks for benchmarking!

          mikemccand Michael McCandless added a comment -

          I tested performance on a large OrdinalMap:

          java 8 no patch:
            done init top-level VE state; took 149.411s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 148.977s; 1062.73 MB RAM; 94244084 total unique families
          
          java 8 patch:
            done init top-level VE state; took 150.789s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 150.509s; 1062.73 MB RAM; 94244084 total unique families
          
          java 9 no patch:
            done init top-level VE state; took 151.309s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 149.814s; 1062.73 MB RAM; 94244084 total unique families
          
          java 9 patch:
            done init top-level VE state; took 158.535s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 156.207s; 1062.73 MB RAM; 94244084 total unique families
          

          I'm running on 7.x, and the patch had one non-test conflict which was simple to resolve.

          I'm not sure what's going on; does the patch even change code related to OrdinalMap? (I haven't looked closely).

          rcmuir Robert Muir added a comment -

          does the patch even change code related to OrdinalMap?

          it changes BytesRef.compareTo, which normally isn't that interesting, but it's the comparison function used when constructing the ordinal map. Previous comments on LUCENE-7905 suggested that this might be a bottleneck there.
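The shape of that cutover is roughly the following (my reconstruction; the actual BytesRef fields and method are simplified here): unsigned byte order in one intrinsified call instead of a manual loop with (b & 0xff) arithmetic:

```java
import java.util.Arrays;

// Simplified stand-in for a BytesRef-like slice, comparing in unsigned byte
// order via the intrinsified Arrays.compareUnsigned range overload.
final class ByteSlice {
  final byte[] bytes;
  final int offset, length;

  ByteSlice(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  int compareTo(ByteSlice other) {
    return Arrays.compareUnsigned(bytes, offset, offset + length,
                                  other.bytes, other.offset, other.offset + other.length);
  }
}
```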

          rcmuir Robert Muir added a comment -

          Mike, is it possible the benchmark didn't warm up here (or maybe something happened with the 7.x backport)? The results are a bit noisy and don't make sense.

          When I modify the JMH bench here https://richardstartin.com/2017/07/16/new-methods-in-java-9-math-fma-and-arrays-mismatch/ to use shorter strings right above the threshold where the intrinsic kicks in (8, 9, 10), I don't see any regressions. So I don't think it should ever get slower, regardless of any weirdness in your data; maybe it's just how the bench was done?

          thetaphi Uwe Schindler added a comment -

          Hi,
          I took Robert's latest patch, committed it to my GitHub fork, and developed it a bit more: I added a stub generator (which can only be used with Java 9) to generate stub classes that have only the Java 9 signatures, but no private members or method bodies. This is done by a Groovy script that can be executed (requires Java 9 as JAVA_HOME) with: ant generate-java9-stubs. The files generated by this are committed, so nobody needs Java 9 to compile Lucene. The license should be no problem, as no code is involved; this is mentioned in the README.txt in the folder. The stub classes are compiled with the Java 8 class-file version.

          When compiling the MR-JAR, the standard Java compiler is used, but we prepend our stub classes to javac's bootclasspath. This allows us to compile the MR-JAR also with Java 8. If Java 9 is detected, we don't add anything to the bootclasspath.

          The branch is here: https://github.com/apache/lucene-solr/compare/master...uschindler:jira/LUCENE-7966

          thetaphi Uwe Schindler added a comment - edited

          This is just my patch on top of Robert's: https://github.com/apache/lucene-solr/commit/e69d77023fbfc60bb0baf16dc5102ab76419cf03
          mikemccand Michael McCandless added a comment -

          > Mike, is it possible the benchmark didn't warm up here (or maybe something happened with the 7.x backport?).

          I'm not sure what happened ... the bench should have been "hot": plenty of RAM on the box, and I ran each case twice. I did also run in a virtual env (EC2), i3.16xlarge instance; maybe a noisy neighbor impacted results? I don't think we should let this block committing; the change otherwise seems awesome.

          thetaphi Uwe Schindler added a comment -

          Michael McCandless: A stupid question: Did you do the benchmark on Java 9 using the JAR file? If you ran it with a class-files-only classpath, it won't use any Java 9 features, so you won't see any speed improvement. MR-JARs only work when used as actual JAR files; just placing the files in META-INF subdirectories of a file-only classpath won't use them!
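          For reference, the multi-release mechanism Uwe describes is driven by the JAR manifest plus a versioned directory (see JEP 238); schematically the lucene-core JAR would contain:

```text
META-INF/MANIFEST.MF                                         contains "Multi-Release: true"
org/apache/lucene/util/BytesRef.class                        default (Java 8) bytecode
META-INF/versions/9/org/apache/lucene/util/BytesRef.class    Java 9 variant, picked up only
                                                             when loaded from the JAR on Java 9+
```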

          thetaphi Uwe Schindler added a comment - - edited

          When thinking last night about the whole thing a bit more, I had a cool idea: Currently we use ASM to generate the stub files to compile against (see my Github repo). On top of these stubs we use a "wrapper class" that just delegates all methods to the Java 9 one. IMHO, this is not nice for the optimizer (although it can handle that). But the oal.future.FutureObjects/FutureArrays classes just contain the same signatures as their Java 9 variants would contain. So my idea is to use ASM to patch all classes:

          • Use a groovy script that runs on the compiler output, before building the JAR file
          • Load each class with ASM and use ASM's rewriter functionality to rewrite all occurrences of oal.future.FutureObjects/FutureArrays into java.util.Objects/Arrays. We can use this utility out of ASM to do this: http://asm.ow2.org/asm50/javadoc/user/org/objectweb/asm/commons/ClassRemapper.html. Whenever a class file contains references to FutureXXX classes, we patch it using ASM and write it out to the META-INF folder as the Java 9 variant.
          • Then package MR jar.
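          The renaming rule such a ClassRemapper would be configured with is tiny. A pure-string sketch for illustration (the class names are from this issue; the helper class itself is hypothetical — the real patch would feed an equivalent mapping to org.objectweb.asm.commons.ClassRemapper to patch whole class files):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing the constant-pool renaming rule.
public class FutureRemapRule {
  // JVM-internal class names, exactly as they appear in the constant pool
  private static final Map<String, String> MAPPING = new HashMap<>();
  static {
    MAPPING.put("org/apache/lucene/future/FutureArrays", "java/util/Arrays");
    MAPPING.put("org/apache/lucene/future/FutureObjects", "java/util/Objects");
  }

  // Returns the Java 9 replacement name, or the input unchanged
  // (the identity fallback is what ClassRemapper's Remapper contract expects).
  public static String map(String internalName) {
    return MAPPING.getOrDefault(internalName, internalName);
  }
}
```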

          The good thing:

          • we don't need stub files to compile with Java 8. We just need the smoke tester to verify that the patched class files actually resolve against Java 9 during the Java 9 checks
          • we have no license issues, because we don't need to generate and commit the stubs. In our source files we solely use oal.future.Objects/Arrays. Adapting to Java 9 is done by constant pool renaming

          What do you think? I will try this variant a bit later today. We can use the same approach for other Java 9 classes, too! Maybe this also helps with the issues Mike has seen (I am not happy to have the delegator class).

          thetaphi Uwe Schindler added a comment - - edited

          Here is the class remapper: https://paste.apache.org/bAzx

          Basically it rewrites all references to oal.future.FutureXxxx to the Java 9 type java.util.Xxxx. All class files in the Java 8 output (build/classes/java) that reference our own FutureXxx classes (the remapper sets remapped=true for those) are saved in rewritten form to a separate directory, build/classes/java9, parallel to the original, and are packaged into the multi-release part of the JAR. All classes that have no references to our FutureXxx backports are kept out.

          This can be done as a general task and may be applied to all Lucene/Solr modules.

          I will update my branch later.

          mikemccand Michael McCandless added a comment -

          > Michael McCandless: A stupid question: Did you do the benchmark on Java 9 using the JAR file? If you ran it with a class-files-only classpath, it won't use any Java 9 features, so you won't see any speed improvement. MR-JARs only work when used as actual JAR files; just placing the files in META-INF subdirectories of a file-only classpath won't use them!

          That was a great idea Uwe Schindler but alas I was using Lucene via JAR files.

          thetaphi Uwe Schindler added a comment - - edited

          I implemented my latest idea, based again on Robert's patch: https://github.com/apache/lucene-solr/compare/master...uschindler:jira/LUCENE-7966-v2

          This approach is much cleaner: We compile against Robert's replacement classes FutureObjects and FutureArrays (which have to contain the same method signatures as the Java 9 originals; we can add a test for this later with the smoke tester) as usual with Java 8. Before packaging the JAR file we read all class files and patch all FutureObjects/FutureArrays references to refer to the Java 9 classes. The patched output is sent to a separate folder, build/classes/java9. The JAR file is then packaged with both variants, placing the patched ones in the Java 9 multi-release part.
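          For illustration, a minimal sketch of what such a Java 8 fallback looks like (a hypothetical reimplementation with the Java 9 signature of Arrays.compareUnsigned(byte[],int,int,byte[],int,int); the actual FutureArrays class in the patch is authoritative):

```java
// Hypothetical sketch of a Java 8 fallback in the style of FutureArrays.
public class FutureArraysSketch {

  public static int compareUnsigned(byte[] a, int aFrom, int aTo,
                                    byte[] b, int bFrom, int bTo) {
    checkFromToIndex(aFrom, aTo, a.length);
    checkFromToIndex(bFrom, bTo, b.length);
    int aLen = aTo - aFrom, bLen = bTo - bFrom;
    int len = Math.min(aLen, bLen);
    for (int i = 0; i < len; i++) {
      // compare bytes as unsigned values, like the Java 9 method
      int cmp = (a[aFrom + i] & 0xFF) - (b[bFrom + i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return aLen - bLen; // the shorter range sorts first when one is a prefix
  }

  // Mirrors Objects.checkFromToIndex(int,int,int): rejects invalid sub-ranges
  static void checkFromToIndex(int from, int to, int length) {
    if (from < 0 || from > to || to > length) {
      throw new IndexOutOfBoundsException(
          "Range [" + from + ", " + to + ") out-of-bounds for length " + length);
    }
  }
}
```

          Because the signature is a 1-1 match, the class-file patcher can later swap the owner of the call site to java.util.Arrays without touching anything else.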

          Currently only the lucene-core.jar file uses the patched stuff, so stuff outside lucene-core (e.g., codecs) does not yet automatically add Java 9 variants, instead it will use Robert's classes. If this is the way to go, I will move the patcher to the global tools directory and we can apply patching to all JAR files of the distribution. WARNING: We cannot support Maven builds here, Maven always builds a Java8-only JAR file!

          Michael McCandless, Adrien Grand: Could you build a lucene-core.jar file with the above branch on GitHub and do your tests again? The main difference here is that the JAR file no longer contains a delegator class. Instead, all class files that were originally compiled against FutureObjects/FutureArrays (for Java 8 support) are patched to directly use the Java 9 Arrays/Objects methods, without going through a delegator. Keep in mind: this currently only supports lucene-core.jar; the codecs JAR file is not yet multi-release with this patch.

          When building with ant jar inside lucene/core you should see output like this:

               [compile shit...................]
               [copy] Copying 3 files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\classes\java
          
          -mrjar-classes-uptodate:
          
          resolve-groovy:
          [ivy:cachepath] :: resolving dependencies :: org.codehaus.groovy#groovy-all-caller;working
          [ivy:cachepath]         confs: [default]
          [ivy:cachepath]         found org.codehaus.groovy#groovy-all;2.4.8 in public
          [ivy:cachepath] :: resolution report :: resolve 170ms :: artifacts dl 5ms
                  ---------------------------------------------------------------------
                  |                  |            modules            ||   artifacts   |
                  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
                  ---------------------------------------------------------------------
                  |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
                  ---------------------------------------------------------------------
          
          patch-mrjar-classes:
          [ivy:cachepath] :: resolving dependencies :: org.ow2.asm#asm-commons-caller;working
          [ivy:cachepath]         confs: [default]
          [ivy:cachepath]         found org.ow2.asm#asm-commons;5.1 in public
          [ivy:cachepath]         found org.ow2.asm#asm-tree;5.1 in public
          [ivy:cachepath]         found org.ow2.asm#asm;5.1 in public
          [ivy:cachepath] :: resolution report :: resolve 701ms :: artifacts dl 8ms
                  ---------------------------------------------------------------------
                  |                  |            modules            ||   artifacts   |
                  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
                  ---------------------------------------------------------------------
                  |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
                  ---------------------------------------------------------------------
             [groovy] Remapped: org/apache/lucene/analysis/tokenattributes/CharTermAttributeImpl
             [groovy] Remapped: org/apache/lucene/codecs/compressing/LZ4
             [groovy] Remapped: org/apache/lucene/document/BinaryPoint$2
             [groovy] Remapped: org/apache/lucene/document/DoubleRange
             [groovy] Remapped: org/apache/lucene/document/FloatRange
             [groovy] Remapped: org/apache/lucene/document/IntRange
             [groovy] Remapped: org/apache/lucene/document/LongRange
             [groovy] Remapped: org/apache/lucene/index/BitsSlice
             [groovy] Remapped: org/apache/lucene/index/CodecReader
             [groovy] Remapped: org/apache/lucene/index/MergeReaderWrapper
             [groovy] Remapped: org/apache/lucene/search/BooleanScorer$TailPriorityQueue
             [groovy] Remapped: org/apache/lucene/util/BytesRef
             [groovy] Remapped: org/apache/lucene/util/BytesRefArray
             [groovy] Remapped: org/apache/lucene/util/CharsRef$UTF16SortedAsUTF8Comparator
             [groovy] Remapped: org/apache/lucene/util/CharsRef
             [groovy] Remapped: org/apache/lucene/util/IntsRef
             [groovy] Remapped: org/apache/lucene/util/LongsRef
             [groovy] Remapped: org/apache/lucene/util/StringHelper
             [groovy] Remapped: org/apache/lucene/util/automaton/Automaton$Builder
             [groovy] Remapped: org/apache/lucene/util/automaton/Automaton
             [groovy] Remapped 20 class files for Java 9 to: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\classes\java9
              [touch] Creating C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\patch-mrjar.stamp
          
          jar-core:
                [jar] Building jar: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\lucene-core-8.0.0-SNAPSHOT.jar
          
          jar:
          
          BUILD SUCCESSFUL
          Total time: 31 seconds
          

          This patch also adds uptodate support, so the patching is only done if the original class files change.

          rcmuir Robert Muir added a comment -

          WARNING: We cannot support Maven builds here, Maven always builds a Java8-only JAR file!

          Why exactly is this the case? with my patch maven should work. Maven just uses the jars produced by ant. The smoketester validates they are exactly the same.

          thetaphi Uwe Schindler added a comment - - edited

          Why exactly is this the case? with my patch maven should work. Maven just uses the jars produced by ant. The smoketester validates they are exactly the same.

          Maven works when you build the JAR files for Maven with our Ant targets. What does not work is the pom.xml-based build generated by sarowe's Maven build: that one builds JAR files with only the FutureXxx fallbacks and no multi-release part. Sorry for being imprecise.

          rcmuir Robert Muir added a comment -

          thanks for the clarification. yes, that's no change from my patch.

          mikemccand Michael McCandless added a comment -

          Thanks Uwe Schindler; I'll retest from your branch!

          mikemccand Michael McCandless added a comment -

          I retested w/ Uwe's patch:

          java 8 w/o uwe's patch
            done init top-level VE state; took 150.849s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 151.488s; 1062.73 MB RAM; 94244084 total unique families
          
          java 8 w/ uwe's patch
            done init top-level VE state; took 156.910s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 154.836s; 1062.73 MB RAM; 94244084 total unique families
          
          java 9 w/o uwe's patch
            done init top-level VE state; took 151.883s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 151.123s; 1062.73 MB RAM; 94244084 total unique families
          
          java 9 w/ uwe's patch
            done init top-level VE state; took 159.383s; 1062.73 MB RAM; 94244084 total unique families
            done init top-level VE state; took 156.662s; 1062.73 MB RAM; 94244084 total unique families
          

          Still not sure what's going on ...

          thetaphi Uwe Schindler added a comment -

          No idea either!

          thetaphi Uwe Schindler added a comment -

          Hi Michael McCandless,
          I did some more testing and could not find a bug in the MR JAR that would explain this. I did the following:

          I created a simple Java test class:

          import org.apache.lucene.util.BytesRef;
          
          public class Test {
            public static void main(String... args) {
              BytesRef r1 = new BytesRef(new byte[20], 0, 30);
              BytesRef r2 = new BytesRef(new byte[20], 1, 10);
              r1.compareTo(r2);
            }
          }
          

          Of course this code is buggy, but that was the idea here. If I run this with Java 8, I get the following output:

          $ java -cp lucene-core-8.0.0-SNAPSHOT.jar;. Test
          Exception in thread "main" java.lang.IndexOutOfBoundsException: Range [0, 30) out-of-bounds for length 20
                  at org.apache.lucene.future.FutureArrays.checkFromToIndex(FutureArrays.java:35)
                  at org.apache.lucene.future.FutureArrays.compareUnsigned(FutureArrays.java:62)
                  at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:165)
                  at Test.main(Test.java:7)
          

          If I run the same with Java 9, I get the following output:

          $ java -cp lucene-core-8.0.0-SNAPSHOT.jar;. Test
          Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 30
                  at java.base/java.util.Arrays.rangeCheck(Arrays.java:122)
                  at java.base/java.util.Arrays.compareUnsigned(Arrays.java:6101)
                  at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:165)
                  at Test.main(Test.java:7)
          

          As we see, the BytesRef class is using Java 9's native java.util.Arrays class instead of our own FutureArrays.

          You can also verify this with javap:

          $ javap --multi-release 9 -cp lucene-core-8.0.0-SNAPSHOT.jar -p -c org.apache.lucene.util.BytesRef  | grep Arrays
                34: invokestatic  #76                 // Method java/util/Arrays.equals:([BII[BII)Z
                34: invokestatic  #136                // Method java/util/Arrays.compareUnsigned:([BII[BII)I
                26: invokestatic  #143                // Method java/util/Arrays.copyOfRange:([BII)[B
          
          $ javap --multi-release 8 -cp lucene-core-8.0.0-SNAPSHOT.jar -p -c org.apache.lucene.util.BytesRef  | grep Arrays
                34: invokestatic  #15                 // Method org/apache/lucene/future/FutureArrays.equals:([BII[BII)Z
                34: invokestatic  #29                 // Method org/apache/lucene/future/FutureArrays.compareUnsigned:([BII[BII)I
                26: invokestatic  #31                 // Method java/util/Arrays.copyOfRange:([BII)[B
          

          So the setup is correct.

          Can you maybe also check with the above "Test" program that the JAR file you are using behaves correctly?

          Nevertheless, and unrelated to Mike's problem of not seeing an improvement here, we need to improve the whole thing:

          • Our tests currently always run against our own FutureArrays/FutureObjects variants, because we run tests against the file-system-based classpath and not the JAR file. So Java 9 won't see our patched classes while running tests. I am currently thinking about how to improve this.
          • We should maybe add a smoke tester check using my above "buggy Test class" to verify that our MR JAR works. We should also improve the smoke tester to use Java 9 in addition to Java 8 (like we did back in Java 7 times).
          • We should also cross-check our MR JAR: if FutureArrays or FutureObjects ever diverge from the Java 9 method signatures, the patched classes would fail at runtime on Java 9. So we have to ensure the exported methods in our own impls match the originals.
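          The signature check in the last bullet could be done with plain reflection. A hedged sketch (class and method names here are hypothetical, not from the patch): for every public static method of a backport class, resolve the same name and parameter types on the JDK counterpart and compare return types.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch of the proposed consistency check.
public class SignatureCheckSketch {
  public static boolean allMethodsExistIn(Class<?> backport, Class<?> jdk) {
    for (Method m : backport.getDeclaredMethods()) {
      int mods = m.getModifiers();
      if (!Modifier.isPublic(mods) || !Modifier.isStatic(mods)) {
        continue; // only the exported static API must match
      }
      try {
        Method target = jdk.getMethod(m.getName(), m.getParameterTypes());
        if (!target.getReturnType().equals(m.getReturnType())) {
          return false; // same name/params but different return type
        }
      } catch (NoSuchMethodException e) {
        return false; // no Java 9 counterpart: constant-pool patching would break
      }
    }
    return true;
  }
}
```

          A test like this (e.g. allMethodsExistIn(FutureArrays.class, java.util.Arrays.class)) could run only under Java 9 in the smoke tester.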

          I am still waiting for Adrien Grand to see his benchmark results with the new MR JAR.

          jpountz Adrien Grand added a comment -

          Sorry Uwe, I was on vacation while you pinged me and did not notice you were asking me something. I'll run the benchmark again today or tomorrow.

          thetaphi Uwe Schindler added a comment -

          No problem!

          jpountz Adrien Grand added a comment -

          Here are the results:

          File     Compress time w/o patch   Compress time w/ patch   Difference
          bib                       976672                   935724        -4.2%
          book1                    7487574                  7300402        -2.5%
          book2                    4999043                  4719148        -5.6%
          geo                      1598629                  1648771        +3.1%
          news                     3398425                  3289187        -3.2%
          obj1                      169515                   167649        -1.1%
          obj2                     1873220                  1804144        -3.7%
          paper1                    386047                   364133        -5.7%
          paper2                    726156                   695761        -4.2%
          paper3                    375937                   359946        -4.3%
          paper4                    111643                   108169        -3.1%
          paper5                     99465                    96254        -3.2%
          paper6                    277771                   263810        -5.0%
          pic                      1528434                  1324520       -13.3%
          progc                     279360                   265820        -4.8%
          progl                     411285                   385095        -6.4%
          progp                     246035                   226135        -8.1%
          trans                     517765                   485948        -6.1%

          Results are slightly less good than previously but it could well be noise. It's still an improvement compared to master.

          thetaphi Uwe Schindler added a comment - - edited

          Thanks Adrien!

          Results are slightly less good than previously but it could well be noise.

          This can only be noise, because the code we are running is the same as Robert's. I just removed the additional indirection in the Java 9 code path by patching the class files directly.

          I hope you are still up for it: can you also run this comparison with Java 8? I assume the current numbers are with Java 9. Just to be sure that our "emulation layer" does not slow down Java 8.

          jpountz Adrien Grand added a comment -

          Sure, let me run that.

          jpountz Adrien Grand added a comment - - edited

          Now with Java 8

          File   | Time to compress (without patch) | Time to compress (with patch) | Difference
          bib    | 1033357 | 1027901 | -0.5%
          book1  | 7906551 | 7961422 | +0.7%
          book2  | 5335458 | 5348388 | +0.2%
          geo    | 1629253 | 1644770 | +1.0%
          news   | 3569115 | 3609734 | +1.1%
          obj1   | 176601  | 178634  | +1.2%
          obj2   | 1970078 | 1993204 | +1.2%
          paper1 | 412884  | 409665  | -0.8%
          paper2 | 772086  | 772052  | -0.0%
          paper3 | 401949  | 397526  | -1.1%
          paper4 | 118664  | 117747  | -0.8%
          paper5 | 104995  | 104855  | -0.1%
          paper6 | 296315  | 295813  | -0.2%
          pic    | 1619648 | 1698345 | +4.9%
          progc  | 299270  | 298341  | -0.3%
          progl  | 440063  | 443714  | +0.8%
          progp  | 265607  | 266388  | +0.3%
          trans  | 554452  | 555335  | +0.2%

          The slowdown on pic (the most compressible file) is reproducible but other files show very similar performance with and without the patch.

          dweiss Dawid Weiss added a comment -

          I keep thinking: is this static replacement of those delegation methods actually visible in performance benchmarks? I'd think once those methods go hot they'd be inlined when their enclosing methods are compiled without a trace of performance degradation? Just wondering.

          thetaphi Uwe Schindler added a comment - - edited

          Dawid Weiss: The patched class files are actually easier to maintain, as we do not need Java 9 to compile, no duplicate class files in source folder, or some fake Java 9 signature files (with questionable license) on bootclasspath (see my previous branch). This was the main reason to rewrite the class files instead of maintaining multiple source files. It's just a nice side-effect to no longer need the delegation methods. So I personally like the patching approach much more, as it's well tested by the ASM maintainers (we just use their code, no custom impl). It would be horrible if we'd instead require all committers to have both Java 8 and Java 9 installed!

          The question here was just for confirmation and comparison of both approaches, if they have some side effects.

          The slowdown on pic (the most compressible file) is reproducible

          Adrien Grand: The file with the biggest slowdown on Java 8 is the one with the biggest speedup on Java 9. The reason is quite clear: Robert's Java 8 implementation does more checks than the "old" LZ4 implementation (for safety, and to be compatible with the new Java 9 impl). But on Java 9 the new method used is an intrinsic, so we get a huge perf win!
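The bytecode-patching approach described above rewrites the Java 8 class files at build time so the Java 9 copies call the JDK directly. A minimal sketch of how such a rewrite could look with the ASM library's ClassRemapper follows; the class name, the exact mapping, and the build integration are assumptions for illustration, not Lucene's actual build code, and the snippet requires the external ASM dependency (org.objectweb.asm / asm-commons):

```java
import java.util.Map;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.commons.ClassRemapper;
import org.objectweb.asm.commons.SimpleRemapper;

// Sketch: rewrite every reference to the fallback classes so that the Java 9
// copy of a class file calls the real JDK methods directly, removing the
// delegation indirection. The patched bytes would then be stored under
// META-INF/versions/9/ in the MR JAR.
public final class Java9ClassPatcher {

  // Internal-name mapping from the fallback classes to their JDK originals.
  private static final Map<String, String> MAPPING = Map.of(
      "org/apache/lucene/future/FutureArrays", "java/util/Arrays",
      "org/apache/lucene/future/FutureObjects", "java/util/Objects");

  public static byte[] patch(byte[] java8ClassBytes) {
    ClassReader reader = new ClassReader(java8ClassBytes);
    ClassWriter writer = new ClassWriter(reader, 0);
    // ClassRemapper visits all constant-pool references and method calls,
    // so invokestatic owners are rewritten consistently.
    reader.accept(new ClassRemapper(writer, new SimpleRemapper(MAPPING)), 0);
    return writer.toByteArray();
  }
}
```

This only works if the fallback methods have exactly the same signatures as the JDK originals, which is why the signature check mentioned above matters.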

          dweiss Dawid Weiss added a comment -

          Right, thanks Uwe.

          jpountz Adrien Grand added a comment -

          I had a closer look at the branch and like the patching approach. Should we modify the smoke tester at the same time to enforce that both Java 8 and 9 are tested?

          thetaphi Uwe Schindler added a comment -

          I had a closer look at the branch and like the patching approach. Should we modify the smoke tester at the same time to enforce that both Java 8 and 9 are tested?

          I think so - that was my plan! In general the testing happens automatically: as soon as you start the Lucene demo or Solr with Java 9, it will exercise the JAR file. But a separate test might be good (like the exception test I posted before), to see if the stack trace looks as expected.

          I will work soon on changing the patching mechanism to work globally (not only in the root module). I would also like to remove the @Deprecated from the Future classes (because at this time, they are the only way) and instead add @lucene.internal. We should open a separate issue about removing the Future classes once we switch to Java 9.

          Are there any other tests we should do? I talked with Robert - we both don't understand Mike's findings. I don't trust them unless we have a reproducible benchmark using BytesRefHash and similar. The improvements in LZ4 are fantastic; I would have expected the same from BytesRefHash.
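The kind of stack-trace test mentioned above could look roughly like this sketch: provoke an out-of-bounds error and inspect the top stack frame to learn which implementation actually threw. The FutureObjects class below is a hypothetical minimal stand-in for Lucene's fallback class, bundled here only so the example is self-contained:

```java
// Sketch of an MR-JAR smoke test: trigger an IndexOutOfBoundsException and
// check which class threw it. With a working MR JAR on Java 9 the top frame
// would be java.util.Objects; on Java 8 it is the fallback class. Here the
// stand-in always throws, so the frame is always the stand-in itself.
public class MrJarSmokeTest {

  // Hypothetical stand-in for org.apache.lucene.future.FutureObjects.
  static final class FutureObjects {
    static int checkIndex(int index, int length) {
      if (index < 0 || index >= length) {
        throw new IndexOutOfBoundsException(
            "Index " + index + " out of bounds for length " + length);
      }
      return index;
    }
  }

  static String throwingClass() {
    try {
      FutureObjects.checkIndex(8, 4); // deliberately out of bounds
      return "no exception";
    } catch (IndexOutOfBoundsException e) {
      return e.getStackTrace()[0].getClassName();
    }
  }

  public static void main(String[] args) {
    System.out.println(throwingClass());
  }
}
```

A smoke tester could run such a program against the built JAR under both Java 8 and Java 9 and assert on the class name it prints.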


            People

            • Assignee: Unassigned
            • Reporter: rcmuir Robert Muir
            • Votes: 0
            • Watchers: 6