Details

    • Type: Task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7
    • Fix Version/s: 1.15
    • Component/s: None
    • Labels:
      None
    • Flags:
      Patch

      Description

      Currently the Class file parser uses ASM 4.1. This older version cannot read Java 8 / Java 9 class files (fails with Exception).

      The upgrade to ASM 5.0.4 is very simple, just Maven dependency change. The code change is only to update the visitor version, so it gets new Java 8 features like lambdas reported, but this is not really required, but should be done for full support.

      FYI, in LUCENE-6729 we want to upgrade the Lucene Expressions module to ASM 5, too.

      You can hot-swap ASM 4.1 with ASM 5.0.4 without recompilation (so we have no problem with Lucene using a newer version). Since ASM 4.x the updates are more easy (no visitor interfaces anymore, instead abstract classes), so it does not break if you just replace the JAR file. So just see this as a recommendatation, not urgent! Solr/Lucene will also work without this patch (it just replaces the shipped ASM by newer version in our packaging).

      1. TIKA-1705.patch
        1 kB
        Uwe Schindler
      2. TIKA-1705-2.patch
        0.4 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          thetaphi Uwe Schindler added a comment -

          Simple patch. All tests pass.

          Show
          thetaphi Uwe Schindler added a comment - Simple patch. All tests pass.
          Hide
          davemeikle Dave Meikle added a comment -

          Committed in r1695177. Thanks Uwe Schindler!

          Show
          davemeikle Dave Meikle added a comment - Committed in r1695177. Thanks Uwe Schindler !
          Hide
          davemeikle Dave Meikle added a comment -

          Fixed committed in r1695177.

          Show
          davemeikle Dave Meikle added a comment - Fixed committed in r1695177.
          Hide
          gagravarr Nick Burch added a comment -

          I notice we only have a Java 5 test file in tika-parsers/src/test/resources/test-documents - is it worth adding classes from each major version with associated unit tests to verify this works properly + doesn't get broken in future?

          Show
          gagravarr Nick Burch added a comment - I notice we only have a Java 5 test file in tika-parsers/src/test/resources/test-documents - is it worth adding classes from each major version with associated unit tests to verify this works properly + doesn't get broken in future?
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in tika-trunk-jdk1.7 #818 (See https://builds.apache.org/job/tika-trunk-jdk1.7/818/)
          TIKA-1705: Upgraded ASM to 5.0.4. Patch from Uwe Schindler. (dmeikle: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1695177)

          • /tika/trunk/CHANGES.txt
          • /tika/trunk/tika-parsers/pom.xml
          • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/asm/XHTMLClassVisitor.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in tika-trunk-jdk1.7 #818 (See https://builds.apache.org/job/tika-trunk-jdk1.7/818/ ) TIKA-1705 : Upgraded ASM to 5.0.4. Patch from Uwe Schindler. (dmeikle: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1695177 ) /tika/trunk/CHANGES.txt /tika/trunk/tika-parsers/pom.xml /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/asm/XHTMLClassVisitor.java
          Hide
          thetaphi Uwe Schindler added a comment -

          Sorry for a second patch. I just noticed that you were using "asm-debug-all.jar" instead of plain simple "asm.jar". As this is a very basic parser, the asm-commons parts or helper visitors are not needed, so we should fallback to plain asm (also for compatibility with other projects). The -debug stuff was previously used because of generics warnings in earlier versions (they stripped off generics from JAR file), but this is no longer an issue.

          So please apply this patch, too

          Show
          thetaphi Uwe Schindler added a comment - Sorry for a second patch. I just noticed that you were using "asm-debug-all.jar" instead of plain simple "asm.jar". As this is a very basic parser, the asm-commons parts or helper visitors are not needed, so we should fallback to plain asm (also for compatibility with other projects). The -debug stuff was previously used because of generics warnings in earlier versions (they stripped off generics from JAR file), but this is no longer an issue. So please apply this patch, too
          Hide
          thetaphi Uwe Schindler added a comment -

          Reopen for 2nd patch.

          Show
          thetaphi Uwe Schindler added a comment - Reopen for 2nd patch.
          Hide
          davemeikle Dave Meikle added a comment -

          Thanks Uwe Schindler. Have made the change and adjusted the tika-bundle as well.

          Nick Burch - that is a good shout re tests. Will add some today.

          Show
          davemeikle Dave Meikle added a comment - Thanks Uwe Schindler . Have made the change and adjusted the tika-bundle as well. Nick Burch - that is a good shout re tests. Will add some today.
          Hide
          thetaphi Uwe Schindler added a comment -

          The question about this: This will not fail tests when new versions of JVMs are out. You will only find that problem when new class files are added

          In my opinion, a good test would also be to also test a class file from the local JVM (e.g., String.class.getResourceAsStream('String.class')
          With that test you would actually make sure that the class files of the JVM that compiles can be read! So once Java 9 is out and has a new classfile format, this would fail build if somebody runs build with this JVM.

          Show
          thetaphi Uwe Schindler added a comment - The question about this: This will not fail tests when new versions of JVMs are out. You will only find that problem when new class files are added In my opinion, a good test would also be to also test a class file from the local JVM (e.g., String.class.getResourceAsStream('String.class') With that test you would actually make sure that the class files of the JVM that compiles can be read! So once Java 9 is out and has a new classfile format, this would fail build if somebody runs build with this JVM.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in tika-trunk-jdk1.7 #819 (See https://builds.apache.org/job/tika-trunk-jdk1.7/819/)
          TIKA-1705: Changed dependecy from asm-debug-all to asm (dmeikle: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1695235)

          • /tika/trunk/tika-bundle/pom.xml
          • /tika/trunk/tika-parsers/pom.xml
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in tika-trunk-jdk1.7 #819 (See https://builds.apache.org/job/tika-trunk-jdk1.7/819/ ) TIKA-1705 : Changed dependecy from asm-debug-all to asm (dmeikle: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1695235 ) /tika/trunk/tika-bundle/pom.xml /tika/trunk/tika-parsers/pom.xml

            People

            • Assignee:
              davemeikle Dave Meikle
              Reporter:
              thetaphi Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Development