Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6
    • Component/s: None
    • Labels:
      None

      Description

      I've run this tool (it does the job): http://code.google.com/p/versioncheck/ on a checkout of branch_3x. To my surprise there is a JAR which contains Java 1.6 code:

      Major.Minor Version : 50.0             JAVA compatibility : Java 1.6 platform: 45.3-50.0
      Number of classes : 60
      
      Classes are : 
      c:\Work\lucene-solr\.\solr\contrib\extraction\lib\netcdf-4.2-min.jar [:] ucar/unidata/geoloc/Bearing.class
      ...
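
      For reference, a minimal sketch of the check this tool performs: the class-file major version is stored in bytes 6-7 of every .class entry (49 = Java 5, 50 = Java 6). The jar path and class name below are taken from the report above; the rest is purely illustrative.

      import java.io.DataInputStream;
      import java.io.InputStream;
      import java.util.jar.JarFile;

      public class ClassVersionCheck {
        public static void main(String[] args) throws Exception {
          JarFile jar = new JarFile("solr/contrib/extraction/lib/netcdf-4.2-min.jar");
          InputStream in = jar.getInputStream(jar.getEntry("ucar/unidata/geoloc/Bearing.class"));
          DataInputStream data = new DataInputStream(in);
          int magic = data.readInt();             // always 0xCAFEBABE
          int minor = data.readUnsignedShort();
          int major = data.readUnsignedShort();   // 50 = compiled for Java 6
          System.out.printf("major=%d minor=%d magic=%08X%n", major, minor, magic);
          data.close();
          jar.close();
        }
      }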
      
      Attachments

      1. output.log (3.67 MB) - Dawid Weiss
      2. SOLR-3295.patch (2 kB) - Robert Muir


          Activity

          Robert Muir added a comment -

          simple patch

          Robert Muir added a comment -

          ok. the user did ask to index the gigabytes of binary data so I don't feel bad about that

          Uwe Schindler added a comment -

          Nuke the JAR. See the comments on the TIKA issue. If the JAR is missing, the Parser will get a ClassNotFoundException and TIKA's framework will disable it.

          Your test simply uses the next possible parser, which is the plain text parser; indexing gigabytes of horrible binary data cast to text in an arbitrary autodetected encoding.

          Robert Muir added a comment -

          Is it possible for anyone to write a solr-level test? I tried to make one (using testrh.nc from http://www.unidata.ucar.edu/software/netcdf/examples/files.html)

          Index: solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java
          ===================================================================
          --- solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java	(revision 1308448)
          +++ solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java	(working copy)
          @@ -445,6 +445,17 @@
               catch(Exception expected){}
             }
             
          +  public void testCDF() throws Exception {
          +    ExtractingRequestHandler handler = (ExtractingRequestHandler) h.getCore().getRequestHandler("/update/extract");
          +    assertTrue("handler is null and it shouldn't be", handler != null);
          +    loadLocal("extraction/testrh.nc",
          +        "literal.id", "one",
          +        "fmap.content", "extractedContent"
          +    );
          +    assertU(commit());
          +    assertQ(req("*:*"), "//result[@numFound=1]");
          +  }
          +  
             SolrQueryResponse loadLocal(String filename, String... args) throws Exception {
               LocalSolrQueryRequest req = (LocalSolrQueryRequest) req(args);
               try {
          

          But on my java5 jdk, this test passes. I guess I don't know enough about what's going on with contrib/extraction
          here to figure this out. But I saw options like 'ignore exceptions from tika' and it seems like there are a lot
          of options.

          I'd like to get a 3.6 release candidate out soon...

          Dawid Weiss added a comment -

          It's some obscure data format

          I meant no offense, just my take on how many people in the wild may be using it compared to how many download solr in general.

          Uwe Schindler added a comment -

          IBM JRE should not affect TIKA, as TIKA has its own SPI parser (because it only requires Java 5, which has no SPI). Lucene 4 uses Java 6 SPI, which has the problem.

          Chris A. Mattmann added a comment -

          Yep, Uwe is right. I am going to try and reproduce the error in Tika-ville, and see if we can just make Tika swallow the ClassNotFoundException and move along (as it shouldn't affect Tika parsing for the rest of the valid parsers there). If so, we can push a fix for it in Tika and see if that addresses your guys' issue.

          Thanks for the note on IBM JRE, Robert. I didn't know that.

          Uwe Schindler added a comment -

          TIKA has its own SPI parser. See my comment on the TIKA issue. Modifying the SPI file does not work, because the SPI parser will collect all parsers from all SPI files it finds on the classpath, so removing is impossible.

          Robert Muir added a comment -

          I think you can just provide your own version of that in SolrCell and then it will load yours over the one in the tika-parsers jar, and then in yours you remove those class definitions.

          Is that the ordinary java SPI mechanism? If so, then it depends upon classpath order right?
          (and, doesn't work on IBM JRE: see LUCENE-3927).

          Chris A. Mattmann added a comment -

          Hey Robert:

          How can i simply do this?

          I was just discussing this with Uwe on TIKA-888 – I think you guys can provide your own custom version of the Tika service provider load file, and just omit the classes for Hdf and NetCDF parsers. I think then your users won't get to the code that depends on that jar file and you'll be good. See this file:

          http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser

          I think you can just provide your own version of that in SolrCell and then it will load yours over the one in the tika-parsers jar, and then in yours you remove those class definitions.
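
          For illustration only, a trimmed copy of that service file could keep the common parsers and drop the NetCDF/HDF entries. The excerpt below is hypothetical and incomplete (the real file lists many more classes), and as noted elsewhere in this thread, Tika's loader merges every such file it finds on the classpath, so this alone may not be enough:

          # META-INF/services/org.apache.tika.parser.Parser (illustrative excerpt)
          org.apache.tika.parser.html.HtmlParser
          org.apache.tika.parser.microsoft.OfficeParser
          org.apache.tika.parser.pdf.PDFParser
          org.apache.tika.parser.txt.TXTParser
          # org.apache.tika.parser.hdf.HDFParser       <- omitted here
          # org.apache.tika.parser.netcdf.NetCDFParser <- omitted here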

          Robert Muir added a comment -

          but if you don't care about parsing those types of files, you can simply omit that parser and exclude the jar file dependency.

          How can I simply do this? That's basically how I'd like to fix this for this (last java5) solr release.
          I would like to omit JUST this one by default, along with documentation saying: if you are willing to run java6,
          then do XYZ to enable it.

          I can't wait to put java5 support behind us...

          Chris A. Mattmann added a comment -

          Hi Guys:

          Couple of comments.

          Thanks for doing the test. I know this already because I hit that, too. It's caused by TIKA's dependencies. The NetCDF (http://www.unidata.ucar.edu/software/netcdf/) parser is only compiled with Java 1.6, although TIKA is also only Java 1.5, so this is a TIKA bug.

          In Tika, I wouldn't classify this as a bug, since our parser jar dependencies can be excluded in various ways. It's simply a requirement for folks that are interested in all of the features that the NetCDF library provides, but if you don't care about parsing those types of files, you can simply omit that parser and exclude the jar file dependency.

          It's obscure, indeed, especially for people outside the climate community.

          Obscure? Sorry, not meaning to argue here, but that's pretty patently untrue. All data formats are at some level obscure, depending on the community that you work in. The "climate" one that you are talking about includes a broad range of folks, dealing with remote sensing, climate modeling, decision making, etc., at some of the highest levels of government, funding, and other areas, both in the U.S. and internationally. NetCDF, HDF, OPeNDAP, and other formats are pretty broadly accepted standards. The use of data from NetCDF, for example, resulted in over 2000+ publications generated as part of the last Intergovernmental Panel on Climate Change (IPCC) and its 4th assessment report. So, not sure it's obscure.

          The UCAR netcdf library is on the other hand not able to handle streaming file input, so TIKA loads the whole file into memory

          Yep, it's part of the issue of the underlying data file format more so than the actual library itself. It's because it doesn't support random access, and yes, the current code I had to bake into Tika unfortunately must work around it by loading the whole file into memory. Jukka and I have discussed some better support for this, including temporary file support in Tika, and we're working on improving it, but we're not there yet.

          I don't really see the use-case for support in Solr

          It's up to you guys. If you want to tell users of Solr, "hey you can drop a scientific data file format onto Solr and magically its metadata will be indexed", then it might be important. We do this in OODT quite often, and it's one of the core use cases (and we even use Lucene and Solr for the metadata catalogs ).

          Loading a 500 Megabyte file into memory just to get the header

          A lot of times that header contains the key parameters (spatial and temporal bounds) that are required to make a decision as to what to do with the file, as well as other met fields including the remote sensing variables, or climate variables being measured, valid units, links to publications, etc. So it's more than useless information.

          Right, but how many people have these gigabyte climate data files

          Depends on who is using it. Like I said, this is pretty much all of the files that I deal with, but to each their own. Disabling it in Solr isn't really going to affect me (or others much) since OODT pretty much does this anyways, but meh.

          Jan Høydahl added a comment -

          The problem is, the tika-parser list is in the META-INF of tika-parsers.jar. To override it, you must disable the META-INF completely and list the parsers in program code when calling TIKA's ctors... I can upload our (unedited) example code here.

          Yes, they use the ServiceProvider mechanism, so each JAR providing Tika parser(s) has a META-INF/services/org.apache.tika.parser.Parser and registers itself when on the classpath. I agree that if the parsers were in many small JARs we could simply choose which ones to include and all would be fine. Then, Solr users could simply add a parser JAR to the classpath to get support for a new format.

          Uwe Schindler added a comment -

          The problem is, the tika-parser list is in the META-INF of tika-parsers.jar. To override it, you must disable the META-INF completely and list the parsers in program code when calling TIKA's ctors... I can upload our (unedited) example code here.

          Uwe Schindler added a comment -

          This is a long-standing issue in TIKA: they should provide separate parser JARs, each with its own META-INF, splitting the parsers up into e.g. tika-office-parsers, tika-image-parsers, tika-horrible-huge-climate-data-parsers, tika-soundfile-parsers,... Then each of these JARs would have a limited dependency set.

          We should open an issue (I will check) and suggest that. It has annoyed me for months. Every TIKA release gets more crazy formats nobody is interested in, and you only get a huge tika-parsers.jar with approx. 100 third-party JARs you have no control over anymore.

          Robert Muir added a comment -

          Can we somehow enable everything that was working before except the problematic java5 stuff?

          Uwe Schindler added a comment -

          I have a file here that I used locally that configures TIKA for all parsers we (PANGAEA) are interested in. In my opinion, it's also useless to feed mp3 and several other non-text file types to the Solr Extraction handler; it would only extract file metadata, but no contents. By limiting to only formats containing fulltext (not only metadata), this would make much more sense and would remove a bunch of other JAR files.
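
          A rough sketch of that approach, assuming the AutoDetectParser(Parser...) constructor of the bundled Tika; the parser classes listed here are only an example of a fulltext-oriented subset, not the actual PANGAEA configuration:

          import java.io.FileInputStream;
          import java.io.InputStream;

          import org.apache.tika.metadata.Metadata;
          import org.apache.tika.parser.AutoDetectParser;
          import org.apache.tika.parser.ParseContext;
          import org.apache.tika.parser.html.HtmlParser;
          import org.apache.tika.parser.pdf.PDFParser;
          import org.apache.tika.parser.txt.TXTParser;
          import org.apache.tika.sax.BodyContentHandler;

          public class RestrictedTikaExample {
            public static void main(String[] args) throws Exception {
              // Only the parsers listed here are ever consulted; the NetCDF/HDF
              // parsers (and the Java 6-only netcdf jar) are simply left out.
              AutoDetectParser parser = new AutoDetectParser(
                  new TXTParser(), new HtmlParser(), new PDFParser());

              InputStream in = new FileInputStream(args[0]);
              try {
                BodyContentHandler handler = new BodyContentHandler();
                Metadata metadata = new Metadata();
                parser.parse(in, handler, metadata, new ParseContext());
                System.out.println(metadata);
                System.out.println(handler.toString());
              } finally {
                in.close();
              }
            }
          }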

          Robert Muir added a comment -

          If we want to fix it correctly, we have to add some configuration class file to contrib/extraction, listing all supported parsers, so Solr will not choose one that's not there.

          Right, but how many people have these gigabyte climate data files? Realistically, jar-nuking sounds pretty good.

          If someone knows how to make this configuration file excluding this format, great, but I don't think I need
          to figure that out to svn rm this jar.

          Uwe Schindler added a comment -

          If you do this, the problem does not get better. A user who feeds a .nc file to the Solr Extraction handler will get a ClassNotFoundException, which is the identical problem to the invalid class format we have now. The only difference is that it would work if he uses Java 6 (which most users do). The problem here is more about consistency.

          If we want to fix it correctly, we have to add some configuration class file to contrib/extraction, listing all supported parsers, so Solr will not choose one that's not there.

          Robert Muir added a comment -

          if all tests pass without this jar, why do we need it?

          If you run the TIKA tests with 1.5 it will crash.

          Let me rephrase.

          I'm going to 'svn delete' this jar file in 5 hours from 3.x, unless anyone can tell me otherwise, and mark this issue resolved.

          If you commit a 5MB jar there should be a test showing why it's used.

          Robert Muir added a comment -

          Lets disable the parser here. I don't want to wait on tika to release.

          Uwe Schindler added a comment -

          Exactly. This is why I disabled that parser for our indexing at PANGAEA (although we support that file format). Loading a 500 Megabyte file into memory just to get the header and extract 2 kilobytes (of useless textual metadata) isn't really useful - especially if the file is on tape archive.

          Dawid Weiss added a comment -

          Climate format data? Man... This just calls for a custom simplified parser that would read the header and forget the rest. And it'd be 5mb less to distribute...

          Uwe Schindler added a comment -

          It's some obscure data format that tika can convert to plain text. I've never seen it, don't know what it is. Uwe filed a bug for Tika.

          LOL. It's obscure, indeed, especially for people outside the climate community. As a representative of PANGAEA, I of course know this format (it's something like a container for huge multi-dimensional arrays of numeric data). The funny thing is: the only textual parts are metadata about the file in the header, the data itself never contains any text. The UCAR netcdf library is on the other hand not able to handle streaming file input, so TIKA loads the whole file into memory... and OOMs in most cases for files I have seen (climate users produce files up to several gigabytes, if not terabytes, depending on use-case, see examples here: Kleinen, T et al. (2011): Holocene carbon cycle dynamics, links to model files. doi:10.1594/PANGAEA.758219). So I don't really see the use-case for support in Solr. But if we remove the JAR file, people who try to index .nc files will get a ClassNotFoundException. So ideally, we should also remove the file format from TIKA's META-INF in that case, or instruct the loader to ignore that. I always use some "TIKA loader" component in my programs that configures TIKA to only provide the formats I am interested in, and I then remove all useless JAR files (but that's not easy). I can provide code for how to configure TIKA for a subset of formats, which makes it easier for us to control what libraries are needed and ensures users won't get ClassNotFound/InvalidClassFileFormat errors.

          Dawid Weiss added a comment -

          if all tests pass without this jar, why do we need it?

          It's some obscure data format that tika can convert to plain text. I've never seen it, don't know what it is. Uwe filed a bug for Tika.

          Uwe Schindler added a comment -

          I opened TIKA-888.

          Uwe Schindler added a comment -

          if all tests pass without this jar, why do we need it?

          If you run the TIKA tests with 1.5 it will crash.

          Uwe Schindler added a comment - edited

          Solr is at java 1.6 in 3.x right?

          Solr is at 1.5 in 3.x. That's "tested" compilation-wise every 30 minutes with Jenkins, but this error cannot be found that way.

          Thanks for doing the test. I know this already because I hit that, too. It's caused by TIKA's dependencies. The NetCDF (http://www.unidata.ucar.edu/software/netcdf/) parser is only compiled with Java 1.6, although TIKA is also only Java 1.5, so this is a TIKA bug.

          But on the other hand, you would only hit the bug if you try to extract NetCDF metadata and have only Java 5. The bug is not seen because we have no test for this in the contrib/extraction tests. The compiler cannot find that bug, as it will never reach out to that class (because it is only used indirectly by TIKA).

          We should open an issue at TIKA.

          Robert Muir added a comment -

          if all tests pass without this jar, why do we need it?

          Dawid Weiss added a comment -

          Robert says this isn't the case. Also: this is THE only jar that requires 1.6 so I'd say it's probably a mistake?

          Ryan McKinley added a comment -

          Solr is at java 1.6 in 3.x right?

          so I think we only need to worry about the stuff in lucene

          Dawid Weiss added a comment -

          Full output log from version checker.


            People

            • Assignee:
              Robert Muir
              Reporter:
              Dawid Weiss
            • Votes:
              0
              Watchers:
              0
