Solr / SOLR-5532

SolrJ Content-Type validation is too strict for some web containers / proxies, breaks on equivalent content types

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.6.1, 4.7, 6.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows 7, Java 1.7.0_45 (64bit), solr-solrj-4.6.0.jar

      Description

      Due to SOLR-3530, HttpSolrServer now does a string equality check between the "Content-Type" returned by the server and the value returned by the getContentType() method declared by the ResponseParser. String equality is too strict, however, and can result in errors like this one reported by a user:


      I just upgraded my Solr instance and with it I also upgraded the solrj library in our custom application which sends diverse requests and queries to Solr.

      I use the "ping" method to determine whether Solr started correctly under the configured address. Since the upgrade the ping response results in an error:

      Cause: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected content type application/xml; charset=UTF-8 but got application/xml;charset=UTF-8.
      <?xml version="1.0" encoding="UTF-8"?>
      <response>
      <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int><lst name="params"><str name="df">searchtext</str><str name="echoParams">all</str><str name="rows">10</str><str name="echoParams">all</str><str name="wt">xml</str><str name="version">2.2</str><str name="q">solrpingquery</str><str name="distrib">false</str></lst></lst><str name="status">OK</str>
      </response>
      

      The Solr application itself works fine.
      Using an older version of the solrj library than solr-solrj-4.6.0.jar (e.g. solr-solrj-4.5.1.jar) in the custom application does not produce this error.

      The exception is thrown in a code block (HttpSolrServer.java, method request(...), around line 140) that was introduced in version 4.6.0.

      Code to reproduce the error:

      try {
      	HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8080/Solr/collection");
      	solrServer.setParser(new XMLResponseParser()); // this line is making all the difference
      	solrServer.ping();
      } catch (Exception e) {
      	e.printStackTrace();
      }
      

      A global search for "charset=UTF-8" on the source code of solrj indicates that other functions besides "ping" might be affected as well, because there are several places where "application/xml; charset=UTF-8" is spelled without a space after the semicolon.

      1. SOLR-5532.patch
        2 kB
        Mark Miller
      2. SOLR-5532-elyograg-eclipse-screenshot.png
        185 kB
        Shawn Heisey

          Activity

          Shawn Heisey added a comment -

          My server is not upgraded to 4.6 - it's running 4.2.1 ... with that server, your test code works fine. I am in the process of setting up a test server running 4.6, but I don't have it yet. Is your server also running 4.6?

          If your server is 4.6, this looks like the Solr server code has gotten more strict on the content type header, but SolrJ hasn't been updated to match, and the existing unit tests didn't catch the problem.

          Would you be able to try the following addition to your test code, right after you create the HttpSolrServer?

          solrServer.setRequestWriter(new BinaryRequestWriter());
          

          If that works, it's a potential workaround for you, with the added advantage that requests sent to Solr will be more compact and therefore more efficient.

          Also, just in case I have any trouble reproducing, can you include the entire Java stacktrace from your Solr server log when the problem occurs?

          Shawn Heisey added a comment -

          Got the example 4.6 server set up. With the following code, substantially similar to yours, I do NOT get the same problem.

          		try
          		{
          			HttpSolrServer solrServer = new HttpSolrServer(
          					"http://localhost:8983/solr/collection1");
          			SolrPingResponse x = solrServer.ping();
          			System.out.println(x);
          		}
          		catch (Exception e)
          		{
          			e.printStackTrace();
          		}
          

          If I change "collection1" to "collection" (so the URL is invalid) then I do get a similar (but not exactly the same) message, but it's part of a much larger error. Can you help me figure out what I might be doing that's different from your setup?

          org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected content type application/octet-stream but got text/html;charset=ISO-8859-1. <html>
          <head>
          <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
          <title>Error 404 Not Found</title>
          </head>
          <body><h2>HTTP ERROR 404</h2>
          <p>Problem accessing /solr/collection/admin/ping. Reason:
          <pre>    Not Found</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          <br/>                                                
          
          </body>
          </html>
          
          	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:455)
          	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
          	at org.apache.solr.client.solrj.request.SolrPing.process(SolrPing.java:69)
          	at org.apache.solr.client.solrj.SolrServer.ping(SolrServer.java:293)
          	at org.elyograg.TestStuff.main(TestStuff.java:72)
          
          Hoss Man added a comment -

          The specific error message being returned here by HttpSolrServer is from lines 438-456 (NOT 140 as initially mentioned in the bug report) ...

                String procCt = processor.getContentType();
                if (procCt != null) {
                  if (!contentType.equals(procCt)) {
                    // unexpected content type
                    String msg = "Expected content type " + procCt + " but got " + contentType + ".";
                    Header encodingHeader = response.getEntity().getContentEncoding();
                    String encoding;
                    if (encodingHeader != null) {
                      encoding = encodingHeader.getValue();
                    } else {
                      encoding = "UTF-8"; // try UTF-8
                    }
                    try {
                      msg = msg + " " + IOUtils.toString(respBody, encoding);
                    } catch (IOException e) {
                      throw new RemoteSolrException(httpStatus, "Could not parse response with encoding " + encoding, e);
                    }
                    RemoteSolrException e = new RemoteSolrException(httpStatus, msg, null);
                    throw e;
                  }
                }
          

          ...the intent of this code is to ensure that the Content-Type of the response it has received from the server is something that the ResponseParser it's configured to use is capable of handling.

          The code as written definitely seems sketchy, because it's trying to do an exact string equality match on the Content-Type, even though (as in this example)...
          application/xml; charset=UTF-8 (with space)
          ...should be considered equivalent to...
          application/xml;charset=UTF-8 (no space)
          ... nothing in the spec requires whitespace there, or makes them semantically non-equivalent.
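
          To make the difference concrete, here is a minimal standalone sketch in plain Java with no SolrJ classes. The class and helper names are hypothetical and not part of any patch on this issue. It shows that String.equals rejects the two header values from the report, while a comparison that ignores whitespace and case accepts them.

```java
import java.util.Locale;

public class ContentTypeEqualityDemo {

    // Hypothetical helper: remove all whitespace and lower-case the value.
    // Deliberately crude (it would also alter whitespace inside quoted
    // parameter values), but enough to illustrate the point.
    public static String squash(String contentType) {
        return contentType.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    public static boolean looselyEqual(String a, String b) {
        return squash(a).equals(squash(b));
    }

    public static void main(String[] args) {
        String expected = "application/xml; charset=UTF-8"; // declared by the parser
        String actual = "application/xml;charset=UTF-8";    // returned by the container

        System.out.println(expected.equals(actual));        // false
        System.out.println(looselyEqual(expected, actual)); // true
    }
}
```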

          having said that however: i'm still not clear on how to actually reproduce this.

          As far as i can tell from a quick check, the XML response writer (on the server) has always included a space in its Content-Type, and that doesn't seem to have changed in 4.6, and the solrj expected XML Content-Type (on the client parser) has always expected a space in the Content-Type, and that doesn't seem to have changed in 4.6.

          Can you please show us:

          • the full source code of a test client that demonstrates this problem for you?
          • the full stack trace produced by your test code
          • the command line used to run your test case (with full classpath showing jar versions used)
          • the response you get from "curl -v http://localhost:8983/solr/admin/system"
          • the response you get from "curl -v http://localhost:8983/solr/collection1/admin/ping"

          A global search for "charset=UTF-8" on the source code of solrj indicates that other functions besides "ping" might be affected as well, because there are several places where "application/xml; charset=UTF-8" is spelled without a space after the semicolon.

          Can you please be specific? I just did the same search of the 4.6 codebase and i can't find anything like what you are describing.

          Uwe Schindler added a comment -

          Chris Hostetter (Unused): In any case we should fix the parser to accept the content type:

          • case insensitive (MIME types are defined to be case insensitive) -> this is partly a bug in Solr already, I noticed this in a code review
          • strip off the charset from the MIME type before comparing. ContentStreamBase has methods for this (to extract the charset from a ContentType).
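
          The two points above could look roughly like the following sketch. It uses plain string handling rather than the ContentStreamBase methods mentioned, and the class and method names are hypothetical. It compares only the bare MIME type, lower-cased, after dropping any parameters such as charset.

```java
import java.util.Locale;

public class MimeTypeCompare {

    // Returns the bare "type/subtype" part of a Content-Type value,
    // lower-cased, with parameters such as charset removed.
    public static String baseMimeType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String base = (semi >= 0) ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when both values name the same MIME type, ignoring case and parameters.
    public static boolean sameMimeType(String expected, String actual) {
        String e = baseMimeType(expected);
        return e != null && e.equals(baseMimeType(actual));
    }

    public static void main(String[] args) {
        // Same type, different case and spacing: true
        System.out.println(sameMimeType("application/xml; charset=UTF-8", "Application/XML;charset=utf-8"));
        // Different types (the 404 page case from earlier in the thread): false
        System.out.println(sameMimeType("application/octet-stream", "text/html;charset=ISO-8859-1"));
    }
}
```

          Note that this variant ignores the charset entirely, so a caller that cares about the encoding would still need to read it separately.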
          Hoss Man added a comment -

          In any case we should fix the parser to accept the content type: ...

          Agreed – what you and i are talking about now is definitely a bug, I'm just not convinced i understand how Jakob ran into the specific error he's reporting – which scares me and makes me worried that there is some subtle second bug i'm not seeing.

          FWIW: My personal preference would be to refactor the APIs and abstract the content-type checking out of HttpSolrServer, so that instead of calling ResponseParser.getContentType() and doing anything with the value, it calls a new ResponseParser.canParseContentType(String) ... put the logic you are talking about into the XMLResponseParser.canParseContentType, and add some basic backcompat support into ResponseParser.canParseContentType that does a simple case/whitespace insensitive comparison.
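
          As a rough illustration of that proposal, here is a self-contained sketch. The classes below are hypothetical stand-ins, not the real SolrJ ResponseParser or XMLResponseParser, and accepting text/xml in the XML variant is an assumption made only for the example. The base class supplies a backward-compatible default that compares case- and whitespace-insensitively, and the XML subclass overrides it to look at the MIME type alone.

```java
import java.util.Locale;

// Hypothetical stand-in for a response parser; only the relevant parts are shown.
abstract class SketchResponseParser {

    // Existing-style accessor: the expected content type, or null for "no expectation".
    public abstract String getContentType();

    // Proposed hook. The default keeps the old comparison but ignores case and whitespace.
    public boolean canParseContentType(String actual) {
        String expected = getContentType();
        if (expected == null) {
            return true; // nothing to check, as in the original null guard
        }
        return actual != null && normalize(expected).equals(normalize(actual));
    }

    static String normalize(String s) {
        return s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }
}

// Hypothetical XML parser: accepts any charset as long as the MIME type is XML.
class SketchXmlResponseParser extends SketchResponseParser {
    @Override
    public String getContentType() {
        return "application/xml; charset=UTF-8";
    }

    @Override
    public boolean canParseContentType(String actual) {
        if (actual == null) {
            return false;
        }
        int semi = actual.indexOf(';');
        String base = (semi >= 0 ? actual.substring(0, semi) : actual)
                .trim().toLowerCase(Locale.ROOT);
        // Accepting text/xml as well is an assumption for this sketch.
        return base.equals("application/xml") || base.equals("text/xml");
    }
}

public class CanParseDemo {
    public static void main(String[] args) {
        SketchResponseParser xml = new SketchXmlResponseParser();
        System.out.println(xml.canParseContentType("application/xml;charset=UTF-8")); // true
        System.out.println(xml.canParseContentType("text/html;charset=ISO-8859-1")); // false
    }
}
```

          With a hook like this, the client would ask the parser whether it can handle the response instead of comparing strings itself, so each parser can decide how lenient to be.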

          Jakob Furrer added a comment -

          Thanks for looking into this issue so fast.
          And my apologies for being very sloppy in the bug description.
          Due to time constraints, I did not test the code provided above in a standalone setting.
          I am aware that this is a big no-no. Mea culpa.

          I only now figured out that setting the XMLResponseParser triggers the exception:

          solrServer.setParser(new XMLResponseParser());
          

          Setting the BinaryRequestWriter or not does not influence the result for me.

          Side-Question:
          The XMLResponseParser has been set in my custom application for a very long time. I cannot recall when or why that specific parser was set.
          Could anyone give me some information on whether, when or why this parser should (not) be set?

          As for

          A global search for "charset=UTF-8" on the source code of solrj indicates that other functions besides "ping" might be affected as well, because there are several places where "application/xml; charset=UTF-8" is spelled without a space after the semicolon.

          I am not familiar with the Solr source code at all. I saw these search results, where "; charset=utf-8" appears 62 times and ";charset=utf-8" appears 12 times.
          From that I conclude that this content type is spelled inconsistently within this library.

          Search for: charset=UTF-8
          lucene\common-build.xml(2241): html.append('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n')
          lucene\analysis\common\src\java\org\apache\lucene\analysis\cjk\package.html(20): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\common\src\java\org\apache\lucene\analysis\cn\package.html(20): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\common\src\test\org\apache\lucene\analysis\charfilter\htmlStripReaderTest.html(4): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\common\src\test\org\apache\lucene\analysis\core\LuceneResourcesWikiPage.html(4): <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
          lucene\analysis\icu\src\java\overview.html(19): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\morfologik\src\java\overview.html(20): <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
          lucene\analysis\morfologik\src\java\org\apache\lucene\analysis\morfologik\package.html(20): <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
          lucene\analysis\smartcn\src\java\org\apache\lucene\analysis\cn\smart\package.html(20): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\smartcn\src\java\org\apache\lucene\analysis\cn\smart\hhmm\package.html(19): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          lucene\analysis\stempel\src\java\overview.html(20): <meta content="text/html; charset=UTF-8" http-equiv="content-type">
          lucene\benchmark\src\test\org\apache\lucene\benchmark\byTask\feeds\TestHtmlParser.java(76): "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=UTF-8\" />" +
          lucene\benchmark\src\test\org\apache\lucene\benchmark\byTask\feeds\TestHtmlParser.java(84): assertEquals("text/html;charset=UTF-8", tags.get("content-type"));
          lucene\demo\src\java\overview.html(19): <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
          lucene\queryparser\docs\xml\LuceneContribQuery.dtd.entities.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\queryparser\docs\xml\LuceneContribQuery.dtd.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\queryparser\docs\xml\LuceneContribQuery.dtd.org.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\queryparser\docs\xml\LuceneCoreQuery.dtd.entities.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\queryparser\docs\xml\LuceneCoreQuery.dtd.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\queryparser\docs\xml\LuceneCoreQuery.dtd.org.html(3): <meta http-equiv='CONTENT-TYPE' content='text/html; charset=UTF-8' />
          lucene\site\changes\changes2html.pl(258): <META http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
          solr\CHANGES.txt(7359): have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8' 
          solr\CHANGES.txt(7451): content.  Using the contentType: "text/xml; charset=utf-8" will force
          solr\contrib\clustering\src\test-files\clustering\solr\collection1\conf\solrconfig.xml(421): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\contrib\dataimporthandler\src\test-files\dih\solr\collection1\conf\contentstream-solrconfig.xml(296): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\contrib\dataimporthandler\src\test-files\dih\solr\collection1\conf\dataimport-nodatasource-solrconfig.xml(294): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\contrib\dataimporthandler\src\test-files\dih\solr\collection1\conf\dataimport-solrconfig.xml(295): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\contrib\dataimporthandler-extras\src\test-files\dihextras\solr\collection1\conf\dataimport-solrconfig.xml(292): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\contrib\uima\src\test-files\uima\uima-tokenizers-solrconfig.xml(639): now requires: -H 'Content-type:text/xml; charset=utf-8' The response
          solr\contrib\uima\src\test-files\uima\solr\collection1\conf\solrconfig.xml(640): now requires: -H 'Content-type:text/xml; charset=utf-8' The response
          solr\contrib\velocity\src\java\org\apache\solr\response\VelocityResponseWriter.java(174): return request.getParams().get("v.contentType", "text/html;charset=UTF-8");
          solr\core\src\java\org\apache\solr\response\JSONResponseWriter.java(44): static String CONTENT_TYPE_JSON_UTF8 = "application/json; charset=UTF-8";
          solr\core\src\java\org\apache\solr\response\PHPResponseWriter.java(27): static String CONTENT_TYPE_PHP_UTF8="text/x-php;charset=UTF-8";
          solr\core\src\java\org\apache\solr\response\PHPSerializedResponseWriter.java(42): static String CONTENT_TYPE_PHP_UTF8="text/x-php-serialized;charset=UTF-8";
          solr\core\src\java\org\apache\solr\response\QueryResponseWriter.java(46): public static String CONTENT_TYPE_XML_UTF8="application/xml; charset=UTF-8";
          solr\core\src\java\org\apache\solr\response\QueryResponseWriter.java(47): public static String CONTENT_TYPE_TEXT_UTF8="text/plain; charset=UTF-8";
          solr\core\src\java\org\apache\solr\response\RubyResponseWriter.java(26): static String CONTENT_TYPE_RUBY_UTF8="text/x-ruby;charset=UTF-8";
          solr\core\src\java\org\apache\solr\servlet\SolrRequestParsers.java(638): if( idx > 0 ) { // remove the charset definition "; charset=utf-8"
          solr\core\src\test\org\apache\solr\analysis\htmlStripReaderTest.html(4): <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          solr\core\src\test\org\apache\solr\servlet\SolrRequestParserTest.java(216): "application/x-www-form-urlencoded; charset=utf-8",
          solr\core\src\test-files\solr\collection1\conf\solrconfig-implicitproperties.xml(77): <str name="content-type">text/plain; charset=UTF-8</str>
          solr\docs\changes\Changes.html(31): <META http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
          solr\docs\changes\Changes.html(7435): have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'
          solr\docs\changes\Changes.html(7527): content.  Using the contentType: "text/xml; charset=utf-8" will force
          solr\example\example-DIH\solr\db\conf\solrconfig.xml(407): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-DIH\solr\db\conf\xslt\example.xsl(27): <xsl:output media-type="text/html; charset=UTF-8" encoding="UTF-8"/> 
          solr\example\example-DIH\solr\db\conf\xslt\luke.xsl(38): <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8"/>
          solr\example\example-DIH\solr\mail\conf\solrconfig.xml(565): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-DIH\solr\rss\conf\solrconfig.xml(406): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-DIH\solr\solr\conf\solrconfig.xml(405): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-DIH\solr\tika\conf\solrconfig.xml(326): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-schemaless\solr\collection1\conf\solrconfig.xml(999): requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\example-schemaless\solr\collection1\conf\solrconfig.xml(1710): <str name="content-type">text/plain; charset=UTF-8</str>
          solr\example\exampledocs\test_utf8.sh(41): curl $URL/select --data-binary 'q=h%C3%A9llo&echoParams=explicit&wt=python' -H 'Content-type:application/x-www-form-urlencoded; charset=UTF-8' 2> /dev/null | grep 'h\\u00e9llo' > /dev/null 2>&1
          solr\example\exampledocs\test_utf8.sh(71): curl $URL/select --data-binary "q=$URL_UTF8&echoParams=explicit&wt=python"  -H 'Content-type:application/x-www-form-urlencoded; charset=UTF-8' 2> /dev/null | grep $EXPECTED > /dev/null 2>&1
          solr\example\solr\collection1\conf\solrconfig.xml(1006): requires: -H 'Content-type:text/xml; charset=utf-8'
          solr\example\solr\collection1\conf\solrconfig.xml(1751): <str name="content-type">text/plain; charset=UTF-8</str>
          solr\example\solr\collection1\conf\velocity\head.vm(7): <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
          solr\scripts\abc(95): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<commit/>"`
          solr\scripts\abo(94): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<optimize/>"`
          solr\scripts\commit(87): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<commit/>"`
          solr\scripts\optimize(87): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<optimize/>"`
          solr\site\html\tutorial.html(20): <META http-equiv="Content-Type" content="text/html; charset=UTF-8" />
          solr\solrj\src\java\org\apache\solr\client\solrj\impl\HttpSolrServer.java(273): "application/x-www-form-urlencoded; charset=UTF-8");
          solr\solrj\src\java\org\apache\solr\client\solrj\impl\XMLResponseParser.java(50): public static final String XML_CONTENT_TYPE = "application/xml; charset=UTF-8";
          solr\solrj\src\java\org\apache\solr\client\solrj\util\ClientUtils.java(54): public static final String TEXT_XML = "application/xml; charset=UTF-8";  
          solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(253): assertEquals("application/x-www-form-urlencoded; charset=UTF-8", DebugServlet.headers.get("Content-Type"));
          solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(289): assertEquals("application/x-www-form-urlencoded; charset=UTF-8", DebugServlet.headers.get("Content-Type"));
          solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(358): assertEquals("application/xml; charset=UTF-8", DebugServlet.headers.get("Content-Type"));
          solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(374): assertEquals("application/xml; charset=UTF-8", DebugServlet.headers.get("Content-Type"));
          solr\test-framework\src\java\org\apache\solr\util\RestTestHarness.java(149): connection.setRequestProperty("Content-Type", "application/json; charset=utf-8");
          solr\webapp\web\js\scripts\dashboard.js(57): core_basepath + '/admin/file/?file=admin-extra.menu-top.html&contentType=text/html;charset=utf-8',
          solr\webapp\web\js\scripts\dashboard.js(67): core_basepath + '/admin/file/?file=admin-extra.menu-bottom.html&contentType=text/html;charset=utf-8',
          solr\webapp\web\js\scripts\file.js(29): + '&contentType=text/xml;charset=utf-8';
          74 occurrences found in 62 file(s).
          
"application/x-www-form-urlencoded; charset=utf-8" , solr\core\src\test-files\solr\collection1\conf\solrconfig-implicitproperties.xml(77): <str name= "content-type" >text/plain; charset=UTF-8</str> solr\docs\changes\Changes.html(31): <META http-equiv= "Content-Type" content= "text/html; charset=UTF-8" /> solr\docs\changes\Changes.html(7435): have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8' solr\docs\changes\Changes.html(7527): content. Using the contentType: "text/xml; charset=utf-8" will force solr\example\example-DIH\solr\db\conf\solrconfig.xml(407): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-DIH\solr\db\conf\xslt\example.xsl(27): <xsl:output media-type= "text/html; charset=UTF-8" encoding= "UTF-8" /> solr\example\example-DIH\solr\db\conf\xslt\luke.xsl(38): <meta http-equiv= "Content-Type" content= "application/xhtml+xml; charset=UTF-8" /> solr\example\example-DIH\solr\mail\conf\solrconfig.xml(565): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-DIH\solr\rss\conf\solrconfig.xml(406): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-DIH\solr\solr\conf\solrconfig.xml(405): the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-DIH\solr\tika\conf\solrconfig.xml(326): the body. 
For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-schemaless\solr\collection1\conf\solrconfig.xml(999): requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\example-schemaless\solr\collection1\conf\solrconfig.xml(1710): <str name= "content-type" >text/plain; charset=UTF-8</str> solr\example\exampledocs\test_utf8.sh(41): curl $URL/select --data-binary 'q=h%C3%A9llo&echoParams=explicit&wt=python' -H 'Content-type:application/x-www-form-urlencoded; charset=UTF-8' 2> /dev/ null | grep 'h\\u00e9llo' > /dev/ null 2>&1 solr\example\exampledocs\test_utf8.sh(71): curl $URL/select --data-binary "q=$URL_UTF8&echoParams=explicit&wt=python" -H 'Content-type:application/x-www-form-urlencoded; charset=UTF-8' 2> /dev/ null | grep $EXPECTED > /dev/ null 2>&1 solr\example\solr\collection1\conf\solrconfig.xml(1006): requires: -H 'Content-type:text/xml; charset=utf-8' solr\example\solr\collection1\conf\solrconfig.xml(1751): <str name= "content-type" >text/plain; charset=UTF-8</str> solr\example\solr\collection1\conf\velocity\head.vm(7): <meta http-equiv= "content-type" content= "text/html; charset=UTF-8" /> solr\scripts\abc(95): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<commit/>" ` solr\scripts\abo(94): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<optimize/>" ` solr\scripts\commit(87): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<commit/>" ` solr\scripts\optimize(87): rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d "<optimize/>" ` solr\site\html\tutorial.html(20): <META http-equiv= "Content-Type" content= "text/html; charset=UTF-8" /> solr\solrj\src\java\org\apache\solr\client\solrj\impl\HttpSolrServer.java(273): "application/x-www-form-urlencoded; charset=UTF-8" ); solr\solrj\src\java\org\apache\solr\client\solrj\impl\XMLResponseParser.java(50): public static final String XML_CONTENT_TYPE = "application/xml; charset=UTF-8" ; 
solr\solrj\src\java\org\apache\solr\client\solrj\util\ClientUtils.java(54): public static final String TEXT_XML = "application/xml; charset=UTF-8" ; solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(253): assertEquals( "application/x-www-form-urlencoded; charset=UTF-8" , DebugServlet.headers.get( "Content-Type" )); solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(289): assertEquals( "application/x-www-form-urlencoded; charset=UTF-8" , DebugServlet.headers.get( "Content-Type" )); solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(358): assertEquals( "application/xml; charset=UTF-8" , DebugServlet.headers.get( "Content-Type" )); solr\solrj\src\test\org\apache\solr\client\solrj\impl\BasicHttpSolrServerTest.java(374): assertEquals( "application/xml; charset=UTF-8" , DebugServlet.headers.get( "Content-Type" )); solr\test-framework\src\java\org\apache\solr\util\RestTestHarness.java(149): connection.setRequestProperty( "Content-Type" , "application/json; charset=utf-8" ); solr\webapp\web\js\scripts\dashboard.js(57): core_basepath + '/admin/file/?file=admin-extra.menu-top.html&contentType=text/html;charset=utf-8', solr\webapp\web\js\scripts\dashboard.js(67): core_basepath + '/admin/file/?file=admin-extra.menu-bottom.html&contentType=text/html;charset=utf-8', solr\webapp\web\js\scripts\file.js(29): + '&contentType=text/xml;charset=utf-8'; Es wurden 74 Vorkommen in 62 Datei(en) gefunden.
          Shawn Heisey added a comment -

          Side-Question:
          The XMLResponseParser has been set in my custom application for a very long time. I cannot recall when or why I set that specific parser.
          Could anyone give me some information on whether, when, or why this parser should (not) be set?

          The XML parser typically is used for compatibility.

          HttpSolrServer defaults to the binary response parser and the XML request writer. The javabin format (used by the binary writer/parser) changed a few years back. Solr/SolrJ 1.4.1 used javabin version 1, but the next release (3.1.0) used javabin version 2. The two versions are completely incompatible, so when incompatible Solr and SolrJ versions need to communicate, you have to switch to XML.

          I tried setting my parser to a new XMLResponseParser, but I still can't duplicate your symptom. I will attach a screenshot showing my code in Eclipse along with the Eclipse build path.

          Uwe Schindler added a comment -

          Two more issues were opened about the same problem, this time not involving PingRequestHandler. The problem seems to be in HttpSolrServer's parsing of the Content-Type in general.

          Hoss Man added a comment -

          As far as I can tell from a quick check, the XML response writer (on the server) has always included a space in its Content-Type, and that doesn't seem to have changed in 4.6, and the solrj expected XML Content-Type (on the client parser) has always expected a space in the Content-Type, and that doesn't seem to have changed in 4.6.

          Ok, here's what I realize now that I overlooked when I skimmed the old code yesterday:

          • the content type as written by the server has not changed
          • the content type as written by the server, and the content type as parsed by the client are in fact 100% identical (same variable)
          • what's new in 4.6 is the check to verify that the Content-Types are equal (in my previous skim I totally overlooked that the code throwing this error is new in 4.6, added as part of SOLR-3530)

          That still, however, doesn't explain the bug report, why elyograg and I can't reproduce it, or why the solrj xml tests aren't reproducing it. Specifically: what is happening to the space character to cause this exception?

          My best guess is that people encountering this problem aren't using the provided jetty server, and the server they are using (or some proxy in between their client and the server) is re-writing the header slightly. The fact that the initial problem report here refers to port "8080" smells like I may be on to something.

          Either way: we should fix how the Content-Type comparison is done.
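
To illustrate the guess above, here is a minimal, self-contained sketch of how an exact string comparison breaks once something between server and client drops the optional space after the semicolon. It is not the SolrJ code; `StrictContentTypeDemo`, `normalizeLikeSomeContainers` and `strictMatch` are hypothetical names, and the rewriting rule is only an assumption about what a container or proxy might do.

```java
final class StrictContentTypeDemo {
    // Expected value, taken from the error message in the bug report.
    static final String EXPECTED = "application/xml; charset=UTF-8";

    // Hypothetical stand-in for a container or proxy that removes the
    // optional whitespace after ';' when it re-emits the header.
    static String normalizeLikeSomeContainers(String header) {
        return header.replaceAll(";\\s+", ";");
    }

    // Exact string comparison, the kind of check that is too strict.
    static boolean strictMatch(String expected, String actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        String onTheWire = normalizeLikeSomeContainers(EXPECTED);
        System.out.println(onTheWire);                        // application/xml;charset=UTF-8
        System.out.println(strictMatch(EXPECTED, onTheWire)); // false
    }
}
```

Both strings describe the same media type and charset, yet the strict check rejects the response.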

          Hoss Man added a comment -

          clarify summary and description, not specific to ping.

          Mark Miller added a comment -

          This is why using Jetty is the best choice: we actually test it.

          I think we should make the comparison less strict right away; this can go out in 4.6.1.

          Beyond that, Hossman has some interesting ideas for improving the APIs, but unless someone bangs that out right away, I think the straightforward fix is the right initial thing to do.

          Mark Miller added a comment -

          the content type as written by the server, and the content type as parsed by the client are in fact 100% identical (same variable)

          I was careful to ensure that.

          Uwe Schindler added a comment -

          My best guess is that people encountering this problem aren't using the provided jetty server, and the server they are using (or some proxy in between their client and the server) is re-writing the header slightly. The fact that the initial problem report here refers to port "8080" smells like I may be on to something.

          When thinking about this, I know for example that Oracle iPlanet webserver does this! And iPlanet uses Catalina internally, which is the servlet engine of Tomcat...

          Mark Miller added a comment -

          A first whack.

          Uwe Schindler added a comment - - edited

          Validated this with the Catalina-based PANGAEA server (Oracle iPlanet webserver):

          VEGA:~ > curl -D - "http://ws.pangaea.de/oai/?verb=Identify"
          HTTP/1.1 200 OK
          Server: PANGAEA/1.0
          Date: Fri, 06 Dec 2013 18:17:59 GMT
          Content-type: text/xml;charset=UTF-8
          Transfer-encoding: chunked
          
          <?xml version="1.0" encoding="UTF-8"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
          ...
          

          And here is how the servlet sets Content-Type:

          resp.setContentType("text/xml; charset="+charset);
          

          It looks like every Tomcat does this. The reason is that the Content-Type header is generally not passed through to the output unparsed: the servlet container (Catalina) parses it to detect the charset, so that the writer returned when you call the broken getWriter() uses the correct charset. Most webservers also normalize headers afterwards (they combine multiple headers into one and also remove whitespace).

          The correct way to handle this is:

          • Use ContentStreamBase to extract the MIME type and the charset from the full Content-Type string (MIME type != Content-Type; that's the fault here). We already have the methods available, and they should also be available to SolrJ.
          • Compare charset and MIME type with equalsIgnoreCase. However, the charset does not actually need to be compared: the XML parser should take care of it afterwards, so there is no need to enforce a specific charset in SolrJ. It should only enforce the MIME type!
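
The approach described in the two points above can be sketched with plain string handling. This is only an illustration under stated assumptions, not the committed patch and not the ContentStreamBase helpers; `LenientContentTypeCheck`, `mimeTypeOf` and `sameMimeType` are hypothetical names. The sketch strips the parameters (such as charset) from the header value and compares only the MIME type, ignoring case.

```java
final class LenientContentTypeCheck {
    // Returns the MIME type part of a Content-Type header value, without
    // parameters such as charset. Returns "" for a null header.
    static String mimeTypeOf(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String mime = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mime.trim();
    }

    // Compares only the MIME types, case-insensitively; the charset is
    // deliberately ignored and left to the response parser.
    static boolean sameMimeType(String expected, String actual) {
        String expectedMime = mimeTypeOf(expected);
        return !expectedMime.isEmpty() && expectedMime.equalsIgnoreCase(mimeTypeOf(actual));
    }

    public static void main(String[] args) {
        String expected = "application/xml; charset=UTF-8";
        System.out.println(sameMimeType(expected, "application/xml;charset=UTF-8")); // true
        System.out.println(sameMimeType(expected, "Application/XML"));               // true
        System.out.println(sameMimeType(expected, "text/xml;charset=UTF-8"));        // false
    }
}
```

A real client would more likely rely on an existing header parser than on `indexOf`; the point here is only that whitespace, case and charset no longer influence the result.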
          Jakob Furrer added a comment -

          I can confirm that the problem I described above does not occur when I use Jetty to run the stock Solr that is included in Solr 4.6.
          However, I see this problem when Solr is run in Tomcat.
          In my custom application that accesses Solr, I see that
          a) the ping fails only when XMLResponseParser is set on the HttpSolrServer
          b) adding a document also fails with the same error message as above (even if XMLResponseParser is not set on the HttpSolrServer)

          Uwe Schindler added a comment -

          Mark: Patch looks ok. Exactly as I wanted to have it. Thanks!

          ASF subversion and git services added a comment -

          Commit 1548659 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1548659 ]

          SOLR-5532: SolrJ Content-Type validation is too strict for some webcontainers / proxies.

          ASF subversion and git services added a comment -

          Commit 1548661 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1548661 ]

          SOLR-5532: SolrJ Content-Type validation is too strict for some webcontainers / proxies.

          Uwe Schindler added a comment -

          As there is a bugfix release planned, I think this one should also go in 4.6.1. It looks like many people are affected. One of my customers called today, too (using Tomcat).

          ASF subversion and git services added a comment -

          Commit 1553949 from Mark Miller in branch 'dev/branches/lucene_solr_4_6'
          [ https://svn.apache.org/r1553949 ]

          SOLR-5532: SolrJ Content-Type validation is too strict for some webcontainers / proxies.

          ASF subversion and git services added a comment -

          Commit 1553950 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1553950 ]

          SOLR-5532: Move CHANGES entry to 4.6.1

          ASF subversion and git services added a comment -

          Commit 1553951 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1553951 ]

          SOLR-5532: Move CHANGES entry to 4.6.1

          Mark Miller added a comment -

          Thanks all!

          Magnus Lövgren added a comment -

          I've upgraded from 4.4 to 4.10.1 and have been struggling somewhat with my code that was affected by this change. Some observations that might be useful for others too:

          The patch relies on the "org.apache.http.entity.ContentType.parse" method. It fails when parsing an empty string. That's fine (empty string should probably not be seen as a valid type anyway). The caveat is that an empty string is actually used as the "fallback" contentType if the response has no Content-Type header! This would be the typical case if the response is a 401 (typically has no Content-Type).

          • In prior versions a 401 response threw a SolrException with code() 401
          • Now a SolrServerException is thrown (caused by an org.apache.http.ParseException). It is hard to determine whether it was due to bad credentials (401).

          To restore previous behaviour, you'd presumably add the HttpStatus.SC_UNAUTHORIZED case to the switch and then throw a RemoteSolrException (with code 401). In other words - fail early for 401 response (there's no content to parse anyway)
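
The "fail early" idea can be sketched as follows. This is a self-contained illustration, not the HttpSolrServer code: `FailEarlyOn401`, `RemoteException`, `parseMimeType` and `handle` are hypothetical names, and `parseMimeType` only imitates a strict parser that rejects an empty string.

```java
final class FailEarlyOn401 {
    static final int SC_UNAUTHORIZED = 401;

    // Hypothetical exception that carries the HTTP status code to the caller.
    static final class RemoteException extends RuntimeException {
        final int code;

        RemoteException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Hypothetical strict parser: like the behaviour described above, it
    // refuses an empty Content-Type instead of returning a default.
    static String parseMimeType(String header) {
        if (header == null || header.trim().isEmpty()) {
            throw new IllegalArgumentException("empty Content-Type");
        }
        int semicolon = header.indexOf(';');
        return (semicolon >= 0 ? header.substring(0, semicolon) : header).trim();
    }

    // Checks the status code before touching the Content-Type, so a 401
    // without a Content-Type header is reported as a 401, not a parse error.
    static String handle(int status, String contentTypeHeader) {
        if (status == SC_UNAUTHORIZED) {
            throw new RemoteException(status, "Unauthorized");
        }
        // An empty string is the fallback when the header is missing.
        String contentType = contentTypeHeader == null ? "" : contentTypeHeader;
        return parseMimeType(contentType);
    }

    public static void main(String[] args) {
        System.out.println(handle(200, "application/xml;charset=UTF-8"));
        try {
            handle(SC_UNAUTHORIZED, null);
        } catch (RemoteException e) {
            System.out.println("status " + e.code);
        }
    }
}
```

With the status check placed first, the caller can tell bad credentials apart from a malformed response.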

          Mark Miller added a comment -

          I ran into this same issue in a review for Cloudera Search before I went on vacation a couple weeks ago. Technically, it was a back compat break. Please file a JIRA issue and we can address it.

          Magnus Lövgren added a comment -

          The 401 issue is now added as SOLR-6669


            People

            • Assignee:
              Mark Miller
              Reporter:
              Jakob Furrer
            • Votes:
              0
              Watchers:
              5