ManifoldCF / CONNECTORS-630

Metadata names with illegal URL characters blow up the ManifoldCF Solr connector


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.1
    • Fix Version/s: ManifoldCF 1.1
    • Component/s: Lucene/SOLR connector
    • Labels: None

    Description

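      If a document carries metadata whose names contain characters that are illegal in a URL query, and you try to ingest it into Solr via the Solr connector, SolrJ throws an IllegalArgumentException and the worker thread goes into an infinite loop. In the log below the document name contains a hash symbol (#), but that value is correctly escaped as %23; the character rejected at index 537 is the unescaped space in the parameter name "literal.Document Info:Keyword / Phrase".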

      FATAL 2013-01-30 17:46:13,664 (Worker thread '20') - Error tossed: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
      
      java.lang.IllegalArgumentException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
          at java.net.URI.create(Unknown Source)
          at org.apache.http.client.methods.HttpPost.<init>(HttpPost.java:76)
          at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:286)
          at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
          at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
          at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:797)
      Caused by: java.net.URISyntaxException: Illegal character in query at index 537: http://localhost:8080/solr/Lisa/update/extract?literal.id=https%3A%2F%2Fopentextdev2.llan.ll.mit.edu%2Fcs%2Fllisapi.dll%3Ffunc%3Dll%26objID%3D1016599%26objAction%3Ddownload&literal.allow_token_document=LISA-Authority-DEV%3A1005367&literal.allow_token_document=LISA-Authority-DEV%3A68276&literal.allow_token_document=LISA-Authority-DEV%3A796642&literal.allow_token_document=LISA-Authority-DEV%3AGUEST&literal.allow_token_document=LISA-Authority-DEV%3ASYSTEM&literal.deny_token_document=LISA-Authority-DEV%3ADEAD_AUTHORITY&literal.Document Info:Keyword / Phrase=%3F&literal.general_creator=th23825&literal.Document Info:Performing Organization=%3F&literal.general_description=&literal.general_modifier=th23825&literal.general_creationdate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Date=%3F&literal.Document Info:Document Author(s)=%3F&literal.general_name=%23raodoc4.txt%3E&literal.ll_filename=%23raodoc4.txt%3E&literal.general_owner=th23825&literal.Document Info:Document Revision Notes=%3F&literal.Document Info:Data Classification=For+Laboratory+Use+Only+%28FLUO%29&literal.general_modifydate=Wed+Nov+14+09%3A28%3A16+EST+2012&literal.Document Info:Document Description=%3F&commitWithin=4000&wt=xml&version=2.2
          at java.net.URI$Parser.fail(Unknown Source)
          at java.net.URI$Parser.checkChars(Unknown Source)
          at java.net.URI$Parser.parseHierarchical(Unknown Source)
          at java.net.URI$Parser.parse(Unknown Source)
          at java.net.URI.<init>(Unknown Source)
          ... 6 more
      
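      The failing URL shows the pattern: parameter values are percent-encoded (for example %23raodoc4.txt%3E), while metadata names such as "literal.Document Info:Keyword / Phrase" are appended verbatim, so java.net.URI rejects the first space it meets in a name. The minimal Java sketch below reproduces this with the JDK alone. The class LiteralParamEncoding and its methods buildUrl and firstIllegalIndex are hypothetical, not ManifoldCF or SolrJ code, and form-encoding the names with URLEncoder is one possible remedy assumed here, not necessarily the change committed for this issue.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of ManifoldCF or SolrJ.
public class LiteralParamEncoding {

    // Builds "base?name1=value1&name2=value2...". Values are always
    // form-encoded (as in the URL from the log); names are encoded only
    // when encodeNames is true. Requires Java 10+ for encode(String, Charset).
    static String buildUrl(String base, Map<String, String> params, boolean encodeNames) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = encodeNames
                    ? URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    : e.getKey();
            sb.append(sep).append(name).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    // Returns the index java.net.URI reports for the first illegal
    // character, or -1 if the string parses as a URI.
    static int firstIllegalIndex(String url) {
        try {
            new URI(url);
            return -1;
        } catch (URISyntaxException ex) {
            return ex.getIndex();
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.general_name", "#raodoc4.txt>");           // value with '#': escaped fine
        params.put("literal.Document Info:Keyword / Phrase", "?");    // name with spaces: the problem
        String base = "http://localhost:8080/solr/Lisa/update/extract";

        String raw = buildUrl(base, params, false);
        String fixed = buildUrl(base, params, true);
        System.out.println(raw + " -> illegal at " + firstIllegalIndex(raw));
        System.out.println(fixed + " -> illegal at " + firstIllegalIndex(fixed));
    }
}
```

      With names left raw, the reported index is the position of the first space in the query, matching the failure at index 537 in the trace; with names encoded, the URL parses. URLEncoder produces form encoding (a space becomes '+'), which a servlet-based Solr decodes back to a space when it reads query parameters.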


            People

              Assignee: Karl Wright (kwright@metacarta.com)
              Reporter: Karl Wright (kwright@metacarta.com)
              Votes: 0
              Watchers: 2
