ManifoldCF / CONNECTORS-936

RepositoryDocuments with binaryFieldData = null cause issues with Solr

Details

    Description

      If a RepositoryDocument is ingested through an activity without an InputStream having been set via the setBinary method, the Solr output connector fails with the following error:

      java.lang.IllegalArgumentException: Input stream may not be null
      	at org.apache.http.util.Args.notNull(Args.java:48)
      	at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:70)
      	at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:58)
      	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:201)
      	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
      	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
      	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
      

      This can be reproduced by ingesting documents from a CMIS repository that contain no content.

      The dirty workaround I've come up with is simply to provide a NullInputStream.

      In CmisRepositoryConnector.java:

      Import NullInputStream from Commons IO:

      import org.apache.commons.io.input.NullInputStream;
      

      And change:

                if(fileLength>0 && document.getContentStream()!=null){
                  is = document.getContentStream().getStream();
                  rd.setBinary(is, fileLength);
                }
      

      To:

                if(fileLength>0 && document.getContentStream()!=null){
                  is = document.getContentStream().getStream();
                  rd.setBinary(is, fileLength);
                } else {
                  rd.setBinary(new NullInputStream(0),0);
                }
      

      I'm not sure what the correct fix would be. Possibly change the RepositoryDocument class or handle the situation correctly in the Solr connector.

      It doesn't seem to be an issue with other repository connectors, such as FileConnector, as they always provide an InputStream.
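
      As a rough illustration of the second option (handling a missing stream on the consuming side rather than in each repository connector), here is a minimal sketch. The class NullSafeStreams and its methods orEmpty and lengthOrZero are hypothetical names, not part of ManifoldCF, SolrJ, or HttpClient. It uses only java.io, so it does not need the commons-io dependency, and it is not a patch against the actual Solr connector code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of ManifoldCF or SolrJ.
final class NullSafeStreams {
  private NullSafeStreams() {}

  // Returns the given stream unchanged, or an empty stream when it is null,
  // so that code requiring a non-null InputStream does not throw.
  static InputStream orEmpty(InputStream in) {
    return in != null ? in : new ByteArrayInputStream(new byte[0]);
  }

  // Returns the declared length, or 0 when there is no stream to read from.
  static long lengthOrZero(InputStream in, long declaredLength) {
    return in != null ? declaredLength : 0L;
  }

  public static void main(String[] args) throws IOException {
    // Simulates a document whose binary stream was never set.
    try (InputStream body = orEmpty(null)) {
      // An empty stream reports end-of-stream (-1) on the first read.
      System.out.println(body.read() == -1);
    }
  }
}
```

      A guard like this could be applied at the point where the document's stream is handed to the HTTP layer, which would cover every repository connector at once. Whether an empty body is the right thing to send to Solr for a content-less document is a separate question that this sketch does not answer.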

      Attachments

        1. CmisRepositoryConnector.patch
          0.1 kB
          Cetra Free

          People

            kwright@metacarta.com Karl Wright
            cetra3 Cetra Free