  1. ManifoldCF
  2. CONNECTORS-936

RepositoryDocuments with binaryFieldData = null cause issues with Solr


    Details

      Description

      If a RepositoryDocument is ingested into an activity without an InputStream having been set via the setBinary method, the Solr output connector throws an error:

      java.lang.IllegalArgumentException: Input stream may not be null
      	at org.apache.http.util.Args.notNull(Args.java:48)
      	at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:70)
      	at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:58)
      	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:201)
      	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
      	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
      	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
      

      This can be replicated by trying to ingest documents from a CMIS repository which contain no content.

      The dirty workaround I've come up with is just to provide a null input stream.

      In CmisRepositoryConnector.java:

      Import NullInputStream from Commons IO:

      import org.apache.commons.io.input.NullInputStream;
      

      And change:

                if(fileLength>0 && document.getContentStream()!=null){
                  is = document.getContentStream().getStream();
                  rd.setBinary(is, fileLength);
                }
      

      To:

                if(fileLength>0 && document.getContentStream()!=null){
                  is = document.getContentStream().getStream();
                  rd.setBinary(is, fileLength);
                } else {
                  rd.setBinary(new NullInputStream(0),0);
                }
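
      If adding a Commons IO import is undesirable, the same effect can probably be had with the JDK alone. The following is only a sketch: EmptyStreamSketch, orEmpty and lengthOrZero are hypothetical names and are not part of ManifoldCF or the CMIS connector.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper, NOT part of ManifoldCF or the CMIS connector.
final class EmptyStreamSketch {
    private EmptyStreamSketch() {}

    /** Returns the given stream, or a zero-length stream when it is null. */
    static InputStream orEmpty(InputStream candidate) {
        return candidate != null ? candidate : new ByteArrayInputStream(new byte[0]);
    }

    /** Returns the length to report with the stream: 0 when there is no usable content. */
    static long lengthOrZero(InputStream candidate, long declaredLength) {
        return (candidate != null && declaredLength > 0) ? declaredLength : 0L;
    }
}
```

      With such a helper the else branch above could pass EmptyStreamSketch.orEmpty(null) and a length of 0 to rd.setBinary, which should behave like new NullInputStream(0) for a zero-length body.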
      

      I'm not sure what the correct fix would be. Possibly change the RepositoryDocument class or handle the situation correctly in the Solr connector.

      It doesn't seem to be an issue with other repository connectors, such as FileConnector, as they always provide an InputStream.
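
      One way the "change the RepositoryDocument class" option might look is a getter that never hands out null. This is a rough sketch under assumptions, not the actual fix: DocumentSketch and its methods are hypothetical stand-ins and do not mirror the real RepositoryDocument API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for a document holder; it is NOT ManifoldCF's RepositoryDocument.
final class DocumentSketch {
    private InputStream binary;   // stays null if the connector never sets content
    private long binaryLength;

    void setBinary(InputStream stream, long length) {
        this.binary = stream;
        this.binaryLength = length;
    }

    /** Never returns null: falls back to an empty stream if no content was set. */
    InputStream getBinaryStream() {
        return binary != null ? binary : new ByteArrayInputStream(new byte[0]);
    }

    /** Reports 0 when no content was set, so the length always matches the stream. */
    long getBinaryLength() {
        return binary != null ? binaryLength : 0L;
    }
}
```

      Putting the default in the holder would cover every repository connector at once, whereas a check in the Solr connector would only protect that one output path.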

        Attachments

        1. CmisRepositoryConnector.patch
          0.1 kB
          Cetra Free


            People

            • Assignee:
              kwright@metacarta.com Karl Wright
              Reporter:
              cetra3 Cetra Free
            • Votes:
              0
              Watchers:
              3
