Solr / SOLR-4358

SolrJ, by preventing multi-part post, loses key information about file name that Tika needs


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.4, 6.0
    • Component/s: clients - java
    • Labels: None

    Description

      SolrJ accepts a ContentStream, which has a name field. Within HttpSolrServer.java, if SolrJ decides to use a multipart post, this file name is transmitted in the part headers of the multipart body. However, if SolrJ chooses not to use a multipart post, the file name is lost.
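For illustration only, the following dependency-free Java sketch shows where a multipart/form-data body carries a stream's name: in the part's `Content-Disposition` header. A plain POST body contains only the raw bytes, so the name has nowhere to go. The class and method names are hypothetical; this is not code from HttpSolrServer.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of multipart vs. plain POST bodies; not taken from HttpSolrServer. */
public class MultipartNameSketch {

    /** Builds a multipart/form-data body containing a single file part. */
    static String multipartBody(String boundary, String fieldName,
                                String fileName, String contentType, String content) {
        StringBuilder sb = new StringBuilder();
        sb.append("--").append(boundary).append("\r\n");
        // The file name travels here, in the part headers, next to the field name.
        // Simplification: quotes in fileName are not escaped.
        sb.append("Content-Disposition: form-data; name=\"").append(fieldName)
          .append("\"; filename=\"").append(fileName).append("\"\r\n");
        sb.append("Content-Type: ").append(contentType).append("\r\n\r\n");
        sb.append(content).append("\r\n");
        // Closing delimiter: boundary followed by "--".
        sb.append("--").append(boundary).append("--\r\n");
        return sb.toString();
    }

    /** A plain (non-multipart) POST sends only the raw bytes; the stream's name is not part of it. */
    static byte[] rawBody(String content) {
        return content.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = multipartBody("XyZ123", "file", "report.pdf",
                "application/pdf", "%PDF-1.4 ...");
        System.out.println(body);
        System.out.println(new String(rawBody("%PDF-1.4 ..."), StandardCharsets.UTF_8));
    }
}
```

The sketch treats the content as text and skips escaping for brevity; a real encoder has to handle binary payloads and quoted characters in the name.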

      SolrCell (Tika) uses this information to make decisions about content extraction, so it must reach Solr one way or another. SolrJ should either set equivalent headers so that the file name is sent automatically, or force a multipart post whenever this information is present.
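The two remedies proposed above can be sketched as follows. This is a hypothetical illustration, not the patch attached to this issue: `StreamInfo` stands in for a ContentStream, and the non-multipart fallback swaps the suggested headers for Solr Cell's `resource.name` request parameter, assuming the server reads the file name from there.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two fixes proposed in this issue; not SolrJ code. */
public class FileNamePolicySketch {

    /** Stand-in for a ContentStream: only the fields this decision needs. */
    record StreamInfo(String name, String contentType) {}

    /** Remedy 2: force a multipart post whenever any stream carries a name. */
    static boolean mustUseMultipart(List<StreamInfo> streams) {
        for (StreamInfo s : streams) {
            if (s.name() != null && !s.name().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Remedy 1: when posting the raw body, carry the name alongside it instead. */
    static Map<String, String> paramsForRawPost(Map<String, String> params, StreamInfo stream) {
        Map<String, String> out = new LinkedHashMap<>(params);
        if (stream.name() != null && !stream.name().isEmpty()) {
            // Assumption: the server takes the file name from Solr Cell's "resource.name" parameter.
            // An explicit caller-supplied value is left untouched.
            out.putIfAbsent("resource.name", stream.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<StreamInfo> named = List.of(new StreamInfo("report.pdf", "application/pdf"));
        List<StreamInfo> anonymous = List.of(new StreamInfo(null, "text/plain"));
        System.out.println(mustUseMultipart(named));     // true
        System.out.println(mustUseMultipart(anonymous)); // false
        System.out.println(paramsForRawPost(Map.of("commit", "true"), named.get(0)));
    }
}
```

Forcing multipart keeps the name in its standard place at the cost of a slightly larger request; the parameter route keeps the raw post but only helps handlers that know to look for that parameter.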

    Attachments

    1. additional_changes.diff (3 kB, Karl Wright)
    2. SOLR-4358.patch (5 kB, Ryan McKinley)
    3. SOLR-4358.patch (4 kB, Ryan McKinley)
    4. SOLR-4358.patch (3 kB, Karl Wright)


    People

    • Assignee: Ryan McKinley (ryantxu)
    • Reporter: Karl Wright (kwright@metacarta.com)
