Solr / SOLR-2350

improve post.jar to handle non UTF-8 files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      thanks to all the awesomeness Uwe did in SOLR-96, some hard coded limitations/assumptions in the simple post.jar provided for the example files can be cleaned up.

      notably: it used to deal with Readers/Writers, and warned people their data had to be UTF-8 (because that's all Solr supported); now it can deal with raw streams

      1. SOLR-2350.patch
        14 kB
        Hoss Man
      2. SOLR-2350.patch
        14 kB
        Hoss Man

        Activity

        Hoss Man added a comment -

        Attached file makes a bunch of changes to SimplePostTool.java, notably:

        • stop using Reader/Writer - stream bytes directly
        • use application/xml as default mime-type, but let user override
        • look at HTTP status to determine if there is an error (instead of string comparisons on the response)
        • ignore the response body from the POST by default, but let the user choose to see it

        ...the last two making it more feasible to use this when dealing with things like that Document Analysis tool (since you can now see the response if you want)
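
        The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class name RawPostSketch and the methods pipe, isSuccess and post are hypothetical, and treating any 2xx status as success is an assumption.

        ```java
        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;

        // Hypothetical sketch; names are illustrative and not taken from SimplePostTool.java.
        final class RawPostSketch {
            static final String DEFAULT_CONTENT_TYPE = "application/xml";

            /** Copies bytes unchanged. No Reader/Writer is involved, so no charset decoding happens. */
            static long pipe(InputStream in, OutputStream out) throws IOException {
                byte[] buf = new byte[8192];
                long total = 0;
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
                out.flush();
                return total;
            }

            /** Assumption: any 2xx status counts as success. */
            static boolean isSuccess(int status) {
                return status >= 200 && status < 300;
            }

            /**
             * POSTs the raw bytes and returns the HTTP status.
             * The response body is ignored unless responseSink is non-null.
             */
            static int post(URL url, InputStream data, String contentType, OutputStream responseSink)
                    throws IOException {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("POST");
                conn.setDoOutput(true);
                conn.setRequestProperty("Content-Type",
                        contentType != null ? contentType : DEFAULT_CONTENT_TYPE);
                try (OutputStream out = conn.getOutputStream()) {
                    pipe(data, out);
                }
                int status = conn.getResponseCode();
                if (responseSink != null) {
                    // On error statuses the body is only available via getErrorStream().
                    InputStream body = isSuccess(status) ? conn.getInputStream() : conn.getErrorStream();
                    if (body != null) {
                        try (InputStream b = body) {
                            pipe(b, responseSink);
                        }
                    }
                }
                conn.disconnect();
                return status;
            }
        }
        ```

        Because the bytes are never decoded on the client, the encoding declared in the XML file itself is what the server sees, which is why a gb2312 file can be posted the same way as a UTF-8 one.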

        The patch also includes a gb2312-example.xml, similar to the utf8-example.xml, showing off non-ASCII characters. the big hitch here is that i'm only guessing that this really is a properly encoded gb2312 file – i did the best i could to make my editor create one, but i have no idea if it worked properly. it seems to index correctly, but for all i know it's really still just UTF-8

        Hoss Man added a comment -

        Some updates...

        • improvements to the example file based on rmuir's suggestions in IRC
        • simplified some error handling so it's consistent
        • incorporated Li Li's suggestion from a recent mailing list post about using the file length when posting files.
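
        One plausible reading of the file-length suggestion is to tell the connection the exact body size up front, so the JDK can stream the file instead of buffering it in memory to compute Content-Length. The sketch below assumes that reading; FileLengthPostSketch, openForFile and send are hypothetical names, not the patch's code.

        ```java
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;

        // Hypothetical sketch; names are illustrative and not taken from SimplePostTool.java.
        final class FileLengthPostSketch {

            /** Prepares a POST whose body length is fixed to the size of the file. */
            static HttpURLConnection openForFile(URL url, File file, String contentType)
                    throws IOException {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("POST");
                conn.setDoOutput(true);
                conn.setRequestProperty("Content-Type", contentType);
                // With a known length the request body is streamed rather than buffered.
                conn.setFixedLengthStreamingMode(file.length());
                return conn;
            }

            /** Streams the file's raw bytes to the connection and returns the HTTP status. */
            static int send(HttpURLConnection conn, File file) throws IOException {
                try (InputStream in = new FileInputStream(file);
                     OutputStream out = conn.getOutputStream()) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }
                return conn.getResponseCode();
            }
        }
        ```

        The number of bytes written has to match the declared length exactly; the JDK raises an IOException if it does not.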
        Hoss Man added a comment -

        Committed revision 1068149. - trunk
        Committed revision 1068152. - 3x

        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release


          People

          • Assignee: Hoss Man
          • Reporter: Hoss Man
          • Votes: 0
          • Watchers: 0