Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels: None

      Description

      Lucene supports binary data for fields, but Solr has no corresponding field type.

      1. SOLR-1116.patch (17 kB, Noble Paul)
      2. SOLR-1116.patch (18 kB, Noble Paul)
      3. SOLR-1116.patch (29 kB, Noble Paul)

        Activity

        Grant Ingersoll added a comment -

        Bulk close Solr 1.4 issues

        Noble Paul added a comment (edited)

        Committed revision 778600.

        Noble Paul added a comment -

        I plan to commit this in a day or two. Please let me know if there is any feedback.

        Noble Paul added a comment -

        The text format is standard base64 encoding.
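
        To make "standard base64" concrete, here is a minimal sketch of the round trip a client might perform on a stored binary value. It assumes Java 8+ java.util.Base64, which did not exist when this patch was written and is not necessarily what Solr uses internally; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not Solr code): round-trips a stored binary value
// through the standard base64 alphabet, which uses '+' and '/' and '=' padding.
final class StandardBase64Sketch {
    static String toText(byte[] stored) {
        return Base64.getEncoder().encodeToString(stored);
    }

    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = "Solr".getBytes(StandardCharsets.US_ASCII);
        String text = toText(raw);
        System.out.println(text); // U29scg==
        byte[] back = fromText(text);
        System.out.println(new String(back, StandardCharsets.US_ASCII)); // Solr
    }
}
```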

        Andrzej Bialecki added a comment -

        Indeed! Then it's not relevant here. +0 from me for the regular base64.

        Noble Paul added a comment -

        Hi Andrzej, from the Wikipedia documentation, what I understand is that browsers support standard base64, not the URL-safe version.

        Noble Paul added a comment -

        silly me

        Andrzej Bialecki added a comment -

        Quoting Noble: "No browser accepts the image data as base64. Your front end will have to read the string and send it out as a byte[]."

        Please see http://en.wikipedia.org/wiki/Data_URI_scheme: this is the use case I was referring to, and indeed you can send base64-encoded content directly to any modern browser.
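
        A minimal sketch of the data URI use case, assuming the field holds raw image bytes and the caller already knows their MIME type. DataUriSketch and toDataUri are hypothetical names, and java.util.Base64 (Java 8+) stands in for whatever encoder the server actually uses. RFC 2397 specifies the MIME (standard-alphabet) base64 for data URIs, which matches Noble's later reading that browsers expect the standard variant.

```java
import java.util.Base64;

// Builds an RFC 2397 data URI from raw bytes. The MIME type comes from the
// caller; nothing here checks that the bytes really are an image.
final class DataUriSketch {
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // First four bytes of the PNG signature, used only as sample input.
        byte[] sample = {(byte) 0x89, 0x50, 0x4E, 0x47};
        String uri = toDataUri("image/png", sample);
        System.out.println(uri); // data:image/png;base64,iVBORw==
        // A result-list template could then emit <img src="..."> with this value.
    }
}
```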

        Noble Paul added a comment -

        Quoting Yonik: "I don't want yet another parameter that lets you choose"

        +1. I am fine with either format; we will stick to one.

        +0 for standard base64

        Quoting Andrzej: "...you can directly embed the returned string without re-encoding it."

        No browser accepts the image data as base64. Your front end will have to read the string and send it out as a byte[].

        Andrzej Bialecki added a comment -

        One scenario that I have experience with is when you store small images as fields, to be displayed on the result list. URL-safe encoding means you can directly embed the returned string without re-encoding it.

        Yonik Seeley added a comment -

        There's no reason we can't accept either Base64 variant as input.
        For output, should it be the normal Base64 or the URL-safe variant? (And no, I don't want yet another parameter that lets you choose.)
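
        Accepting either variant on input is cheap because the two alphabets differ only in the characters for values 62 and 63. A minimal sketch, assuming java.util.Base64 (Java 8+) and a hypothetical helper name; this is one possible approach, not the committed implementation.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical lenient decoder: the standard and URL-safe alphabets differ only
// in '+' vs '-' (value 62) and '/' vs '_' (value 63), so mapping the URL-safe
// characters onto the standard ones lets a single decoder read both variants.
final class LenientBase64Sketch {
    static byte[] decodeEither(String text) {
        String normalized = text.replace('-', '+').replace('_', '/');
        return Base64.getDecoder().decode(normalized);
    }

    public static void main(String[] args) {
        byte[] expected = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(Arrays.equals(expected, decodeEither("+/8="))); // true (standard)
        System.out.println(Arrays.equals(expected, decodeEither("-_8="))); // true (URL-safe)
    }
}
```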

        Ryan McKinley added a comment -

        A quick Google search shows a few options in other languages:
        http://search.cpan.org/~kazuho/MIME-Base64-URLSafe-0.01/lib/MIME/Base64/URLSafe.pm

        In PHP, you can use "base64_url_encode":
        http://us.php.net/base64_encode

        Noble Paul added a comment (edited)

        My only concern is whether the standard tools available in other languages (PHP, Python, etc.) for base64 encoding/decoding work with the URL-safe format. If they don't, it defeats the purpose.

        Ryan McKinley added a comment -

        Quoting Noble: "why do we need it to be url safe?"

        More than anything, it seems like a pending gotcha, and the fix (with URL-safe base64) is trivial.
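
        The "gotcha" is presumably that a standard base64 value placed unescaped in a query string loses its '+' characters, because form decoding turns '+' into a space. A small sketch of that failure and of the percent-encoding workaround, assuming Java 10+ (for the Charset overloads of URLEncoder and URLDecoder); the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrates how a standard base64 value is corrupted by form decoding when
// it is put into a query string without percent-encoding.
final class QueryStringGotchaSketch {
    // What a server sees if the client put the value into the URL unescaped.
    static String receivedUnescaped(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // What a server sees if the client percent-encoded the value first.
    static String receivedEscaped(String value) {
        String onTheWire = URLEncoder.encode(value, StandardCharsets.UTF_8); // "%2B%2F8%3D" for "+/8="
        return URLDecoder.decode(onTheWire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFB, (byte) 0xFF};
        String value = Base64.getEncoder().encodeToString(raw); // "+/8="
        System.out.println("[" + receivedUnescaped(value) + "]"); // [ /8=]  ('+' became a space)
        System.out.println(receivedEscaped(value).equals(value)); // true
    }
}
```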

        Noble Paul added a comment -

        The patch will not apply on Solr 1.3. It will apply only on trunk.

        Tao Jiang added a comment -

        I just couldn't apply the patch to Solr 1.3. JavaBinCodec.java and solrconfig-slave1.xml do not exist in the apache-solr-1.3.0 release. When I tried again after removing the parts of the patch related to those two files, I got the following output:

        patch -p0 <SOLR-1116.patch
        patching file src/java/org/apache/solr/request/BinaryResponseWriter.java
        Hunk #1 FAILED at 210.
        1 out of 1 hunk FAILED – saving rejects to file src/java/org/apache/solr/request/BinaryResponseWriter.java.rej
        patching file src/java/org/apache/solr/schema/BinaryField.java
        patching file src/java/org/apache/solr/update/DocumentBuilder.java
        Hunk #1 FAILED at 29.
        Hunk #2 FAILED at 216.
        Hunk #3 FAILED at 225.
        Hunk #4 FAILED at 263.
        Hunk #5 FAILED at 286.
        5 out of 5 hunks FAILED – saving rejects to file src/java/org/apache/solr/update/DocumentBuilder.java.rej
        patching file src/test/org/apache/solr/schema/TestBinaryField.java
        patching file src/test/test-files/solr/conf/schema-binaryfield.xml

        Noble Paul added a comment -

        Quoting Mike: "for url-safe base64 (-_ being the extra chars)"

        Why do we need it to be URL-safe? I guess binary fields in Lucene are not indexed (just stored). If that is the case, we may not need to send them in the 'q' param.

        Noble Paul added a comment -

        Quoting Yonik: "We could use base64 as the standard for input/output in text protocols"

        I thought of it, but we would need to add an external library to do base64 encoding and decoding. Which one should we use?

        There is one other challenge: the XML response format does not have a binary type. For back-compat reasons I have used <str> as the type. To add another type, we may need to bump up the version.
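
        A sketch of what emitting a binary value inside <str> could look like, assuming the standard base64 text format the thread eventually settled on. XmlStrSketch and binaryAsStr are hypothetical names; this is not Solr's actual XML response writer.

```java
import java.util.Base64;

// Hypothetical writer fragment: with no dedicated binary type in the XML
// response format, the encoded bytes go inside a <str> element. Base64 output
// uses only A-Z, a-z, 0-9, '+', '/' and '=', none of which need XML escaping;
// the field name is escaped defensively.
final class XmlStrSketch {
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    static String binaryAsStr(String fieldName, byte[] value) {
        return "<str name=\"" + escapeAttr(fieldName) + "\">"
                + Base64.getEncoder().encodeToString(value) + "</str>";
    }

    public static void main(String[] args) {
        System.out.println(binaryAsStr("data", new byte[] {1, 2, 3})); // <str name="data">AQID</str>
    }
}
```

        The trade-off Noble mentions is visible here: a client reading this response cannot tell from the element name alone that the string is encoded binary data.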

        Mike Klaas added a comment -

        +1 for url-safe base64 (-_ being the extra chars)

        Ryan McKinley added a comment -

        Perhaps we could use a URL-safe option too... this would let us query binary fields with our existing interfaces.

        Check:
        http://iharder.sourceforge.net/current/java/base64/

        http://iharder.sourceforge.net/current/java/base64/api/Base64.html#URL_SAFE

        That refers to RFC3548
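
        For comparison, the JDK (Java 8+) ships an encoder for the same URL- and filename-safe alphabet; RFC 4648 obsoletes RFC 3548 and keeps that alphabet. This sketch uses java.util.Base64 rather than the iharder library linked above, and the helper names are hypothetical.

```java
import java.util.Base64;

// Compares the standard and URL-safe alphabets on input that exercises the
// two differing characters (values 62 and 63).
final class UrlSafeBase64Sketch {
    static String standard(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }

    static String urlSafe(byte[] b) {
        return Base64.getUrlEncoder().encodeToString(b);
    }

    // RFC 4648 notes that '=' is often percent-encoded in URIs; the JDK can omit it.
    static String urlSafeNoPadding(byte[] b) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(b);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(raw));         // +/8=
        System.out.println(urlSafe(raw));          // -_8=
        System.out.println(urlSafeNoPadding(raw)); // -_8
    }
}
```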

        Yonik Seeley added a comment -

        We could use base64 as the standard for input/output in text protocols - that will only expand the size by 33% over binary.
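
        A quick check of the 33% figure: padded base64 emits 4 characters for every 3 input bytes, so n bytes become 4 * ceil(n / 3) characters. For comparison, the hex encoding used in the first patch emits 2 characters per byte (100% larger). A sketch using java.util.Base64 (Java 8+), with hypothetical helper names.

```java
import java.util.Base64;

// Encoded-size arithmetic for padded base64 versus hex.
final class EncodedSizeSketch {
    // 4 * ceil(n / 3), written with integer arithmetic.
    static int base64Length(int n) {
        return 4 * ((n + 2) / 3);
    }

    static int hexLength(int n) {
        return 2 * n;
    }

    public static void main(String[] args) {
        int n = 3000;
        int actual = Base64.getEncoder().encodeToString(new byte[n]).length();
        System.out.println(base64Length(n) + " " + actual); // 4000 4000
        System.out.println(hexLength(n));                   // 6000
    }
}
```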

        Noble Paul added a comment -
        • mask the byte to get a positive int
        • implement toExternal()
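
        The masking note most likely refers to Java's signed byte: widening a byte with the high bit set to int sign-extends it, so 0xFF becomes -1 and prints as ffffffff in hex. A minimal illustration with a hypothetical class name; Byte.toUnsignedInt (Java 8+) applies the same mask.

```java
// Java's byte is signed (-128..127). Masking with 0xFF keeps only the low
// 8 bits, giving the unsigned value 0..255.
final class ByteMaskSketch {
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        int widened = b; // sign-extended
        System.out.println(widened);                          // -1
        System.out.println(unsigned(b));                      // 255
        System.out.println(Integer.toHexString(widened));     // ffffffff
        System.out.println(Integer.toHexString(unsigned(b))); // ff
    }
}
```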
        Noble Paul added a comment -
        • added a new field type, BinaryField
        • JavaBinCodec writes ByteBuffer as byte[]
        • in text formats, the binary data is written as hex-encoded bytes
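
        A minimal sketch of the two conversions listed in this comment, under stated assumptions: the helper names are hypothetical, the ByteBuffer's readable region is taken to be position to limit, and hex digits are lower-case. It is not the patch's code.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers: copy the readable bytes of a ByteBuffer into a byte[]
// (what a binary codec would write) and hex-encode them (what a text writer
// might emit).
final class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    static byte[] toBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // independent position, so the caller's buffer is untouched
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xFF; // mask so negative bytes map to 0..255
            sb.append(DIGITS[v >>> 4]).append(DIGITS[v & 0x0F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Buffer over bytes 1..3 of the array: 0x10, 0xAB, 0xFF.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x00, 0x10, (byte) 0xAB, (byte) 0xFF}, 1, 3);
        System.out.println(toHex(toBytes(buf))); // 10abff
        System.out.println(buf.position());      // 1
    }
}
```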

          People

          • Assignee: Noble Paul
          • Reporter: Noble Paul
          • Votes: 0
          • Watchers: 1
