Solr
  1. Solr
  2. SOLR-807

UUIDField type cannot be recognized when wt=javabin is used

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: clients - java
    • Labels:
      None

      Description

      I'm using UUID via Solrj in my project. When I use javabin (default), I got:

      java.util.UUID: 391e3214-4f8e-4abd-aa6b-4f12be79534f

      as the uuid value. But if I use xml, I got:

      391e3214-4f8e-4abd-aa6b-4f12be79534f

      I think the both of them should return same string.

      program for reproducing the problem:

        public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server = new CommonsHttpSolrServer( "http://localhost:8983/solr" );
          SolrQuery query = new SolrQuery().setQuery( "*:*" );
          //server.setParser( new XMLResponseParser() );   // uncomment for wt=xml
          System.out.println( "===== " + server.getParser().getClass().getSimpleName() + " =====" );
          QueryResponse rsp = server.query( query );
          SolrDocumentList docs = rsp.getResults();
          for( SolrDocument doc : docs ){
            Object id = doc.getFieldValue( "id" );
            System.out.println( "type = " + id.getClass().getName() + ", id = " + id );
            Object timestamp = doc.getFieldValue( "timestamp" );
            System.out.println( "type = " + timestamp.getClass().getName() + ", timestamp = " + timestamp );
          }
        }
      

      result for wt=javabin

      javabin
      ===== BinaryResponseParser =====
      type = java.lang.String, id = java.util.UUID:391e3214-4f8e-4abd-aa6b-4f12be79534f
      type = java.util.Date, timestamp = Wed Oct 15 00:20:50 JST 2008
      

      result for wt=xml

      xml
      ===== XMLResponseParser =====
      type = java.lang.String, id = 391e3214-4f8e-4abd-aa6b-4f12be79534f
      type = java.util.Date, timestamp = Wed Oct 15 00:20:50 JST 2008
      
      1. SOLR-807.patch
        10 kB
        Shalin Shekhar Mangar
      2. SOLR-807.patch
        7 kB
        Shalin Shekhar Mangar
      3. SOLR-807.patch
        3 kB
        Noble Paul
      4. SOLR-807.patch
        0.6 kB
        Noble Paul

        Issue Links

          Activity

          Hide
          Noble Paul added a comment -

          is java.util.UUID a supported type in Lucene?

          Then let us follow what XML format is doing.

          Show
          Noble Paul added a comment - is java.util.UUID a supported type in Lucene? Then let us follow what XML format is doing.
          Hide
          Koji Sekiguchi added a comment -

          is java.util.UUID a supported type in Lucene?

          No. Date is also NOT a supported type in Lucene, but it seems that Date is recognized in BinaryResponseWriter...

          Show
          Koji Sekiguchi added a comment - is java.util.UUID a supported type in Lucene? No. Date is also NOT a supported type in Lucene, but it seems that Date is recognized in BinaryResponseWriter...
          Hide
          Noble Paul added a comment -

          I guess there is a org.apache.solr.schema.DateField which produces java.util.Date type.

          All other types are written down as

          val.getClass().getName() + ':' + val.toString()
          

          Now I realize that there is a org.apache.solr.schema.UUIDField also

          Should we make an Exception for UUID or just change the code to omit the val.getClass().getName()

          I guess that is better.

          Show
          Noble Paul added a comment - I guess there is a org.apache.solr.schema.DateField which produces java.util.Date type. All other types are written down as val.getClass().getName() + ':' + val.toString() Now I realize that there is a org.apache.solr.schema.UUIDField also Should we make an Exception for UUID or just change the code to omit the val.getClass().getName() I guess that is better.
          Hide
          Hoss Man added a comment -

          Change which code?

          both BinaryResponseWriter+NamedListCodec and the XMLWriter classes use "val.getClass().getName() + ':' + val.toString()" to outbut objects on on the list of legal things that can be in a SolrQueryResponse...

          http://lucene.apache.org/solr/api/org/apache/solr/request/SolrQueryResponse.html

          I'm not very familiar with the way either SolrJ or the BinaryResponseWriter/NamedListCodec stuff works, but I don't think this is relaly about how either ResponseWriter deals with "objects" added directly to the response ... it seems to relate specifically to how BinaryResponseWriter deals with writing field values of Documents ... the text/xml based writers use FieldType.write(...) and let the FieldType decide how to best render itself, while the NameListCodec seems to make it's own decisions based on FieldType.toObject (although i'm not sure where toObject is getting called – but that's the only reason i can think of why you would encounter a java.util.UUID object directly in a SolrDocument)

          This doesn't seem like a SolrJ in so much as a larger issue with the way the NamedListCodec works ... if it's not going to delegate to the FieldType to decide how to "write" a non standard object, then shouldn't it at least recognize and automatically extract that class type prefix info when it "reads" a non standard object?

          Show
          Hoss Man added a comment - Change which code? both BinaryResponseWriter+NamedListCodec and the XMLWriter classes use "val.getClass().getName() + ':' + val.toString()" to outbut objects on on the list of legal things that can be in a SolrQueryResponse... http://lucene.apache.org/solr/api/org/apache/solr/request/SolrQueryResponse.html I'm not very familiar with the way either SolrJ or the BinaryResponseWriter/NamedListCodec stuff works, but I don't think this is relaly about how either ResponseWriter deals with "objects" added directly to the response ... it seems to relate specifically to how BinaryResponseWriter deals with writing field values of Documents ... the text/xml based writers use FieldType.write(...) and let the FieldType decide how to best render itself, while the NameListCodec seems to make it's own decisions based on FieldType.toObject (although i'm not sure where toObject is getting called – but that's the only reason i can think of why you would encounter a java.util.UUID object directly in a SolrDocument) This doesn't seem like a SolrJ in so much as a larger issue with the way the NamedListCodec works ... if it's not going to delegate to the FieldType to decide how to "write" a non standard object, then shouldn't it at least recognize and automatically extract that class type prefix info when it "reads" a non standard object?
          Hide
          Noble Paul added a comment -

          Change which code?

          NamedListCodec

          while the NameListCodec seems to make it's own decisions based on FieldType.toObject..

          you are right. NamedListCodec needs to handle Objects differently because , that is the only way it can write out values efficiently. There are two ways to address the issue.

          • Make UUID a known type . I mean add a new type to NamedListCodec. This means bumping up the version of javabin format say version=2. I guess we should not make this change unless we wish to make some more important changes.
          • Write down UUID as a String and treat it like String (of course w/o the classname). XMLWriter is anyway doing this . This change will be backward compatible and we may not need to bump up the version.
          Show
          Noble Paul added a comment - Change which code? NamedListCodec while the NameListCodec seems to make it's own decisions based on FieldType.toObject.. you are right. NamedListCodec needs to handle Objects differently because , that is the only way it can write out values efficiently. There are two ways to address the issue. Make UUID a known type . I mean add a new type to NamedListCodec. This means bumping up the version of javabin format say version=2. I guess we should not make this change unless we wish to make some more important changes. Write down UUID as a String and treat it like String (of course w/o the classname). XMLWriter is anyway doing this . This change will be backward compatible and we may not need to bump up the version.
          Hide
          Noble Paul added a comment -

          write UUID as a String

          Show
          Noble Paul added a comment - write UUID as a String
          Hide
          Hoss Man added a comment -

          Make UUID a known type . I mean add a new type to NamedListCodec. This means bumping up the version of javabin format say version=2. I guess we should not make this change unless we wish to make some more important changes.

          but the crux of the issue isn't specific to UUIDField ... any custom FieldType people might have is going to encounter this same problem.

          If a RequestHandler adds an arbitrary object (not on the legal list) directly to the response, then the ResponseWriter is certainly at liberty to output that however it makes sense – for both XmlResponseWriter and BinaryResponseWriter that's going to result in a java.util.UUID object, or a chrish.custom.MyBean object, named 'val' being translated into a string via...

          val.getClass().getName() + ':' + val.toString()

          ...that's fine. But the problem seems to be specifically related to how the BinaryResponseWriter deals with writing out field values of Documents – the FieldType should be used to decide how to render field values.

          If we want to have optimized code in the NamedListCodec for efficiently dealing with the really commons types that's great, but that fallback case, for dealing with FieldTypes we haven't optimized – that needs to result in the client getting some object which isn't just val.getClass().getName() + ':' + val.toString() ... at a minimum it should be val.toString() , but ideally the FieldType should be able to control of how clients ultimately get the value of that field. (They don't necessarily need to be able to control how it should go over the wire, but they should be able to control what the end result is)

          Let's fix the problem (BinaryResponseWriter and field values of arbitrary FieldTypes) and not the symptom (UUIDFIeld)

          How exactly we do this ... i'm not sure. Based on my limited understanding of the existing code my rough suggestions...

          • NamedListCodec could have pass a special TextResponseWriter subclass to the FieldType's write(TextResponseWriter,...) method, and that TextResponseWriter could take the same special action NamedListCodec currently takes to efficiently serialize all of the datatypes in the varies write*(..) methods
          • NamedListCodec maintains it's current list of efficient encodings for common types, but uses FieldType.toExternal() to generate a String to send over the wire (instead of using toObject) in non-common cases.
          • we add a new "toSimpleObject(Fieldable)" method to FieldType, which would be documented to say that it must return an object of the "legal" types for a response ... the default implementation would call toObject(Fieldable), test the result with a few instanceOf checks and return val.toString() if it doesn't pass any of them. and start using this method instead of toObject(Fieldable) when dealing with DocLists.

          thoughts?

          Show
          Hoss Man added a comment - Make UUID a known type . I mean add a new type to NamedListCodec. This means bumping up the version of javabin format say version=2. I guess we should not make this change unless we wish to make some more important changes. but the crux of the issue isn't specific to UUIDField ... any custom FieldType people might have is going to encounter this same problem. If a RequestHandler adds an arbitrary object (not on the legal list) directly to the response, then the ResponseWriter is certainly at liberty to output that however it makes sense – for both XmlResponseWriter and BinaryResponseWriter that's going to result in a java.util.UUID object, or a chrish.custom.MyBean object, named 'val' being translated into a string via... val.getClass().getName() + ':' + val.toString() ...that's fine. But the problem seems to be specifically related to how the BinaryResponseWriter deals with writing out field values of Documents – the FieldType should be used to decide how to render field values. If we want to have optimized code in the NamedListCodec for efficiently dealing with the really commons types that's great, but that fallback case, for dealing with FieldTypes we haven't optimized – that needs to result in the client getting some object which isn't just val.getClass().getName() + ':' + val.toString() ... at a minimum it should be val.toString() , but ideally the FieldType should be able to control of how clients ultimately get the value of that field. (They don't necessarily need to be able to control how it should go over the wire, but they should be able to control what the end result is) Let's fix the problem (BinaryResponseWriter and field values of arbitrary FieldTypes) and not the symptom (UUIDFIeld) How exactly we do this ... i'm not sure. Based on my limited understanding of the existing code my rough suggestions... NamedListCodec could have pass a special TextResponseWriter subclass to the FieldType's write(TextResponseWriter,...) method, and that TextResponseWriter could take the same special action NamedListCodec currently takes to efficiently serialize all of the datatypes in the varies write*(..) methods NamedListCodec maintains it's current list of efficient encodings for common types, but uses FieldType.toExternal() to generate a String to send over the wire (instead of using toObject) in non-common cases. we add a new " toSimpleObject(Fieldable) " method to FieldType, which would be documented to say that it must return an object of the "legal" types for a response ... the default implementation would call toObject(Fieldable), test the result with a few instanceOf checks and return val.toString() if it doesn't pass any of them. and start using this method instead of toObject(Fieldable) when dealing with DocLists. thoughts?
          Hide
          Noble Paul added a comment -

          NamedListCodec could have pass a special TextResponseWriter...

          Not a godd idea. On the client side it may not know how to deserialize it. Moreover , Binary format is very fragile and it can break the output because each bit is significant. Every byte that is written out must use NamedListCodec.

          NamedListCodec maintains it's current list of efficient encodings for common types, but uses FieldType.toExternal() to generate a String to send over the wire (instead of using toObject) in non-common cases.

          make sense.

          we add a new "toSimpleObject(Fieldable)" method to FieldType,

          Problem is not with writing out. How to deserialize it at the other end of the pipe.

          String representation is the most convenient form for everyone. Even with xml we write out everything as a String <str></str>, just that the the content format of the string type is decided by the FieldType. If toString() is implemented properly by the Object it should be fine.

          We may not be able to support all the user defined types. But , we must be able to support all the types which we know of.

          Show
          Noble Paul added a comment - NamedListCodec could have pass a special TextResponseWriter... Not a godd idea. On the client side it may not know how to deserialize it. Moreover , Binary format is very fragile and it can break the output because each bit is significant. Every byte that is written out must use NamedListCodec. NamedListCodec maintains it's current list of efficient encodings for common types, but uses FieldType.toExternal() to generate a String to send over the wire (instead of using toObject) in non-common cases. make sense. we add a new "toSimpleObject(Fieldable)" method to FieldType, Problem is not with writing out. How to deserialize it at the other end of the pipe. String representation is the most convenient form for everyone. Even with xml we write out everything as a String <str></str>, just that the the content format of the string type is decided by the FieldType. If toString() is implemented properly by the Object it should be fine. We may not be able to support all the user defined types. But , we must be able to support all the types which we know of.
          Hide
          Noble Paul added a comment -

          while writing out a DocList resolve any unknown Object to String

          Show
          Noble Paul added a comment - while writing out a DocList resolve any unknown Object to String
          Hide
          Shalin Shekhar Mangar added a comment -
          1. Added a set of FieldTypes supported by NamedListCodec (the ones that we know how to read).
          2. BinaryResponseWriter writes fieldType.toExternal if it is not a supported type, otherwise it writes fieldType.toObject

          Note – UUID is not a supported type so it is written as toExternal.

          Need to add a test for this.

          Show
          Shalin Shekhar Mangar added a comment - Added a set of FieldTypes supported by NamedListCodec (the ones that we know how to read). BinaryResponseWriter writes fieldType.toExternal if it is not a supported type, otherwise it writes fieldType.toObject Note – UUID is not a supported type so it is written as toExternal. Need to add a test for this.
          Hide
          Shalin Shekhar Mangar added a comment -

          Updated with a test to verify the encoding/decoding of UUID with the changes to BinaryResponseWriter. All tests pass.

          With this change, we make a lookup on a HashSet for each field to be written out. Someone who knows more about the javabin format should take a look. Yonik or Noble?

          Show
          Shalin Shekhar Mangar added a comment - Updated with a test to verify the encoding/decoding of UUID with the changes to BinaryResponseWriter. All tests pass. With this change, we make a lookup on a HashSet for each field to be written out. Someone who knows more about the javabin format should take a look. Yonik or Noble?
          Hide
          Noble Paul added a comment -

          This does not impact the 'javabin' format in any way. Just that the objects passed to the NamedListCodec will be different if they are not 'supported'

          There is a small overhead of a HashSet lookup for each object . (which is trivial I guess)

          Show
          Noble Paul added a comment - This does not impact the 'javabin' format in any way. Just that the objects passed to the NamedListCodec will be different if they are not 'supported' There is a small overhead of a HashSet lookup for each object . (which is trivial I guess)
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 726502.

          Thanks Koji and Noble!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 726502. Thanks Koji and Noble!
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Koji Sekiguchi
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development