Lucene - Core
  1. Lucene - Core
  2. LUCENE-2374

Add reflection API to AttributeSource/AttributeImpl

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose an simple API that get a default implementation in AttributeImpl (just like toString() current):

      • Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes its a singleton) of a key-value pair, e.g. "term">"foobar","startOffset">Integer.valueOf(0),...
      • AttributeSource gets the same method, it just concat the iterators of each getAttributeImplsIterator() AttributeImpl

      No backwards problems occur, as the default toString() method will work like before (it just gets iterator and lists), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl fo toString() according to CharSequence and a corresponding iterator.

      I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor.

      1. shot4.png
        137 kB
        Uwe Schindler
      2. shot3.png
        134 kB
        Uwe Schindler
      3. shot2.png
        141 kB
        Uwe Schindler
      4. shot1.png
        163 kB
        Uwe Schindler
      5. LUCENE-2374-3x.patch
        18 kB
        Uwe Schindler
      6. LUCENE-2374-3x.patch
        33 kB
        Uwe Schindler
      7. LUCENE-2374-3x.patch
        52 kB
        Uwe Schindler
      8. LUCENE-2374-3x.patch
        62 kB
        Uwe Schindler
      9. LUCENE-2374-3x.patch
        79 kB
        Uwe Schindler
      10. LUCENE-2374-3x.patch
        81 kB
        Uwe Schindler
      11. LUCENE-2374.patch
        77 kB
        Uwe Schindler
      12. LUCENE-2374.patch
        78 kB
        Uwe Schindler
      13. LUCENE-2374.patch
        78 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          CharTermAtttribute is also in 3.1.

          Show
          Uwe Schindler added a comment - CharTermAtttribute is also in 3.1.
          Hide
          Robert Muir added a comment -

          What are your thoughts here Uwe?

          Should we fix for 3.1 or leave till 3.2?

          It does seem good if we could possibly fix for 3.1 due to the fact toString() was the only
          real introspection we had before?

          But at the same time, Solr's analysis.jsp etc doesn't introspect attributes outside of the
          original 6 in Token anyway, so maybe we could delay?

          Show
          Robert Muir added a comment - What are your thoughts here Uwe? Should we fix for 3.1 or leave till 3.2? It does seem good if we could possibly fix for 3.1 due to the fact toString() was the only real introspection we had before? But at the same time, Solr's analysis.jsp etc doesn't introspect attributes outside of the original 6 in Token anyway, so maybe we could delay?
          Hide
          Uwe Schindler added a comment -

          Oh, I forgot about that one! Wanted to fix this. I am currently away from my computer, can we discuss tomorrow?

          I really wanted to fix this, sorry.

          Show
          Uwe Schindler added a comment - Oh, I forgot about that one! Wanted to fix this. I am currently away from my computer, can we discuss tomorrow? I really wanted to fix this, sorry.
          Hide
          Uwe Schindler added a comment -

          I will work tomorrow on a patch with sophisticated backwards for 3.x and merge to trunk without sophisticated things g.

          I changed my mind a little bit: I would simplify the whole thing: AttributeSource and AttributeImpl should have a simple method:

          void addToMap(Map<String,?>)
          

          The default code would simply look like AttributeImpl.toString() now and add all non-static fields to the impl - special attributes like CharTermAttribute would have special handling. The default toString() [if not overridden] would simply call this method, too and pass a Fake-Map to the addToMap() method to populate a StringBuilder in Map.put() [to not automatically create a real Map always).

          Code in analysis.jsp would simply pass a map (e.g. a LinkedHashMap to preserve order) and then print it out. The values in the map are native types / or CharSequence for (Char)TermAttribute.

          The only problem with this aproach would be that the attribute keys must be unique - an idea would be to prefix them with the attribute name.

          I will also remove the abstract equals() and hashCode() in AttributeImpl - the attribute API does not rely on them to be implemented - this simplyfies implementation of attributes, because equals/hashCode are never used. In 4.0 I would remove this from all custom attributes.

          What do you think?

          Show
          Uwe Schindler added a comment - I will work tomorrow on a patch with sophisticated backwards for 3.x and merge to trunk without sophisticated things g . I changed my mind a little bit: I would simplify the whole thing: AttributeSource and AttributeImpl should have a simple method: void addToMap(Map< String ,?>) The default code would simply look like AttributeImpl.toString() now and add all non-static fields to the impl - special attributes like CharTermAttribute would have special handling. The default toString() [if not overridden] would simply call this method, too and pass a Fake-Map to the addToMap() method to populate a StringBuilder in Map.put() [to not automatically create a real Map always). Code in analysis.jsp would simply pass a map (e.g. a LinkedHashMap to preserve order) and then print it out. The values in the map are native types / or CharSequence for (Char)TermAttribute. The only problem with this aproach would be that the attribute keys must be unique - an idea would be to prefix them with the attribute name. I will also remove the abstract equals() and hashCode() in AttributeImpl - the attribute API does not rely on them to be implemented - this simplyfies implementation of attributes, because equals/hashCode are never used. In 4.0 I would remove this from all custom attributes. What do you think?
          Hide
          Earwin Burrfoot added a comment -

          Nice. Except maybe introduce a simple interface instead of the Map<String, ?> ?

          interface AttributeReflector { // Name is crap, should be changed
            void reflect(String key, Object value);
          }
          
          void reflectWith(AttributeReflector reflector);
          

          You have no need for fake maps then, both in toString(), and in user code.

          Show
          Earwin Burrfoot added a comment - Nice. Except maybe introduce a simple interface instead of the Map<String, ?> ? interface AttributeReflector { // Name is crap, should be changed void reflect( String key, Object value); } void reflectWith(AttributeReflector reflector); You have no need for fake maps then, both in toString(), and in user code.
          Hide
          Uwe Schindler added a comment -

          I had the same idea and I am coding it currently.

          Show
          Uwe Schindler added a comment - I had the same idea and I am coding it currently.
          Hide
          Earwin Burrfoot added a comment -

          Another step in the same direction then. Instead of

          The only problem with this aproach would be that the attribute keys must be unique - an idea would be to prefix them with the attribute name.

          Let us define interface as - void reflect(Class<?> attributeClass, String key, Object value) ?
          If the client code then wants to call toString() on attributeClass and concat with key - it's free to do so. If it wants to be more creative - it can.

          Show
          Earwin Burrfoot added a comment - Another step in the same direction then. Instead of The only problem with this aproach would be that the attribute keys must be unique - an idea would be to prefix them with the attribute name. Let us define interface as - void reflect(Class<?> attributeClass, String key, Object value) ? If the client code then wants to call toString() on attributeClass and concat with key - it's free to do so. If it wants to be more creative - it can.
          Hide
          Uwe Schindler added a comment -

          Here a first patch with the proposed API (thanks Earwin).

          The patch is for 3.x, as it contains already the sophisticated(TM) backwards compatibility layer (see javadocs).

          Still missing:

          • Remove obsolete toString in contrib/queryparser
          • Test for sophisticated bw
          • Tests for API in general
          • an AttributeChecker test class that checks basic Attribute features and its implementation (copyTo, reflectAsString,...)
          • Solr changes to make use of this API in analysis.jsp and the other TokenStream component

          What do you think?

          Show
          Uwe Schindler added a comment - Here a first patch with the proposed API (thanks Earwin). The patch is for 3.x, as it contains already the sophisticated(TM) backwards compatibility layer (see javadocs). Still missing: Remove obsolete toString in contrib/queryparser Test for sophisticated bw Tests for API in general an AttributeChecker test class that checks basic Attribute features and its implementation (copyTo, reflectAsString,...) Solr changes to make use of this API in analysis.jsp and the other TokenStream component What do you think?
          Hide
          Uwe Schindler added a comment -

          New patch with analysis.jsp fixed (also SOLR-2315):

          • highlighting works again
          • only attributes are shown that exist at the step of analysis
          • attribute names changed a little bit, because it uses the ones from reflection api
          • Attribute class name shown in mouse hover
          • start/endOffset now in two different lines (no chance to do it other without another special case)
          • payloads are no longer printed as text, because it used default platform encoding (new String(byte[]))!

          I also added some example screenshots!

          Show
          Uwe Schindler added a comment - New patch with analysis.jsp fixed (also SOLR-2315 ): highlighting works again only attributes are shown that exist at the step of analysis attribute names changed a little bit, because it uses the ones from reflection api Attribute class name shown in mouse hover start/endOffset now in two different lines (no chance to do it other without another special case) payloads are no longer printed as text, because it used default platform encoding (new String(byte[]))! I also added some example screenshots!
          Hide
          Robert Muir added a comment -

          +1, this looks great.

          i think its really important to show all the attributes in analysis.jsp, e.g. KeywordAttribute.

          Show
          Robert Muir added a comment - +1, this looks great. i think its really important to show all the attributes in analysis.jsp, e.g. KeywordAttribute.
          Hide
          Simon Willnauer added a comment -

          nice work uwe!! +1

          Show
          Simon Willnauer added a comment - nice work uwe!! +1
          Hide
          Uwe Schindler added a comment -

          Next patch - Converted deprecated AnalysisRequestHandler and (Field|Document)AnalysisRequestHandler:

          • Token are now correctly ordered by position not endOffset (this is identical to analysis.jsp)
          • Moved over to AttributeSource
          • support for custom attributes
          • fixed payloads to display as hex like in analysis.jsp (maybe this should be fixed in future to support Payload instances directly in ResponseWriter)

          I will now concentrate on contrib/queryparser, Lucene reflection tests and javadocs in Lucene core

          Show
          Uwe Schindler added a comment - Next patch - Converted deprecated AnalysisRequestHandler and (Field|Document)AnalysisRequestHandler: Token are now correctly ordered by position not endOffset (this is identical to analysis.jsp) Moved over to AttributeSource support for custom attributes fixed payloads to display as hex like in analysis.jsp (maybe this should be fixed in future to support Payload instances directly in ResponseWriter) I will now concentrate on contrib/queryparser, Lucene reflection tests and javadocs in Lucene core
          Hide
          Uwe Schindler added a comment -

          In my opinion, there is lots of code duplication in unmainainable analysis.jsp. I think we should open a new issue to remove it and replace by an XSL or alternatively make its internal functionality backed by FieldAnalysisReuqestHandler.

          Show
          Uwe Schindler added a comment - In my opinion, there is lots of code duplication in unmainainable analysis.jsp. I think we should open a new issue to remove it and replace by an XSL or alternatively make its internal functionality backed by FieldAnalysisReuqestHandler.
          Hide
          Mark Miller added a comment -

          Agreed Uwe.

          Show
          Mark Miller added a comment - Agreed Uwe.
          Hide
          Uwe Schindler added a comment -

          Final patch for 3.x:

          • Added some tests for reflection API
          • Added test for sophisticated backwards layer

          I will commit this tomorrow and forward-port to trunk.

          Contrib/queryparsers attributes are not yet compatible with reflection, as the toString() there has wrong format (throws UOE). I will open a separate issue to fix that, this is not a serious problem at the moment, as attribute reflection is not needed there.

          Show
          Uwe Schindler added a comment - Final patch for 3.x: Added some tests for reflection API Added test for sophisticated backwards layer I will commit this tomorrow and forward-port to trunk. Contrib/queryparsers attributes are not yet compatible with reflection, as the toString() there has wrong format (throws UOE). I will open a separate issue to fix that, this is not a serious problem at the moment, as attribute reflection is not needed there.
          Hide
          Uwe Schindler added a comment -

          Also converted contrib/queryparser. The problematic thing was that these attributes use an xml-like toString() and so the backwards layer is enabled. So I simply added the reflectWith() method to not use reflection like for toString. But for consistence, in trunk, we should remove toString() and reflectWith().

          Will commit tomorrow, as said before.

          Show
          Uwe Schindler added a comment - Also converted contrib/queryparser. The problematic thing was that these attributes use an xml-like toString() and so the backwards layer is enabled. So I simply added the reflectWith() method to not use reflection like for toString. But for consistence, in trunk, we should remove toString() and reflectWith(). Will commit tomorrow, as said before.
          Hide
          Uwe Schindler added a comment -

          Minor upgrades in backwards compatibility layer and javadocs.
          Will commit this now.

          Show
          Uwe Schindler added a comment - Minor upgrades in backwards compatibility layer and javadocs. Will commit this now.
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1060784

          Will now port to trunk without backwards layer.

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1060784 Will now port to trunk without backwards layer.
          Hide
          Uwe Schindler added a comment - - edited

          First preview of trunk patch:

          • Removed useless equals() and hashCode() from internal attributes
          • Removed backwards layer from previous patch / contrib-qp needs no special handling anymore

          Solr:

          • Removed deprecated AnalysisRequestHandler instead of fixing it - still test need to be cleaned up
          • (Field|Document)AnalysisRequestHandler now works primarily on BytesRef, especially for highlighting
          • Using BytesRef to wrap Payloads to easy print them as hex using BytesRef.toString()
          • There is still one analysis test failing: as NumericTermAttribute does not support cloning, analysis does not work and test fails. I will now open another issue to fix this (make NumericTermAttribute decoupled from outer TokenStream).
          Show
          Uwe Schindler added a comment - - edited First preview of trunk patch: Removed useless equals() and hashCode() from internal attributes Removed backwards layer from previous patch / contrib-qp needs no special handling anymore Solr: Removed deprecated AnalysisRequestHandler instead of fixing it - still test need to be cleaned up (Field|Document)AnalysisRequestHandler now works primarily on BytesRef, especially for highlighting Using BytesRef to wrap Payloads to easy print them as hex using BytesRef.toString() There is still one analysis test failing: as NumericTermAttribute does not support cloning, analysis does not work and test fails. I will now open another issue to fix this (make NumericTermAttribute decoupled from outer TokenStream).
          Hide
          Uwe Schindler added a comment -

          Further TODOs:

          • User TermsHash to collect the match set in AnalysisRequestHandlerBase
          Show
          Uwe Schindler added a comment - Further TODOs: User TermsHash to collect the match set in AnalysisRequestHandlerBase
          Hide
          Uwe Schindler added a comment -

          Hopefully final patch:

          • minor changes for resolved NumericTermAttribute clone issue (LUCENE-2875)
          • removed leftover AnalysisRequestHandler in clustering tests

          I will commit this tomorrow if no-one objects!

          Show
          Uwe Schindler added a comment - Hopefully final patch: minor changes for resolved NumericTermAttribute clone issue ( LUCENE-2875 ) removed leftover AnalysisRequestHandler in clustering tests I will commit this tomorrow if no-one objects!
          Hide
          Uwe Schindler added a comment -

          Final trunk patch, will commit now.

          Show
          Uwe Schindler added a comment - Final trunk patch, will commit now.
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1061039

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1061039
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development