Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Its currently just Object.toString, e.g.:

      org.apache.lucene.analysis.en.PorterStemFilter@8a32165c

      But I think we should make it more useful, to end users trying to see what their chain is doing, and to make SOPs easier when debugging:

      EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
      try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(ts.toString());
        }
        ts.end();
      }
      

      Proposed output:

      PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=<ALPHANUM>,keyword=false
      PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=<NUM>,keyword=false
      PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=<ALPHANUM>,keyword=false
      PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=<ALPHANUM>,keyword=false
      PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=<ALPHANUM>,keyword=false
      
      1. LUCENE-5275.patch
        0.5 kB
        Robert Muir
      2. LUCENE-5275.patch
        0.5 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        And maybe hashcode should be replaced by identityhashcode?

        Show
        Robert Muir added a comment - And maybe hashcode should be replaced by identityhashcode?
        Hide
        Robert Muir added a comment -

        Yeah: i think identityhashcode is way more useful (see updated patch).

        This way the toString actually helps you figure out which guy is which if you are putting sops in your analysis chain:

        PorterStemFilter@23bd31c0 term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=<ALPHANUM>,keyword=false
        PorterStemFilter@23bd31c0 term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=<NUM>,keyword=false
        PorterStemFilter@23bd31c0 term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=<ALPHANUM>,keyword=false
        PorterStemFilter@23bd31c0 term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=<ALPHANUM>,keyword=false
        PorterStemFilter@23bd31c0 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=<ALPHANUM>,keyword=false
        
        Show
        Robert Muir added a comment - Yeah: i think identityhashcode is way more useful (see updated patch). This way the toString actually helps you figure out which guy is which if you are putting sops in your analysis chain: PorterStemFilter@23bd31c0 term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=<ALPHANUM>,keyword=false PorterStemFilter@23bd31c0 term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=<NUM>,keyword=false PorterStemFilter@23bd31c0 term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=<ALPHANUM>,keyword=false PorterStemFilter@23bd31c0 term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=<ALPHANUM>,keyword=false PorterStemFilter@23bd31c0 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=<ALPHANUM>,keyword=false
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Adrien Grand added a comment -

        +1

        Show
        Adrien Grand added a comment - +1
        Hide
        Uwe Schindler added a comment -

        OK, this is alright. reflectAsString and the hashcode is fine.

        I don't want users to "parse" this output, this was a major pain and so we removed toString() in Lucene 3.1 and replaced by reflectAsString().

        Show
        Uwe Schindler added a comment - OK, this is alright. reflectAsString and the hashcode is fine. I don't want users to "parse" this output, this was a major pain and so we removed toString() in Lucene 3.1 and replaced by reflectAsString().
        Hide
        Robert Muir added a comment -

        I think its ok for Token.toString() to keep its current implementation? that was a concern too right?

        I dont care about Token

        Show
        Robert Muir added a comment - I think its ok for Token.toString() to keep its current implementation? that was a concern too right? I dont care about Token
        Hide
        Uwe Schindler added a comment -

        I am fine, as already said

        One thing. Can we make toString(), equals() and hashCode() final in TokenStream to prevent users to do shitty stuff on subclasses?

        Show
        Uwe Schindler added a comment - I am fine, as already said One thing. Can we make toString(), equals() and hashCode() final in TokenStream to prevent users to do shitty stuff on subclasses?
        Hide
        Uwe Schindler added a comment -

        Another way to fix this then (because super.toString() is already using Object.toString():

        public final String toString() {
          return super.toString() + " " + reflectAsString(false);
        }
        
        Show
        Uwe Schindler added a comment - Another way to fix this then (because super.toString() is already using Object.toString(): public final String toString() { return super .toString() + " " + reflectAsString( false ); }
        Hide
        Robert Muir added a comment -

        I didnt want the full class name. i think, while thats appropriate for Object.toString, its overkill and overly verbose in this context. I like my getSimpleName modification better personally.

        Show
        Robert Muir added a comment - I didnt want the full class name. i think, while thats appropriate for Object.toString, its overkill and overly verbose in this context. I like my getSimpleName modification better personally.
        Hide
        Robert Muir added a comment -

        And as i mentioned, Object.toString prints the hashcode (which is useless and changes according to the contents). IdentityHashCode i feel is more useful as part of the toString message: because it helps to identify which stream in the chain it is (imagine you are toStringing along each way of the chain in debugging and have the same filter in the chain twice).

        Show
        Robert Muir added a comment - And as i mentioned, Object.toString prints the hashcode (which is useless and changes according to the contents). IdentityHashCode i feel is more useful as part of the toString message: because it helps to identify which stream in the chain it is (imagine you are toStringing along each way of the chain in debugging and have the same filter in the chain twice).
        Hide
        Uwe Schindler added a comment -

        The hasCode of TokenStream is the default hashCode of Object, which is identityHashCode(). This is why I said: make toString(), equals(), hashCode() final in TokenStream, because it makes absolutely no sense to compare tokenstreams. identityHashCode is defined as: "Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero."

        Show
        Uwe Schindler added a comment - The hasCode of TokenStream is the default hashCode of Object, which is identityHashCode(). This is why I said: make toString(), equals(), hashCode() final in TokenStream, because it makes absolutely no sense to compare tokenstreams. identityHashCode is defined as: "Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero."
        Hide
        Robert Muir added a comment -

        One thing. Can we make toString(), equals() and hashCode() final in TokenStream to prevent users to do shitty stuff on subclasses?

        I don't see why we should make the proposed toString final.

        My suggested implementation tries to be succinct and useful by printing just the simple class name + identityhashcode + reflectAsString(false).

        If someone wants to override this method to print "more stuff", e.g. reflectAsString(true), or some other internal state variables that matter to them, let them?

        Show
        Robert Muir added a comment - One thing. Can we make toString(), equals() and hashCode() final in TokenStream to prevent users to do shitty stuff on subclasses? I don't see why we should make the proposed toString final. My suggested implementation tries to be succinct and useful by printing just the simple class name + identityhashcode + reflectAsString(false). If someone wants to override this method to print "more stuff", e.g. reflectAsString(true), or some other internal state variables that matter to them, let them?
        Hide
        Uwe Schindler added a comment -

        Or shorter: The default Object hashCode does not change on Object's contents because it supports the default identity equals(). Identity equals does not compare contents, so hashCode is stable.

        Show
        Uwe Schindler added a comment - Or shorter: The default Object hashCode does not change on Object's contents because it supports the default identity equals(). Identity equals does not compare contents, so hashCode is stable.
        Hide
        Robert Muir added a comment -

        The hasCode of TokenStream is the default hashCode of Object, which is identityHashCode(). This is why I said: make toString(), equals(), hashCode() final in TokenStream, because it makes absolutely no sense to compare tokenstreams. identityHashCode is defined as: "Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero."

        Can we please discuss hashcode/equals on another issue?

        This issue is 100% about toString: hashcode/equals has a separate purpose and deserves its own issue / analysis of the tradeoffs. I'm not proposing changing anything about them here.

        Show
        Robert Muir added a comment - The hasCode of TokenStream is the default hashCode of Object, which is identityHashCode(). This is why I said: make toString(), equals(), hashCode() final in TokenStream, because it makes absolutely no sense to compare tokenstreams. identityHashCode is defined as: "Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero." Can we please discuss hashcode/equals on another issue? This issue is 100% about toString: hashcode/equals has a separate purpose and deserves its own issue / analysis of the tradeoffs. I'm not proposing changing anything about them here.
        Hide
        Uwe Schindler added a comment -

        I don't see why we should make the proposed toString final.

        That's fine. I just want prevent people from putting TokenStreams into maps/sets, so make equals() and hashCode() final (implementing identity contract). I don't care about toString().

        Show
        Uwe Schindler added a comment - I don't see why we should make the proposed toString final. That's fine. I just want prevent people from putting TokenStreams into maps/sets, so make equals() and hashCode() final (implementing identity contract). I don't care about toString().
        Hide
        Uwe Schindler added a comment -

        You started the hashCode discussion, so its part of the game

        Show
        Uwe Schindler added a comment - You started the hashCode discussion, so its part of the game
        Hide
        Robert Muir added a comment -

        I did not start a discussion about hashcodes.

        I started a discussion about the "@[hexnumber]" that appears in tostring. it happens to be the hashcode by default.

        I refuse to change the hashcode on this issue.

        Show
        Robert Muir added a comment - I did not start a discussion about hashcodes. I started a discussion about the "@ [hexnumber] " that appears in tostring. it happens to be the hashcode by default. I refuse to change the hashcode on this issue.
        Hide
        Uwe Schindler added a comment -

        I only want you to use the default hashcode and not identity Hash Code because it is the same of hashcode stays not implemented. The reason for this was given before.

        That's all

        Show
        Uwe Schindler added a comment - I only want you to use the default hashcode and not identity Hash Code because it is the same of hashcode stays not implemented. The reason for this was given before. That's all
        Hide
        Robert Muir added a comment -

        That's fine. I just want prevent people from putting TokenStreams into maps/sets, so make equals() and hashCode() final (implementing identity contract). I don't care about toString().

        So can you make another issue about that please? This issue is just about toString. I don't want to go messing with equals/hashcode (high risk of bugs).

        This issue is about improving toString to make debugging easier!

        Show
        Robert Muir added a comment - That's fine. I just want prevent people from putting TokenStreams into maps/sets, so make equals() and hashCode() final (implementing identity contract). I don't care about toString(). So can you make another issue about that please? This issue is just about toString. I don't want to go messing with equals/hashcode (high risk of bugs). This issue is about improving toString to make debugging easier!
        Hide
        Uwe Schindler added a comment -

        Ok, other issue... I just give up, but its not my final decision!

        Show
        Uwe Schindler added a comment - Ok, other issue... I just give up, but its not my final decision!
        Hide
        Robert Muir added a comment -

        seriously, if you want to improve hashcode/equals: I am all for it.

        I just dont want that mixed into this issue. I want to keep it completely contained to improving toString to make debugging easier.

        Show
        Robert Muir added a comment - seriously, if you want to improve hashcode/equals: I am all for it. I just dont want that mixed into this issue. I want to keep it completely contained to improving toString to make debugging easier.
        Hide
        Uwe Schindler added a comment -

        Ok. My only suggestion: See above!

        Show
        Uwe Schindler added a comment - Ok. My only suggestion: See above!
        Hide
        Robert Muir added a comment -

        But your suggestion is plain wrong:

        The hashCode of TokenStream is the default hashCode of Object, which is identityHashCode().

        this is not true! Its the hashCode of AttributeSource.

        This is useless.

        Show
        Robert Muir added a comment - But your suggestion is plain wrong: The hashCode of TokenStream is the default hashCode of Object, which is identityHashCode(). this is not true! Its the hashCode of AttributeSource. This is useless.
        Hide
        Robert Muir added a comment -

        Take 5 seconds of your time to see my 2 printouts above. I dont really understand why this isnt 100% clear.

        Show
        Robert Muir added a comment - Take 5 seconds of your time to see my 2 printouts above. I dont really understand why this isnt 100% clear.
        Hide
        Uwe Schindler added a comment - - edited

        Oh fuck Sorry. Not my day, more beer needed!

        Show
        Uwe Schindler added a comment - - edited Oh fuck Sorry. Not my day, more beer needed!
        Hide
        Robert Muir added a comment -

        now i see why i was so confused!

        Show
        Robert Muir added a comment - now i see why i was so confused!
        Hide
        ASF subversion and git services added a comment -

        Commit 1531376 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1531376 ]

        LUCENE-5275: Change AttributeSource.toString to display the current state of attributes

        Show
        ASF subversion and git services added a comment - Commit 1531376 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1531376 ] LUCENE-5275 : Change AttributeSource.toString to display the current state of attributes
        Hide
        ASF subversion and git services added a comment -

        Commit 1531381 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1531381 ]

        LUCENE-5275: Change AttributeSource.toString to display the current state of attributes

        Show
        ASF subversion and git services added a comment - Commit 1531381 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531381 ] LUCENE-5275 : Change AttributeSource.toString to display the current state of attributes

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development