Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A logging version of analysis.jsp: a filter (LoggingFilter) that logs the tokens passing through an analysis chain.

      1. SOLR-415.patch
        4 kB
        Koji Sekiguchi
      2. SOLR-415.patch
        4 kB
        Koji Sekiguchi
      3. SOLR-415.patch
        4 kB
        Koji Sekiguchi
      4. SOLR-415.patch
        6 kB
        Koji Sekiguchi

        Activity

        Koji Sekiguchi added a comment -

        Updated for the current trunk (implements ResourceLoaderAware).

        Koji Sekiguchi added a comment -

        Now the factory uses init(Map<String,String> args) instead of inform(ResourceLoader) for its initialization (I was a bit confused). Sorry for the noise.

        Hoss Man added a comment -

        Koji: this is an interesting idea ... I'm really curious what your use case for this is?

        A few misc comments...

        1) It seems like it would be handy if the logging level could be configured via the factory as well.
        2) You might want to use token.toString() instead of building your own message ... that way it can express everything about the token (I notice you don't have the positionIncrement) and be future-proofed against additional fields being added later.
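        Both suggestions can be illustrated with a minimal, self-contained sketch. It does NOT use the real Lucene/Solr Token, TokenFilter or factory classes; the Token record, the "level" argument name, the FINE default and the iterator-based filter() are hypothetical stand-ins chosen for illustration. The init(Map<String,String> args) method mirrors the factory initialization style mentioned earlier in this thread. Records require Java 16 or later.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a pass-through logging filter (not the patch code). */
public class LoggingFilterSketch {

    /** Hypothetical stand-in for a token; a record's toString() lists every field. */
    public record Token(String term, int startOffset, int endOffset, int positionIncrement) {}

    private static final Logger LOG = Logger.getLogger(LoggingFilterSketch.class.getName());

    // Assumed default when no "level" argument is configured.
    private Level level = Level.FINE;

    /** Factory-style init: reads a hypothetical "level" argument such as "INFO". */
    public void init(Map<String, String> args) {
        String value = args.get("level");
        if (value != null) {
            // Level.parse throws IllegalArgumentException for unknown names.
            level = Level.parse(value);
        }
    }

    public Level getLevel() {
        return level;
    }

    /** Wraps an upstream token iterator, logging each token and passing it on unchanged. */
    public Iterator<Token> filter(Iterator<Token> input) {
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public Token next() {
                Token t = input.next();
                // Log token.toString() instead of a hand-built message, so fields such as
                // positionIncrement (or ones added later) are included automatically.
                LOG.log(level, t.toString());
                return t;
            }
        };
    }
}
```

        Because the message comes from toString(), adding a field to the token type changes the log output without touching the filter, which is the future-proofing point above.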

        Koji Sekiguchi added a comment -

        This is for debugging. Here is one of my use cases, for example...

        We use a morphological tokenizer to tokenize Japanese text. For the tokenizer to analyze the text, we need "character-level normalization" prior to tokenization.

        I'll try to explain it using English words...

        Suppose the text to be analyzed includes "colour", and your morphological tokenizer uses an American English dictionary. You have to normalize "colour" to "color" so that the tokenizer can look it up in the dictionary.

        To implement this, I've developed MappingReader, which reads mapping.txt and normalizes (Japanese) characters before they reach the tokenizer:

        MappingReader -> Japanese Tokenizer -> Filters...

        In this case, if MappingReader normalizes "ou" to "o", the token offsets no longer match the original text, which causes trouble in the highlighter. (I used LoggingFilter to find this problem.)

        To solve this problem, MappingReader has a correctPosition(int pos) method that tells the tokenizer the original position.
        (If this is useful for European languages, e.g. for umlauts, I'm glad to open another JIRA issue.)
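        The offset problem and the correctPosition(int pos) fix can be sketched as follows. This is a hypothetical, self-contained illustration, not the MappingReader from the patch: it works on a String rather than a Reader, takes the mappings as a Map instead of parsing mapping.txt, and uses greedy longest-match replacement. The class name MappingReaderSketch and the origIndex table are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of normalization with offset correction (not the patch code). */
public final class MappingReaderSketch {
    private final String normalized;
    // origIndex[i] = offset in the original text of normalized char i;
    // the extra last entry is the original length, for exclusive end offsets.
    private final int[] origIndex;

    public MappingReaderSketch(String source, Map<String, String> mappings) {
        StringBuilder out = new StringBuilder();
        List<Integer> index = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            // Greedy longest match among the mapping keys at position i.
            String bestKey = null;
            for (String key : mappings.keySet()) {
                if (!key.isEmpty() && source.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey == null) {
                out.append(source.charAt(i));
                index.add(i);
                i++;
            } else {
                String replacement = mappings.get(bestKey);
                for (int k = 0; k < replacement.length(); k++) {
                    out.append(replacement.charAt(k));
                    index.add(i); // every replacement char points at the match start
                }
                i += bestKey.length();
            }
        }
        index.add(source.length()); // sentinel for exclusive end offsets
        this.normalized = out.toString();
        this.origIndex = index.stream().mapToInt(Integer::intValue).toArray();
    }

    /** The text the tokenizer would actually see. */
    public String normalized() {
        return normalized;
    }

    /** Maps an offset in the normalized text back to an offset in the original text. */
    public int correctPosition(int pos) {
        if (pos < 0 || pos >= origIndex.length) {
            throw new IndexOutOfBoundsException("pos: " + pos);
        }
        return origIndex[pos];
    }
}
```

        For example, with the mapping "ou" -> "o", the input "my colour" is normalized to "my color". The tokenizer sees the token "color" at offsets [3, 8), and correctPosition maps these back to [3, 9), which covers "colour" in the original text, so a highlighter using the corrected offsets marks the right span.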

        Also, in SOLR-319, I used LoggingFilter to see the SynonymFilter output.

        I'll try to incorporate your suggestions into my patch soon.

        Thank you.

        Koji Sekiguchi added a comment -

        Attached a revised patch, as Hoss kindly suggested.

        Erick Erickson added a comment -

        Cleaning up old JIRAs, re-open if necessary.


          People

          • Assignee:
            Unassigned
            Reporter:
            Koji Sekiguchi
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
