SOLR-2379

Improve documentation of Analyzers and Tokenizers

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels:

      Activity

      Jan Høydahl created issue -
      Jan Høydahl added a comment -

      We have two choices, suggested by Yonik:

      • update the wiki with the missing analysis components or
      • think about switching strategies to pointing at generated javadoc
      Otis Gospodnetic added a comment -

      Whatever we choose, let's stick to DRY. I think that may imply the javadoc approach (maybe with just an easily findable pointer from the Wiki?)

      Jan Høydahl added a comment -

      Agree. That has the benefit of improving JavaDoc quality as well for a lot of classes.
      An example of excellent JavaDoc is the Similarity class: http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
      Browsing through the analyzers, they are very sparse on javadoc.

      ASIDE: And what about finally making the move to the Confluence Wiki as well (https://cwiki.apache.org/SOLRxSITE/)? Then we could simply include Javadoc inline in pages through the javadoc plugin (https://plugins.atlassian.com/plugin/details/11120), and also get auto-linking to Jira issues.

      Robert Muir added a comment -

      "Browsing through the analyzers, they are very sparse on javadoc"

      Which analyzers are you referring to?

      Jan Høydahl added a comment -

      Few of the TokenFilterFactories are documented at all. Some of them have a simple XML config snippet example. Take the StandardTokenFilterFactory. It had no class JavaDoc until two days ago, when Koji and yourself added an xml snippet.

      But should the documentation be on the Factory or on the Filter? The WordDelimiterFilterFactory is not documented, but the Filter itself is (although it is not correctly HTML-formatted, so it looks broken in the browser).

      I think a reasonable goal, at least for these plugin-type classes, is to use the JavaDoc as the official main doc and point from the Wiki to there. But then the class-level JavaDoc must give a short introduction to what the filter does and when it is typically used, along with a list of all valid parameters and their values.

      Robert Muir added a comment -

      "Few of the TokenFilterFactories are documented at all. Some of them have a simple XML config snippet example. Take the StandardTokenFilterFactory. It had no class JavaDoc until two days ago, when Koji and yourself added an xml snippet."

      That's not really correct: typically they have a link to the token filter. For example, here is ThaiWordFilterFactory (Factory for {@link ThaiWordFilter}). If they take arguments, then typically they describe what those arguments do.

      This is enough, because someone can click the ThaiWordFilter and get all the details there.

      The javadocs for the factory need only document the factory.

      "I think a reasonable goal, at least for these plugin type of classes, is to use the JavaDoc as the official main doc and point from Wiki to there. But then the Class-level JavaDoc must give a short introduction to what the filter does, when it is typically used along with a list of all valid parameters and their values."

      I really don't think we should duplicate documentation from any Tokenizers/Filters into the factories. The factory should just have a javadoc ref to what it produces, and explain its various parameters. In other words, it need only document itself.

      Any other documentation is actually redundant and problematic: as long as this javadoc exists, it increases the maintenance load around here with no benefit to the user at all.
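
      As a rough illustration of the convention described in this comment (not code from the Solr tree), the sketch below keeps all behavioral detail on the filter and lets the factory Javadoc link to it and describe only the factory's own argument. TruncateFilter, TruncateFilterFactory, and the maxLength argument are hypothetical names, and the classes are plain Java stand-ins rather than real Lucene/Solr base classes.

```java
import java.util.Map;

/**
 * Hypothetical filter, standing in for a real Lucene TokenFilter.
 * Truncates each token to at most a fixed number of characters.
 * All behavioral documentation lives here, on the filter.
 */
final class TruncateFilter {
    private final int maxLength;

    TruncateFilter(int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be >= 1");
        }
        this.maxLength = maxLength;
    }

    /** Returns the token unchanged if it is short enough, else its prefix. */
    String apply(String token) {
        return token.length() <= maxLength ? token : token.substring(0, maxLength);
    }
}

/**
 * Factory for {@link TruncateFilter}.
 * <p>
 * Arguments (hypothetical):
 * <ul>
 *   <li><code>maxLength</code> (optional, default 5): the maximum token
 *       length passed to the {@link TruncateFilter} constructor.</li>
 * </ul>
 */
final class TruncateFilterFactory {
    private final int maxLength;

    TruncateFilterFactory(Map<String, String> args) {
        // The factory documents and parses only its own configuration.
        this.maxLength = Integer.parseInt(args.getOrDefault("maxLength", "5"));
    }

    TruncateFilter create() {
        return new TruncateFilter(maxLength);
    }
}
```

      With this split, a reader of the factory page learns how to configure it and follows the link for what the filter actually does, so nothing is described twice.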

      Jan Høydahl added a comment -

      About where to document, that was a question. Linking from Factory to Filter is a good practice.

      Looks like there are a lot of JavaDoc improvements with 3.1, so once that's out the door, it should be possible to rework and slim down the analysis wiki page quite a bit.

      Robert Muir added a comment -

      I agree. If you find factories in trunk/branch_3x that do not @link the filter/tokenizer they create, and don't describe how to use the factory (e.g. set parameters), I think we should fix those.

      I did a quick check for the former, and I think all factories link to the filter/tokenizer they create.

      In general I think it's best if any parameters/options are described in the filters themselves too, so that Lucene users see this documentation, and so we can very verbosely describe what these parameters do all in one place (to reduce confusion).

      Then any factories can simply link to the original documentation for the parameter values, too. Here's an example: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/DirectSolrSpellChecker.java
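
      A minimal sketch of that arrangement, under stated assumptions: the parameter is described once, verbosely, on the filter's setter, and the factory's config key carries only a Javadoc link back to it. MinLengthFilter, MinLengthFilterFactory, and the minLength key are hypothetical, the classes are plain Java stand-ins for Lucene/Solr types, and this is not taken from the linked DirectSolrSpellChecker source.

```java
import java.util.Map;

/** Hypothetical filter standing in for a Lucene TokenFilter; drops short tokens. */
final class MinLengthFilter {
    /** Value in effect when {@link #setMinLength(int)} is never called. */
    static final int DEFAULT_MIN_LENGTH = 1;

    private int minLength = DEFAULT_MIN_LENGTH;

    /**
     * Sets the minimum number of characters a token must have to be kept.
     * Tokens shorter than this value are dropped. The value must be at
     * least 1. This is the single, detailed description of the parameter.
     */
    void setMinLength(int minLength) {
        if (minLength < 1) {
            throw new IllegalArgumentException("minLength must be >= 1");
        }
        this.minLength = minLength;
    }

    /** Returns true if the token is long enough to pass the filter. */
    boolean accept(String token) {
        return token.length() >= minLength;
    }
}

/** Factory for {@link MinLengthFilter}. */
final class MinLengthFilterFactory {
    /** Config key; semantics are documented at {@link MinLengthFilter#setMinLength(int)}. */
    static final String MIN_LENGTH = "minLength";

    private final Map<String, String> args;

    MinLengthFilterFactory(Map<String, String> args) {
        this.args = args;
    }

    MinLengthFilter create() {
        MinLengthFilter filter = new MinLengthFilter();
        String value = args.get(MIN_LENGTH);
        if (value != null) {
            // Only override the filter's default when the key is configured.
            filter.setMinLength(Integer.parseInt(value));
        }
        return filter;
    }
}
```

      Because the factory constant only points at the setter, a change to the parameter's meaning needs to be documented in one place.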

      Dawid Weiss added a comment -

      I realize such a change would be a revolutionary refactoring, but in light of this, perhaps it would be beneficial to switch from JavaDoc to Google's Doclava. Among other things, it supports embedding real code snippets, so you get a consistent javadoc.

      Note: I haven't used it yet, I just know it supports such things, see @sample tag here:
      http://code.google.com/p/doclava/wiki/JavadocTags

      Yonik Seeley added a comment -

      "ASIDE: And what about finally making the move to the Confluence Wiki as well"

      Unfortunately, it looks like Confluence is going away at the ASF:
      http://www.apache.org/dev/cms.html

      score 1 for procrastination

      Mark Miller added a comment -

      That almost looks like it could be going away for project main website stuff ... but that is separate from the wiki, is it not?

      It seems like we would stick to real wiki software for the wiki portion of the site at Apache?

      In which case, I think Confluence is a nice improvement over MoinMoin.

      As another aside, our website and wiki look ancient.

      Yonik Seeley added a comment -

      Re-reading the link, I think you're right: it's only support for Confluence-backed sites that is being phased out.
      I've long been in favor of a move to Confluence, but have had no real time to do it myself.

      Hoss Man added a comment -

      This renewed interest in adding more focus and attention to the javadocs as user-visible documentation has me thinking that maybe the time has come to revive SOLR-555.

      I lost steam on it back in the day because I was having trouble drumming up interest from other people to help get the javadocs of all the various plugin instances to the state where the output would be useful for non-Java users (most people seemed content to just use the wiki), and it seemed better to ship no docs than to ship bad docs.

      Jan Høydahl added a comment -

      Closing old issue; fixed well enough. The focus is on improving the JavaDocs rather than on making the Wiki complete.

      Jan Høydahl made changes -
      Field | Original Value | New Value
      Status | Open [ 1 ] | Closed [ 6 ]
      Resolution | | Won't Fix [ 2 ]

        People

        • Assignee:
          Unassigned
          Reporter:
          Jan Høydahl
        • Votes:
          0
          Watchers:
          0

          Dates

          • Created:
            Updated:
            Resolved:
