Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/other
    • Labels:
    • Lucene Fields: New

      Description

      Trunk has undergone many API changes, some of which are not trivial, and the migration path from 3.x to 4.0 seems hard. I'd like to propose a way to tackle this by means of live example code.

      The facet module implements this approach. There is live Java code under src/examples that demonstrates some well-documented scenarios. The code itself is documented, in addition to javadoc, and it is unit tested regularly.

      We found it very difficult to keep documentation up to date: javadocs always lag behind, wiki pages get old, etc. However, when you have live Java code, you're forced to keep it up to date. It doesn't compile if you break the API, and it fails to run if you change internal implementation behavior. If you keep it simple enough, its documentation stays simple too.

      And if we are successful at maintaining it (which we must be, otherwise the build should fail), then people should have an easy experience migrating between releases. So say you take the simple scenario "I'd like to index documents which have the fields ID, date and body". Then you create an example class/method that accomplishes that. And between releases, this code gets updated, and people can follow the changes required to implement that scenario.

      I'm not saying the examples code should always stay optimized. We can aim at that, but I don't try to fool myself thinking that we'll succeed. But at least we can get it compiled and regularly unit tested.

      I think it would be good to introduce the concept of examples, such that if a module (core, contrib, modules) has a src/examples directory, we package it in a .jar and include it with the binary distribution. That's the first step. We can also have meta examples, under their own module/contrib, that show how to combine several modules together (this might even uncover API problems), but that's definitely a second phase.

      At first, let's do the "unit examples" (à la unit tests), and preferably start with core. Whatever we succeed at writing for 4.0 will only help users. So let's use this issue to:

      1. List example scenarios that we want to demonstrate for core.
      2. Build the infrastructure in our build system to package and distribute a module's examples.

      Please feel free to list here example scenarios that come to mind. We can then track what's been done and what's not. The more we do the better.

      1. LUCENE-3550-sort.patch
        13 kB
        Aleksandra Wozniak
      2. LUCENE-3550.patch
        6 kB
        Manpreet

        Activity

        Manpreet added a comment -

        Hi Aleksandra -

        I have been away from it for a while.

        Resuming my work from this week. Sure I will do that.

        Thanks
        -Manpreet

        Aleksandra Wozniak added a comment -

        Hi,

        Recently I started learning the Lucene API, and along the way I created a few snippets showing different Lucene features. I found this issue by coincidence and decided to rework one of them to fit into the examples implementation. I'm sending a patch with my sort example and a corresponding unit test.

        Manpreet, I see that you started working on this issue a while ago – I don't want to interfere with your work. You can incorporate my example in your code or use it in any other way, if you find it useful.

        Cheers,
        Aleksandra

        Manpreet added a comment -

        Perfect & Noted.

        I shall follow the review comments & make the changes accordingly. Thanks again for your help & review.

        regards
        -ms

        Shai Erera added a comment -

        Few comments:

        • Please remove @author tags. We don't use them, and the build fails if it finds any.
        • In general, I think that the code needs more documentation, since this is example code. So for instance I would add:
          • to index() a comment saying "IndexWriterConfig lets you configure how IndexWriter works as well as how documents are indexed".
          • to search() a comment saying "QueryParser is able to parse a query string into a meaningful Query object which is used to match and score documents".
          • etc...
        • If there's nothing special to say about an exception that is thrown, can you please remove @throws from javadocs?
        • addDocs:
          • I would rename to addDoc
          • Modify the comment "create index" to "add document to the index"
        • Currently the code prints messages, which we try to avoid (e.g. during tests). So either we add to DemoConstants a VERBOSE property that is initialized to System.getProperty("tests.verbose"), or you just move all the prints to main()?
          • In that regard, search() can return a ScoreDoc[] which main() can use to print results as well as tests could use to assert on.
          • I.e. rather than asserting that search() returned 1 or 2 hits, we can assert their order etc. (not saying we have to for this example).
        • In order to better test the example, I would make it take a Directory (e.g. index(Directory), search(Directory) or SimpleCoreExample(Directory)) and pass from tests newDirectory() (note: there's no space intentionally).
          • This will detect incomplete code, e.g. you don't close the reader in search().
        • Also, I think the example should make it clearer that matching is not case sensitive here, e.g. if you index "Apache", search for "apache".
          • main() could also run two searches, to print diverse results
          • and tests (and main()) should test multi-word queries too
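
A minimal, library-free sketch of the points above about printing and return values. It does not use Lucene: the class name QuietExampleSketch and the toy whitespace matching are assumptions made purely for illustration, standing in for a real example built on IndexWriter and QueryParser. It only shows the structure: a VERBOSE flag read from the tests.verbose system property, a search() that returns its hits so tests can assert on them, and result printing confined to main().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Library-free stand-in for an example class. It does NOT call Lucene; the class
 * name and the toy matching logic are assumptions made for illustration only.
 */
final class QuietExampleSketch {

  /** In the thread this flag would live in DemoConstants; true only with -Dtests.verbose=true. */
  static final boolean VERBOSE = parseVerbose(System.getProperty("tests.verbose"));

  private QuietExampleSketch() {}

  /** Null-safe parsing: an unset property means "not verbose". */
  static boolean parseVerbose(String value) {
    return Boolean.parseBoolean(value);
  }

  /**
   * Returns the positions of documents that contain the term, ignoring case.
   * Prints a diagnostic line only when VERBOSE is set.
   */
  static List<Integer> search(List<String> docs, String term) {
    String needle = term.toLowerCase(Locale.ROOT);
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < docs.size(); i++) {
      // Toy tokenization: lowercase, then split on whitespace.
      String[] tokens = docs.get(i).toLowerCase(Locale.ROOT).split("\\s+");
      if (Arrays.asList(tokens).contains(needle)) {
        hits.add(i);
      }
    }
    if (VERBOSE) {
      System.out.println("search(" + term + ") matched " + hits.size() + " document(s)");
    }
    return hits;
  }

  /** Only main() prints results; tests call search() and assert on the returned list. */
  public static void main(String[] args) {
    List<String> docs = Arrays.asList("Apache Lucene", "plain text search");
    System.out.println("hits for 'apache': " + search(docs, "apache"));
    System.out.println("hits for 'search': " + search(docs, "search"));
  }
}
```

Because search() returns the hits, a test can check which documents matched and in what order, rather than only counting them.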

        As a start, it looks great. I think though that it would be better if our simple example contained:

          • Documents with more than one field, to show different Field types (TextField, StringField, DocValuesField)
          • Instead of a single search(), have different searchXYZ methods, e.g.
            • searchKeyword (using default field), searchFields (execute fielded search)
            • searchBooleanQuery, searchRangeQuery to show QueryParser's syntax
            • searchSort to sort results

        I consider these simple/basic examples, since that's really the essence of Lucene: indexing documents with a few fields and querying them in different ways.

        Manpreet added a comment -

        Patch for Lucene-3550

        Manpreet added a comment -

        Hi Shai - created the patch for 3550. Kindly review.

        Thanks
        -MS

        Manpreet added a comment -

        Patch for Example Code

        Manpreet added a comment -

        Surely I will do that. Thanks.

        Shai Erera added a comment -

        Ok great. Also, if you can, please create the patch on 'trunk' and not 4x.

        Manpreet added a comment -

        Hi Shai - Thanks. That's true.

        • I will change it to a simpler approach, as you said.
        • Will create DemoConstants; that's much better than each module having its own.
        • I did add 'TestSimpleExample', a Lucene test case. Will verify and change accordingly.

        Thanks
        -MS

        Shai Erera added a comment -

        Hi Mandy. I realize you followed the facets example "exactly". I recently simplified them a lot, and that's what I think you should do with the simple example.

        • Rather than SimpleIndexer/Searcher/Main, just have a single simple class that runs scenarios in separate methods, even if it reindexes content etc.
        • Maybe, instead of ExampleUtils (and now FacetExamples), we should have a DemoConstants under demo/ and declare DEMO_VERSION there?
        • Would you please add a unit test that corresponds to the simple example? See how e.g. SimpleFacetExample and TestSimpleFacetExample work.
        • I would nuke ExampleResult.
        Manpreet added a comment -

        Hi Shai - Did you get a chance to review?

        Manpreet added a comment -

        Renamed to 3550.

        Shai Erera added a comment -

        Ok I will review. But can you please rename the patch to LUCENE-3550 (and not 8550)?

        Manpreet added a comment -

        Hi Shai - I have created the first patch which includes SimpleExample testcase. Request your review.

        Thanks
        -MS

        Manpreet added a comment -

        patch for 8550 [includes only SimpleExample testcase]

        Manpreet added a comment -

        Thanks Shai. I have started work on the above examples.

        I see that with the latest changes even the facets examples have moved under the 'demo' module.

        Cheers
        -Mandy

        Shai Erera added a comment -

        Hi Mandy. The basic idea behind this issue was to create some example code which demonstrates different scenarios of indexing with Lucene. Lucene 4.0 brought many changes to the API, and such example code was badly missing (luckily, there was a good migration document).

        The facets module has such example code which:

        • Ensures that when the API changes, the code is updated – it's like live documentation of the code, which we must update in order for it to compile
        • Is tested regularly, so that it not only compiles, but also works.

        At the time I thought that it would be good to follow that practice for Lucene core, ensuring that when APIs change or features are removed, we update the corresponding example code, and also get the chance to evaluate the change against real code.

        Lucene has a 'demo' module, so we should put the examples code under it. Let's start by defining some use cases that we'd like to demo, e.g.:

        • SimpleExample: index a few fields into a few documents and offer index() and search() methods that index the content and run some searches against it.
          • A corresponding test should e.g. run some queries against the resulting index and validate that things were indexed properly.
        • SortExample: same as above, but indexes some fields for sorting purposes, using e.g. DocValues and whatever else we can sort on.
          • And again, a corresponding test.
        • NumericExample: index a few numeric fields and demo range queries etc.
          • Plus a corresponding test.

        Let's start with these, and then we can build more.

        Manpreet added a comment -

        Hi -

        I would like to start my work on this issue. Request for your guidance.

        Cheers
        -Mandy
        (Linked in - http://www.linkedin.com/pub/manpreet-singh/16/67a/165)

        Steve Rowe added a comment -

        +1

        lucene/contrib/demo/ is an existing lucene-core example, and should be folded into this effort.

        About release jar naming: we could call them lucene-<module>-example, e.g. lucene-core-example-X.Y.jar, lucene-facet-example-X.Y.jar, etc.
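
As a small illustration of the proposed naming scheme, here is a sketch that builds such a jar name from a module and a version. The class ExampleJarNames and its jarName helper are hypothetical, and "4.0" merely stands in for X.Y; the actual packaging would be done by the build system, not by Java code like this.

```java
/** Hypothetical helper illustrating the lucene-<module>-example-X.Y.jar naming proposal. */
final class ExampleJarNames {

  private ExampleJarNames() {}

  /** Builds a jar name such as lucene-core-example-4.0.jar from a module and a version. */
  static String jarName(String module, String version) {
    return "lucene-" + module + "-example-" + version + ".jar";
  }

  public static void main(String[] args) {
    // "4.0" is only a placeholder for the X.Y release version.
    System.out.println(jarName("core", "4.0"));
    System.out.println(jarName("facet", "4.0"));
  }
}
```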


          People

          • Assignee: Unassigned
          • Reporter: Shai Erera
          • Votes: 0
          • Watchers: 4
