Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case.

      I think we should also use a buffered reader in FileDocument?

      And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats.

      1. LUCENE-2923.patch
        4 kB
        Robert Muir
      2. LUCENE-2923.patch
        617 kB
        Steve Rowe
      3. LUCENE-2923.patch
        412 kB
        Michael McCandless
      4. LUCENE-2923.patch
        413 kB
        Michael McCandless
      5. LUCENE-2923.patch
        371 kB
        Michael McCandless
      6. LUCENE-2923-part3.patch
        3 kB
        Steve Rowe

        Activity

        Grant Ingersoll made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1
        Steve Rowe made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Steve Rowe made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Hide
        Steve Rowe added a comment -

        crap - forgot to commit branch_3x part 3 port...

        Show
        Steve Rowe added a comment - crap - forgot to commit branch_3x part 3 port...
        Steve Rowe made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Lucene Fields [New]
        Resolution Fixed [ 1 ]
        sarowe committed 1072685 (4 files)
        Steve Rowe made changes -
        Attachment LUCENE-2923-part3.patch [ 12471526 ]
        Hide
        Steve Rowe added a comment -

        This patch:

        • fixes demo2.html to remove description of the streaming search removed from the SearchFiles demo app by Robert's patch
        • removes the obsolete "<root directory>" text from the usage message in the IndexFiles demo app, as well as the same info in demo.html
        • Stops the SearchFiles demo app in interactive mode from asking for which page to navigate to when there are zero results - instead asks for another query.

        Committing shortly.

        Show
        Steve Rowe added a comment - This patch: fixes demo2.html to remove description of the streaming search removed from the SearchFiles demo app by Robert's patch removes the obsolete "<root directory>" text from the usage message in the IndexFiles demo app, as well as the same info in demo.html Stops the SearchFiles demo app in interactive mode from asking for which page to navigate to when there are zero results - instead asks for another query. Committing shortly.
        rmuir committed 1072608 (4 files)
        rmuir committed 1072607 (1 file)
        Reviews: none

        LUCENE-2923: simplify demo by removing custom collector/paging=false option

        Hide
        Steve Rowe added a comment -

        Fix for modules/benchmark/build.xml committed to trunk rev. 1072393. (branch_3x doesn't have this problem since it has no modules and the benchmark contrib works the same as the other contribs for the generate-maven-artifacts target.)

        Show
        Steve Rowe added a comment - Fix for modules/benchmark/build.xml committed to trunk rev. 1072393. (branch_3x doesn't have this problem since it has no modules and the benchmark contrib works the same as the other contribs for the generate-maven-artifacts target.)
        sarowe committed 1072393 (1 file)
        Reviews: none

        LUCENE-2923: Restore benchmark jar production for generate-maven-artifacts target

        Hide
        Steve Rowe added a comment -

        The nightly hudson trunk maven build failed last night because the benchmark jar wasn't being produced by the dist-maven target, which is called from the generate-maven-artifacts target. From https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/31/consoleText :

        dist-maven:
        [artifact:deploy] Deploying to file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/checkout/modules/dist/maven
        [artifact:deploy] An error has occurred while processing the Maven artifact tasks.
        [artifact:deploy]  Diagnosis:
        [artifact:deploy] 
        [artifact:deploy] Error deploying artifact 'org.apache.lucene:lucene-benchmark:jar': Error deploying artifact:
         File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/checkout/modules/benchmark/build/lucene-benchmark-4.0-SNAPSHOT.jar does not exist
        
        BUILD FAILED
        

        This can be fixed by putting back the dist-maven specialization that I took out of modules/benchmark/build.xml:

        <target name="dist-maven" depends="jar-core,javadocs,contrib-build.dist-maven"/>
        

        Committing shortly.

        Show
        Steve Rowe added a comment - The nightly hudson trunk maven build failed last night because the benchmark jar wasn't being produced by the dist-maven target, which is called from the generate-maven-artifacts target. From https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/31/consoleText : dist-maven: [artifact:deploy] Deploying to file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/checkout/modules/dist/maven [artifact:deploy] An error has occurred while processing the Maven artifact tasks. [artifact:deploy] Diagnosis: [artifact:deploy] [artifact:deploy] Error deploying artifact 'org.apache.lucene:lucene-benchmark:jar': Error deploying artifact: File /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/checkout/modules/benchmark/build/lucene-benchmark-4.0-SNAPSHOT.jar does not exist BUILD FAILED This can be fixed by putting back the dist-maven specialization that I took out of modules/benchmark/build.xml : <target name= "dist-maven" depends= "jar-core,javadocs,contrib-build.dist-maven" /> Committing shortly.
        Hide
        Steve Rowe added a comment -

        Initial patch commited: trunk rev. 1072250, branch_3x rev. 1072386.

        Leaving this issue open to deal with Robert's patch.

        Show
        Steve Rowe added a comment - Initial patch commited: trunk rev. 1072250, branch_3x rev. 1072386. Leaving this issue open to deal with Robert's patch.
        Hide
        Steve Rowe added a comment -

        Here's a patch that removes the custom collector and the "-paging false/streaming" option, simplifying the demo a lot more.

        +1. I think the site docs will also need some revision.

        Show
        Steve Rowe added a comment - Here's a patch that removes the custom collector and the "-paging false/streaming" option, simplifying the demo a lot more. +1. I think the site docs will also need some revision.
        sarowe committed 1072386 (107 files)
        Reviews: none

        LUCENE-2923: Cleanup contrib/demo

        Lucene branch_3x
        Robert Muir made changes -
        Attachment LUCENE-2923.patch [ 12471471 ]
        Hide
        Robert Muir added a comment -

        Here's a patch that removes the custom collector and the "-paging false/streaming" option, simplifying the demo a lot more.

        Show
        Robert Muir added a comment - Here's a patch that removes the custom collector and the "-paging false/streaming" option, simplifying the demo a lot more.
        Hide
        Michael McCandless added a comment -

        Thanks Steven!

        Show
        Michael McCandless added a comment - Thanks Steven!
        sarowe committed 1072250 (104 files)
        Reviews: none

        LUCENE-2923: Cleanup contrib/demo

        Lucene trunk
        Steve Rowe made changes -
        Attachment LUCENE-2923.patch [ 12471456 ]
        Hide
        Steve Rowe added a comment -

        This patch adds the following (and includes all of Mike's patch):

        • IntelliJ IDEA configuration adjustments
        • Maven POM adjustments
        • Source and regenerated site docs
        • contrib/demo/build.xml builds the Lucene core jar in the init target if it doesn't exist or if its .java source files are newer than the jar.
        • contrib/demo/build.xml default target is now jar-core
        • At Mike's suggestion on #lucene IRC, added javadoc comment to IndexFiles pointing out that 1 input doc per file is slow, and that good throughput can be gained by putting multiple docs in a single input file.
        • Other minor cleanups

        Committing shortly.

        Show
        Steve Rowe added a comment - This patch adds the following (and includes all of Mike's patch): IntelliJ IDEA configuration adjustments Maven POM adjustments Source and regenerated site docs contrib/demo/build.xml builds the Lucene core jar in the init target if it doesn't exist or if its .java source files are newer than the jar. contrib/demo/build.xml default target is now jar-core At Mike's suggestion on #lucene IRC, added javadoc comment to IndexFiles pointing out that 1 input doc per file is slow, and that good throughput can be gained by putting multiple docs in a single input file. Other minor cleanups Committing shortly.
        sarowe committed 1072010 (5 files)
        sarowe committed 1072009 (2 files)
        Reviews: none

        LUCENE-2926: reverting all changes - it slows down the build. LUCENE-2923 will include these changes for the demo contrib only

        Steve Rowe made changes -
        Assignee Michael McCandless [ mikemccand ] Steven Rowe [ steve_rowe ]
        Hide
        Michael McCandless added a comment -

        Woops! Thanks Steven.

        Show
        Michael McCandless added a comment - Woops! Thanks Steven.
        Hide
        Steve Rowe added a comment -

        Mike, a small typo:

        To apply the patch you first have to do this:

        svn mv lucene/contrib/benchmark/src/java/org/apache/lucene/demo/html modules/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/demohtml

        That should be svn mv lucene/contrib/demo/src/... modules/benchmark/src/...

        Show
        Steve Rowe added a comment - Mike, a small typo: To apply the patch you first have to do this: svn mv lucene/contrib/benchmark/src/java/org/apache/lucene/demo/html modules/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/demohtml That should be svn mv lucene/contrib/ demo /src/... modules/benchmark/src/...
        Hide
        Michael McCandless added a comment -

        Thanks Steven.

        Right, I removed both because I think we don't need maven details in the build.xml (since the Maven POMs are now separately configured).. well, and because we don't make a WAR anymore.

        Show
        Michael McCandless added a comment - Thanks Steven. Right, I removed both because I think we don't need maven details in the build.xml (since the Maven POMs are now separately configured).. well, and because we don't make a WAR anymore.
        Hide
        Steve Rowe added a comment -

        Can someone lookover my changes to the build xml files? (Especially the Maven part, where I completely guessed!).

        I skimmed the patch and can see that you removed the specializations of the dist-maven task in both modules/benchmark/ and lucene/contrib/demo/. For benchmark, the specialization was trivial and didn't change any behavior, so I assume that's why you dropped it. In the demo case, the specialization was introduced to be able to deploy the .war file, but since the .war file is no longer produced, the dist-maven specialization is no longer required. Later today I'll have time to apply the patch and do sanity checking on the maven outputs.

        One thing left to do (I can handle this separately): change the Maven POMs for the benchmark and demo modules.

        Show
        Steve Rowe added a comment - Can someone lookover my changes to the build xml files? (Especially the Maven part, where I completely guessed!). I skimmed the patch and can see that you removed the specializations of the dist-maven task in both modules/benchmark/ and lucene/contrib/demo/ . For benchmark, the specialization was trivial and didn't change any behavior, so I assume that's why you dropped it. In the demo case, the specialization was introduced to be able to deploy the .war file, but since the .war file is no longer produced, the dist-maven specialization is no longer required. Later today I'll have time to apply the patch and do sanity checking on the maven outputs. One thing left to do (I can handle this separately): change the Maven POMs for the benchmark and demo modules.
        Michael McCandless made changes -
        Attachment LUCENE-2923.patch [ 12471280 ]
        Hide
        Michael McCandless added a comment -

        New patch.

        I inverted IndexFiles -create to IndexFiles -update (ie default is now create, like before).

        Added reference to web-site docs in the jdocs, and added comment on the optional "increase RAM buffer size" that you should also -Xmx your JVM.

        I think it's ready to commit, but separately we have to fix the web site.

        Show
        Michael McCandless added a comment - New patch. I inverted IndexFiles -create to IndexFiles -update (ie default is now create, like before). Added reference to web-site docs in the jdocs, and added comment on the optional "increase RAM buffer size" that you should also -Xmx your JVM. I think it's ready to commit, but separately we have to fix the web site.
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563425 ] jira [ 12584592 ]
        Michael McCandless made changes -
        Attachment LUCENE-2923.patch [ 12471267 ]
        Hide
        Michael McCandless added a comment -

        OK new patch, fixing a number of things:

        • I close the Reader (thanks Mark).
        • I cutover to NumericField (and stopped using DateTools) for the
          "modified" field.
        • I added a -create option to IndexFiles, so you can see how to
          CREATE vs CREATE_OR_APPEND
        • I left commented-out optional things – calling optimize,
          increasing IW's RAM buffer.
        • Don't use Version.LUCENE_CURRENT.
        • I sucked in test files from Lucene in Action 2E's tests (open
          source licenses).
        • I use addDocument or updateDocument depending on -create.
        • I made the "demo html parser" private to modules/benchmark, which
          had a dependency on it. Can someone lookover my changes to the
          build xml files? (Especially the Maven part, where I completely
          guessed!).
        • IndexHTML is gone, and the webapp (src/jsp/*) is gone too.

        To apply the patch you first have to do this:

        svn mv lucene/contrib/benchmark/src/java/org/apache/lucene/demo/html modules/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/demohtml
        svn mv lucene/contrib/demo/src/test/org/apache/lucene/demo/html modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/demohtml
        
        Show
        Michael McCandless added a comment - OK new patch, fixing a number of things: I close the Reader (thanks Mark). I cutover to NumericField (and stopped using DateTools) for the "modified" field. I added a -create option to IndexFiles, so you can see how to CREATE vs CREATE_OR_APPEND I left commented-out optional things – calling optimize, increasing IW's RAM buffer. Don't use Version.LUCENE_CURRENT. I sucked in test files from Lucene in Action 2E's tests (open source licenses). I use addDocument or updateDocument depending on -create. I made the "demo html parser" private to modules/benchmark, which had a dependency on it. Can someone lookover my changes to the build xml files? (Especially the Maven part, where I completely guessed!). IndexHTML is gone, and the webapp (src/jsp/*) is gone too. To apply the patch you first have to do this: svn mv lucene/contrib/benchmark/src/java/org/apache/lucene/demo/html modules/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/demohtml svn mv lucene/contrib/demo/src/test/org/apache/lucene/demo/html modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/demohtml
        Mark Thomas made changes -
        Workflow jira [ 12558323 ] Default workflow, editable Closed status [ 12563425 ]
        Hide
        Mark Miller added a comment -

        I don't think we should include optimize in the demo;

        I wonder if it wouldn't be better to leave it, but commented out - with a short explanation.

        Optimizing is not necessary, but it clearly has benefits to query perf! If you are not updating often, I think it can make perfect sense.

        So I'm fine with just dropping, but not sure if commenting it out and putting something like:
        // for an index that is not updated often, we might optimize now

        or variation...

        Show
        Mark Miller added a comment - I don't think we should include optimize in the demo; I wonder if it wouldn't be better to leave it, but commented out - with a short explanation. Optimizing is not necessary, but it clearly has benefits to query perf! If you are not updating often, I think it can make perfect sense. So I'm fine with just dropping, but not sure if commenting it out and putting something like: // for an index that is not updated often, we might optimize now or variation...
        Hide
        Mark Miller added a comment -

        I think we should also use a buffered reader in FileDocument?

        And close the reader...

        Show
        Mark Miller added a comment - I think we should also use a buffered reader in FileDocument? And close the reader...
        Michael McCandless made changes -
        Attachment LUCENE-2923.patch [ 12471223 ]
        Hide
        Michael McCandless added a comment -

        Patch.

        Show
        Michael McCandless added a comment - Patch.
        Hide
        Uwe Schindler added a comment -

        Yeah, we remove the optimize. Too many people tell me exactly that they should optimize because they see it in almost every demo code. Optimizing is with recent Lucene versions not needed anymore. It's hard to explain to people, so example code and books should never tell to optimize. In books about lucene there should also be an explanation when optimizing is needed or usefully, put prevent people from always doing this.

        Show
        Uwe Schindler added a comment - Yeah, we remove the optimize. Too many people tell me exactly that they should optimize because they see it in almost every demo code. Optimizing is with recent Lucene versions not needed anymore. It's hard to explain to people, so example code and books should never tell to optimize. In books about lucene there should also be an explanation when optimizing is needed or usefully, put prevent people from always doing this.
        Michael McCandless made changes -
        Field Original Value New Value
        Summary remove writer.optimize() from contrib/demo cleanup contrib/demo
        Description I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case.

        I think we should also use a buffered reader in FileDocument?

        And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats.
        Michael McCandless created issue -

          People

          • Assignee:
            Steve Rowe
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development