Nutch / NUTCH-1306

Add option to not commit and clarify existing solr.commit.size

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: 2.1
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Commit after we have finished writing to the Solr index; otherwise it is a bit confusing not to see the number of docs we expect in Solr.

      1. NUTCH-1306-v2.patch
        2 kB
        Ferdy Galema
      2. NUTCH-1306-trunk-v3.patch
        2 kB
        Ferdy Galema
      3. NUTCH-1306-trunk-v2.patch
        2 kB
        Ferdy Galema
      4. NUTCH-1306-trunk.patch
        2 kB
        Ferdy Galema
      5. NUTCH-1306.patch
        0.5 kB
        Dan Rosher

          Activity

          Lewis John McGibbney added a comment -

          Hi Dan. In trunk, we have a number of nice features which I would like to bring to your attention. Maybe you can comment on whether you would like to see some of them go into Nutchgora?

          Namely, NUTCH-1185, NUTCH-1000, NUTCH-996, NUTCH-991 and NUTCH-799

          wdyt?

          Lewis John McGibbney added a comment -

          Having reviewed similar work that has been integrated in 1.X trunk (namely the issues I highlighted above), we should remain consistent with the principle that we either always commit and have an option not to commit, or the other way around. Since Nutchgora doesn't commit at all, we favour an option to commit instead of a noCommit, with the option to do a noCommit by configuration.

          It's at least clear it should be configurable one way or the other. Never committing or always committing is bad.

          (Thanks Markus for the input)

          Lewis John McGibbney added a comment -

          Set and Classify

          Ferdy Galema added a comment -

          Lewis,

          Are you suggesting that we add the commit as implemented by the fix, but make it conditional? Something like this:

          if (getConf().getBoolean("solr.commit", true)) {
            solr.commit();
          }

          This makes it enabled by default. I think it is a good idea.

          Secondly, you say that Nutchgora does not commit at all. It looks like trunk does not commit either. I think it's a bit confusing that the COMMIT_SIZE Nutch property does not trigger a Solr commit but rather 'flushes' data to Solr. Perhaps we could clarify this a bit more. (Update the property description to mention that it does NOT trigger a Solr commit.) Agree?
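To illustrate the flush-versus-commit distinction raised here, the following is a minimal sketch and not the actual Nutch or SolrJ code. The names `IndexSink`, `RecordingSink`, and `BatchingWriter` are hypothetical stand-ins, and `batchSize` only plays the role that the thread attributes to solr.commit.size: it controls when buffered documents are sent, and nothing becomes visible until a separate commit is issued.

```java
import java.util.ArrayList;
import java.util.List;

class FlushVsCommitSketch {
    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        BatchingWriter writer = new BatchingWriter(sink, 2);
        for (String doc : new String[] {"a", "b", "c", "d", "e"}) {
            writer.write(doc);
        }
        writer.close();
        System.out.println("visible before commit: " + sink.visible);
        sink.commit();
        System.out.println("visible after commit: " + sink.visible);
    }
}

// Hypothetical stand-in for a Solr client; this is NOT the SolrJ API.
interface IndexSink {
    void add(List<String> docs); // sends documents; they are not searchable yet
    void commit();               // makes previously sent documents searchable
}

// Records what it receives so the behaviour can be inspected.
class RecordingSink implements IndexSink {
    int pending = 0;  // documents sent but not yet committed
    int visible = 0;  // documents committed, i.e. searchable
    int addCalls = 0; // number of batches received

    public void add(List<String> docs) {
        pending += docs.size();
        addCalls++;
    }

    public void commit() {
        visible += pending;
        pending = 0;
    }
}

// Buffers documents and flushes a batch whenever batchSize is reached.
class BatchingWriter {
    private final IndexSink sink;
    private final int batchSize; // assumed analogue of solr.commit.size
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(IndexSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void write(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush(); // flush only: no commit happens here
        }
    }

    void close() {
        flush(); // send any remainder, still without committing
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            sink.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

In this sketch the writer never calls `commit()` itself, which mirrors the confusion described above: a size-based setting can send every document and still leave none of them searchable.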

          Lewis John McGibbney added a comment -

          This is exactly the viewpoint I was coming from, Ferdy. I set it for 2.1 as some (maybe minor) configuration had to be done to make this a more amiable solution.

          Regarding the second half of your comment, yes it is rather confusing if I'm honest. Currently in trunk, within the write() method we send an update request, which is not a commit. Generally speaking, it appears that on a number of issues we seem to be communicating with the Solr server via different means/syntax...

          It would be nice to at least make an attempt to make Nutchgora and trunk see eye to eye on this one, as trunk has some nice features w.r.t. Solr which, over time, it would be nice to have in both versions.

          Ferdy Galema added a comment -

          Agreed on trying to make both branches match each other.

          By the way there is a commit done after the whole job completes. (I previously thought there was no commit at all, but I was wrong). But, if this is the case, then the commit after closing a single indexwriter is not needed. (So the reason Dan is not seeing updates must have been a different problem).

          Anyway, I've uploaded patches that make the commit after job completion configurable (but enabled by default). Let me know if there are comments.

          Lewis John McGibbney added a comment -

          I've just stumbled across NUTCH-1025
          There seems to be a Gora import in your trunk patch... but that won't be ready for committing for a wee while anyway :0)

          Ferdy Galema added a comment -

          Heh indeed that's not ready for committing yet. Weird though that my workspace did not get a compile error at first, only after refreshing the ivy deps. (Somehow it fetched a Gora library).

          Anyway I've uploaded an updated patch.

          I was not aware of NUTCH-1025. Is it ok if we incorporate that issue and rename this issue to "Add option to not commit and clarify existing solr.commit.size"?

          Ferdy Galema added a comment -

          New option added: solr.commit.index

          Defaults to true (commit after indexing). Will commit to trunk and nutchgora if there are no objections.
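As a rough illustration of how an option like solr.commit.index could gate the final commit, here is a minimal sketch. It is not the committed patch: `java.util.Properties` stands in for the Hadoop configuration object, and `shouldCommit` and `finishJob` are hypothetical helper names. Only the property name and its default of true come from the comment above.

```java
import java.util.Properties;

class CommitOptionSketch {
    // Property name taken from the comment above.
    static final String COMMIT_INDEX = "solr.commit.index";

    // True when the property is absent (default) or set to "true".
    // Assumption: Properties is used here in place of a Hadoop Configuration.
    static boolean shouldCommit(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(COMMIT_INDEX, "true"));
    }

    // Hypothetical end-of-job hook: runs the commit action only when enabled.
    // Returns whether the commit action was run.
    static boolean finishJob(Properties conf, Runnable commitAction) {
        if (shouldCommit(conf)) {
            commitAction.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // No explicit setting: the commit runs by default.
        Properties defaults = new Properties();
        finishJob(defaults, () -> System.out.println("commit sent"));

        // Explicitly disabled: the commit is skipped.
        Properties disabled = new Properties();
        disabled.setProperty(COMMIT_INDEX, "false");
        boolean committed = finishJob(disabled, () -> System.out.println("commit sent"));
        System.out.println("committed when disabled: " + committed);
    }
}
```

Keeping the default at true means existing setups still get a commit once the job completes, while anyone who manages commits on the Solr side can switch it off.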

          Ferdy Galema added a comment -

          Minor bug in the previous patch; uploaded v3 of the trunk patch.

          Ferdy Galema added a comment -

          Committed in trunk and nutchgora. Thanks, everyone, for the input.

          Hudson added a comment -

          Integrated in Nutch-trunk #1892 (See https://builds.apache.org/job/Nutch-trunk/1892/)
          NUTCH-1306 Add option to not commit and clarify existing solr.commit.size (Revision 1359073)

          Result = SUCCESS
          ferdy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359073
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/conf/nutch-default.xml
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrConstants.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

            People

            • Assignee: Unassigned
            • Reporter: Dan Rosher
            • Votes: 0
            • Watchers: 2