Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      TermsComponent should be distributed

      1. SOLR-1177.patch
        18 kB
        Matt Weber
      2. TermsComponent.patch
        12 kB
        Matthew Woytowitz
      3. TermsComponent.java
        14 kB
        Matthew Woytowitz
      4. SOLR-1177.patch
        24 kB
        Matt Weber
      5. SOLR-1177.patch
        13 kB
        Shalin Shekhar Mangar
      6. SOLR-1177.patch
        26 kB
        Matt Weber
      7. SOLR-1177.patch
        13 kB
        Matt Weber
      8. SOLR-1177.patch
        14 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Matt Weber created issue -
          Hide
          Matt Weber added a comment -

          Here is my first attempt at a patch that is not currently working. For some reason only the prepare and process methods are being called. It seems that the shards parameter is not being honored like it is in the other distributed components because rb.shards is always null. I have looked at the other distributed components and did not notice them doing anything special with the shards parameter. I have based this code on the information from http://wiki.apache.org/solr/WritingDistributedSearchComponents and looking though the FacetComponent, DebugComponent, StatsComponent, and HighlightComponent code. Any help figuring out why the other methods are not being called is greatly appreciated. Please ignore the println statments, they are for debug only and will be removed in the finalized, working patch.

          Thanks!

          Show
          Matt Weber added a comment - Here is my first attempt at a patch that is not currently working. For some reason only the prepare and process methods are being called. It seems that the shards parameter is not being honored like it is in the other distributed components because rb.shards is always null. I have looked at the other distributed components and did not notice them doing anything special with the shards parameter. I have based this code on the information from http://wiki.apache.org/solr/WritingDistributedSearchComponents and looking though the FacetComponent, DebugComponent, StatsComponent, and HighlightComponent code. Any help figuring out why the other methods are not being called is greatly appreciated. Please ignore the println statments, they are for debug only and will be removed in the finalized, working patch. Thanks!
          Matt Weber made changes -
          Field Original Value New Value
          Attachment SOLR-1177.patch [ 12408597 ]
          Hide
          Matthew Woytowitz added a comment -

          I got the previous patch working. It was we close. I attached the java file and a patch for just the TermsComponent

          Show
          Matthew Woytowitz added a comment - I got the previous patch working. It was we close. I attached the java file and a patch for just the TermsComponent
          Matthew Woytowitz made changes -
          Attachment TermsComponent.patch [ 12408750 ]
          Attachment TermsComponent.java [ 12408751 ]
          Grant Ingersoll made changes -
          Assignee Grant Ingersoll [ gsingers ]
          Hide
          Matt Weber added a comment -

          I have cleaned up the patch, tested it against the latest version of trunk and wrote some unit tests. This patch invalidates SOLR-1156 because it also includes sorting support.

          Show
          Matt Weber added a comment - I have cleaned up the patch, tested it against the latest version of trunk and wrote some unit tests. This patch invalidates SOLR-1156 because it also includes sorting support.
          Matt Weber made changes -
          Attachment SOLR-1177.patch [ 12408838 ]
          Matt Weber made changes -
          Link This issue incorporates SOLR-1156 [ SOLR-1156 ]
          Hide
          Matt Weber added a comment -

          This patch includes SOLR-1156

          Show
          Matt Weber added a comment - This patch includes SOLR-1156
          Matt Weber made changes -
          Link This issue relates to SOLR-1156 [ SOLR-1156 ]
          Matt Weber made changes -
          Link This issue relates to SOLR-1156 [ SOLR-1156 ]
          Shalin Shekhar Mangar made changes -
          Assignee Grant Ingersoll [ gsingers ] Shalin Shekhar Mangar [ shalinmangar ]
          Hide
          Shalin Shekhar Mangar added a comment -
          1. Based on Matt's patch
          2. Synced to trunk
          3. Uses BaseDistributedTestCase

          All tests pass.

          I had to change TermData#frequency to an int to match the output of distributed and non-distributed cases. It is theoretically possible to have the sum of frequencies from all shards to exceed size of an int but I don't think it is practical right now. The problem is that we represent frequency as int everywhere for non-distributed responses. If we want longs in distributed search responses then we must start using longs in non-distributed responses as well to maintain compatibility.

          Matt – There is an issue open for adding SolrJ support for TermsComponent - SOLR-1139. Is it possible to replace the TermsHelper and TermData classes by classes in SOLR-1139? I'd like to have the same classes parsing responses in Solrj and distributed search.

          Show
          Shalin Shekhar Mangar added a comment - Based on Matt's patch Synced to trunk Uses BaseDistributedTestCase All tests pass. I had to change TermData#frequency to an int to match the output of distributed and non-distributed cases. It is theoretically possible to have the sum of frequencies from all shards to exceed size of an int but I don't think it is practical right now. The problem is that we represent frequency as int everywhere for non-distributed responses. If we want longs in distributed search responses then we must start using longs in non-distributed responses as well to maintain compatibility. Matt – There is an issue open for adding SolrJ support for TermsComponent - SOLR-1139 . Is it possible to replace the TermsHelper and TermData classes by classes in SOLR-1139 ? I'd like to have the same classes parsing responses in Solrj and distributed search.
          Shalin Shekhar Mangar made changes -
          Attachment SOLR-1177.patch [ 12427722 ]
          Hide
          Yonik Seeley added a comment -

          The facet component internally uses long to add up distributed facet counts, and then uses this code:

          // use <int> tags for smaller facet counts (better back compatibility)
            private Number num(long val) {
             if (val < Integer.MAX_VALUE) return (int)val;
             else return val;
            }
          

          Yes, it's not ideal to switch from <int> to <long> in a running application, but I think it's better than failing or overflowing the int.
          Client code in SolrJ should be written to handle either via ((Number)x).longValue()

          Show
          Yonik Seeley added a comment - The facet component internally uses long to add up distributed facet counts, and then uses this code: // use < int > tags for smaller facet counts (better back compatibility) private Number num( long val) { if (val < Integer .MAX_VALUE) return ( int )val; else return val; } Yes, it's not ideal to switch from <int> to <long> in a running application, but I think it's better than failing or overflowing the int. Client code in SolrJ should be written to handle either via ((Number)x).longValue()
          Hide
          Matt Weber added a comment -

          Thanks for the update Yonik! I will see if I can get this and SOLR-1139 using the same classes.

          Show
          Matt Weber added a comment - Thanks for the update Yonik! I will see if I can get this and SOLR-1139 using the same classes.
          Hide
          Matt Weber added a comment -

          Here is an updated patch that includes Shalin's suggestions:

          • replace TermData with TermsResponse.Term
          • updates TermsHelper to use the parsing code from TermsResponse

          I also changed TermsResponse.Term#frequency to a long so that we don't overflow when calculating the frequency. Then to keep back-compatbility with existing code I do the following when writing it to the NamedList:

          if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax)

          { fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue()); cnt++; }

          Is this a good approach?

          This new patch includes SOLR-1139.

          Show
          Matt Weber added a comment - Here is an updated patch that includes Shalin's suggestions: replace TermData with TermsResponse.Term updates TermsHelper to use the parsing code from TermsResponse I also changed TermsResponse.Term#frequency to a long so that we don't overflow when calculating the frequency. Then to keep back-compatbility with existing code I do the following when writing it to the NamedList: if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) { fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue()); cnt++; } Is this a good approach? This new patch includes SOLR-1139 .
          Matt Weber made changes -
          Attachment SOLR-1177.patch [ 12427794 ]
          Matt Weber made changes -
          Link This issue incorporates SOLR-1139 [ SOLR-1139 ]
          Hide
          Matt Weber added a comment -

          The latest version of this patch includes SOLR-1139 since we are reusing code from that patch.

          Show
          Matt Weber added a comment - The latest version of this patch includes SOLR-1139 since we are reusing code from that patch.
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks Matt. Can you please attach the relevant portions to SOLR-1139. We can commit SOLR-1139 first and then resolve this one.

          Show
          Shalin Shekhar Mangar added a comment - Thanks Matt. Can you please attach the relevant portions to SOLR-1139 . We can commit SOLR-1139 first and then resolve this one.
          Hide
          Matt Weber added a comment -

          The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them?

          Show
          Matt Weber added a comment - The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them?
          Hide
          Shalin Shekhar Mangar added a comment -

          The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them?

          Yes. I'll commit SOLR-1139 first so remove those classes from the current patch.

          PS: I'm sorry if I am confusing you. It is 3AM here and I'm a little confused myself

          Show
          Shalin Shekhar Mangar added a comment - The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them? Yes. I'll commit SOLR-1139 first so remove those classes from the current patch. PS: I'm sorry if I am confusing you. It is 3AM here and I'm a little confused myself
          Hide
          Matt Weber added a comment -

          New patch that DOES NOT include the code for SOLR-1139. Make sure you have SOLR-1139 applied before using this patch.

          Show
          Matt Weber added a comment - New patch that DOES NOT include the code for SOLR-1139 . Make sure you have SOLR-1139 applied before using this patch.
          Matt Weber made changes -
          Attachment SOLR-1177.patch [ 12427833 ]
          Matt Weber made changes -
          Link This issue incorporates SOLR-1139 [ SOLR-1139 ]
          Matt Weber made changes -
          Link This issue depends on SOLR-1139 [ SOLR-1139 ]
          Hide
          Matt Weber added a comment -

          The latest version of this patch depends on SOLR-1139 being applied.

          Show
          Matt Weber added a comment - The latest version of this patch depends on SOLR-1139 being applied.
          Hide
          Shalin Shekhar Mangar added a comment -
          if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) {
            fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue()); cnt++; 
          }
          

          I changed freqmin and freqmax to long and used Yonik's method to write int if possible or else switch to longs in the response.

          I'll commit this shortly.

          Show
          Shalin Shekhar Mangar added a comment - if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) { fieldterms.add(tc.getTerm(), (( Number )tc.getFrequency()).intValue()); cnt++; } I changed freqmin and freqmax to long and used Yonik's method to write int if possible or else switch to longs in the response. I'll commit this shortly.
          Shalin Shekhar Mangar made changes -
          Attachment SOLR-1177.patch [ 12427893 ]
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 890199.

          Thanks Matt!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 890199. Thanks Matt!
          Shalin Shekhar Mangar made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Show
          Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
          Hoss Man made changes -
          Fix Version/s 3.1 [ 12314371 ]
          Fix Version/s 4.0 [ 12314992 ]
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue depends on SOLR-1139 [ SOLR-1139 ]
          Gavin made changes -
          Link This issue depends upon SOLR-1139 [ SOLR-1139 ]

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Matt Weber
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development