ZOOKEEPER-860

Add alternative search-provider to ZK site

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: documentation
    • Labels:
      None

      Description

      Use the search-hadoop.com service to make search available across ZK sources, mailing lists, wiki, etc.
      This was initially proposed on the user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was previously added to the site's skin (common to all Hadoop-related projects) as part of AVRO-626, so this issue is only about enabling it for ZK. The ultimate goal is to use it on the sites of all Hadoop sub-projects.

        Activity

        Alex Baranau added a comment -

        Attached a patch that enables the search-hadoop search service for the site.

        Patrick Hunt added a comment -

        Please provide a link to the discussion thread referenced. Also link to the other Hadoop (sub)project jiras implementing this change. Thanks.

        Alex Baranau added a comment -

        Updated description.

        JIRA issues created so far for the following Hadoop-related projects:

        https://issues.apache.org/jira/browse/AVRO-626 (committed)
        https://issues.apache.org/jira/browse/HBASE-2886 (committed)

        https://issues.apache.org/jira/browse/ZOOKEEPER-860
        https://issues.apache.org/jira/browse/HIVE-1611
        https://issues.apache.org/jira/browse/HDFS-1367

        I'm also about to create issues for the following (discussions have already been initiated):

        • Hadoop TLP
        • Common
        • Chukwa
        • MapReduce
        • Pig

        Please let me know if more information is needed.

        Alex Baranau added a comment -

        Not sure that I follow why this issue was assigned to me. Is there anything I can do about it? I think I cannot commit the patch and hence resolve the issue...

        Mahadev konar added a comment -

        Alex, the assignment just means that you are currently working on the patch. A committer will review it and either give you feedback or commit it if it is deemed fit for the project. Hope that helps.

        Mahadev konar added a comment -

        Marking it for 3.4 to keep track.

        Otis Gospodnetic added a comment -

        Should we assign this to a committer now, since Alex is done with the patch?

        Doug Cutting reviewed and committed the big change via AVRO-626 that made it possible for this patch to be literally a 1-line change:

        Index: author/src/documentation/skinconf.xml
        ===================================================================
        --- author/src/documentation/skinconf.xml (revision 678839)
        +++ author/src/documentation/skinconf.xml (revision )
        @@ -30,7 +30,7 @@
         In other words google will search the @domain for the query string.

         -->
        -<search name="ZooKeeper" domain="hadoop.apache.org" provider="google"/>
        +<search provider="search-hadoop" name="zookeeper"/>

         <!-- Disable the print link? If enabled, invalid HTML 4.0.1 -->
         <disable-print-link>true</disable-print-link>

        Look at the search box at http://avro.apache.org/ (top-right corner) to see what this patch does.

        Patrick Hunt added a comment -

        I took a look at the search results we would see with this change. Perhaps I'm doing it wrong, but they don't compare well with the current search experience:

        http://search-hadoop.com/zookeeper?q=zookeeper
        http://search-hadoop.com/?q=zookeeper&fc_type=web+site

        In both cases the results do not compare well with the current default:
        http://www.google.com/search?sitesearch=hadoop.apache.org&q=zookeeper&Search=Search
        This holds both for the results themselves (ranking and dups: the first 2 results of http://search-hadoop.com/?q=zookeeper&fc_type=web+site are dups of each other) and for the UI (some obvious display issues with Chrome on my Ubuntu machine, for example "fetched from index" overlaps the search facets; in general the SERP seems "cluttered" to me).

        Alex Baranau added a comment -

        Thank you for the feedback, Patrick.

        We'll improve the things you mentioned soon and will let you know when that's done.

        Alex Baranau.

        Otis Gospodnetic added a comment -

        Patrick,
        Concretely, we plan on:

        • Removing [+show more]
        • Removing phrases that appear before each hit
        • Not indexing xref files/pages (since we already index source code + javadocs)
        • Deduping docs with the same content but different URL

        Any other suggestions? Thanks.

        Patrick Hunt added a comment -

        At least for the simple searches I tried I found the ranking needs to be improved. For example, compare the results of the searches I listed in my earlier comment.

        Otis Gospodnetic added a comment -

        Note that a better (correct?) comparison of Google's site search for ZK and search-hadoop.com (SH) should involve selecting the "web site" facet, since a non-restricted search on SH searches more than just the web site (an advantage over Google search that's limited to searching only ZK web site, no?). e.g.

        http://search-hadoop.com/?q=zookeeper+download&fc_project=Zookeeper&fc_type=web+site

        Yeah, I think once we dedupe, the results for Zookeeper will start looking better, because dupes seem to come from ZK website documents (because some documents have multiple URLs - one version with /current/ in the URL and another with a release number).

        But please note that Google's search results also include seemingly duplicate docs. For example, http://www.google.com/search?sitesearch=hadoop.apache.org&q=zookeeper+download returns a number of docs titled "ZooKeeper Administrator's Guide", but one points to the Admin Guide for 3.0.0, another to the one for 3.1.1, and so on. They don't index pages with /current/ in the URL - maybe that is how somebody configured Google's site search?

        Patrick Hunt added a comment - edited

        > Note that a better (correct?) comparison of Google's site search for ZK and search-hadoop.com (SH) should involve selecting the "web site" facet

        I did notice that, however I was comparing the "default" behavior in both cases - typing some text into the search box and hitting return.

        > They don't index pages with /current/ in the URL - maybe that is how somebody configured Google's site search?

        We (zk) certainly didn't do anything here, however note that "current" is actually a symbolic link to a particular version of our docs. Perhaps they're able to ferret this out somehow?

        Otis Gospodnetic added a comment -

        I don't know how they avoid indexing /current/ URLs on ZK's site, but we can certainly add /current/ to the list of URL patterns to skip.
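
        A minimal sketch of what such a URL-pattern skip rule could look like, assuming the crawler hands over a plain list of candidate URLs. The names SKIP_PATTERNS, should_skip and filter_urls, and the sample URLs, are hypothetical and are not search-hadoop.com's actual configuration:

```python
import re

# Hypothetical skip list: URL patterns whose pages should not be indexed.
# "/current/" is the alias discussed in this thread; real rules may differ.
SKIP_PATTERNS = [re.compile(r"/current/")]


def should_skip(url: str) -> bool:
    """Return True if the URL matches any skip pattern."""
    return any(pattern.search(url) for pattern in SKIP_PATTERNS)


def filter_urls(urls):
    """Keep only the URLs that no skip pattern matches, preserving order."""
    return [url for url in urls if not should_skip(url)]


if __name__ == "__main__":
    # Sample URLs are made up for illustration.
    candidates = [
        "http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html",
        "http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html",
    ]
    for url in filter_urls(candidates):
        print(url)
```

        A rule like this is cheap, but it has to be maintained per site, which is the drawback a canonical URL declaration would avoid.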

        Note that it may be good for ZK to make use of the canonical URL spec. Then anyone could automatically and easily identify these dupes without resorting to URL pattern rules or text content comparison. Here's an example: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
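
        To illustrate how an indexer might use a canonical URL declaration, here is a rough sketch that reads a page's link rel="canonical" tag with Python's standard html.parser and groups pages by it. The class and function names and the sample page are assumptions for illustration; whether the ZK site emits such a tag is not established in this thread:

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def canonical_url(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


def group_by_canonical(pages):
    """Map canonical URL -> list of fetched URLs that declare it.

    pages is a dict of fetched URL -> HTML text; a page without a
    canonical declaration is grouped under its own URL.
    """
    groups = {}
    for url, html in pages.items():
        groups.setdefault(canonical_url(html) or url, []).append(url)
    return groups
```

        Any group with more than one URL is then a set of duplicates, and only the canonical entry needs to be indexed.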

        Otis Gospodnetic added a comment -
        • Removing [+show more]
        • Removing phrases that appear before each hit
        • Not indexing xref files/pages (since we already index source code + javadocs)
        • Deduping docs with the same content but different URL

        We've done the above, plus changed how relevance is computed.

        Is this good for a commit now?

        Otis Gospodnetic added a comment -

        For what it's worth, the following issues for the same functionality on other sub-projects have been committed so far: PIG-1661, HBASE-2886, HDFS-1367.

        Patrick Hunt added a comment -

        Hi Alex, Otis. Take a look at ZOOKEEPER-925. I think this is a good time (new site gen, and a new site once/if we get approved as a TLP) to introduce this change. Perhaps you could update the sitegen to include this? It would give people a chance to try it out. Regards.

        Alex Baranau added a comment -

        Hi Patrick, thanks for the notification!

        The work (switching to mvn site) is still in its early stage. I'll work with the others to make sure we don't lose the search service integration.

        Patrick Hunt added a comment -

        I've added this as the default search on our new TLP homepage http://zookeeper.apache.org/

        Please do take a look and LMK if you have any suggestions.

        Otis Gospodnetic added a comment -

        Thanks Patrick!
        The only comment I have is that when you click on the search box, the default text in that search box should disappear, so the person can just start typing their keywords.

        Patrick Hunt added a comment -

        Sounds good to me. Can you provide a patch or a code snippet that does this? Thanks.

        Mahadev konar added a comment -

        Not a blocker. Moving it out of the 3.4 release.

        Otis Gospodnetic added a comment -

        This should be marked as Resolved.

        Before that, it would be great if somebody could make just one minor fix: when you click in the search box, the default "Search with Apache Solr" text in it does NOT disappear as it should.
        The simplest fix is to just REMOVE that text.

        Michi Mutsuzaki added a comment -

        How do I remove the text? I couldn't find the text in the ZooKeeper codebase.


          People

          • Assignee: Alex Baranau
          • Reporter: Alex Baranau