Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels: None

      Description

      We'll integrate Katta into Solr so that:

      • Distributed search uses Hadoop RPC
      • Shards/SolrCores are distributed and managed automatically
      • Failover is handled via ZooKeeper
      • Indexes may be built using Hadoop
      Attachments

      1. solr-1395-katta-0.6.3-7.patch
        113 kB
        tom liu
      2. solr-1395-katta-0.6.3-6.patch
        113 kB
        tom liu
      3. solr-1395-katta-0.6.3-5.patch
        108 kB
        tom liu
      4. solr-1395-katta-0.6.3-4.patch
        112 kB
        tom liu
      5. solr1395.jpg
        226 kB
        JohnWu
      6. katta-solrcores.jpg
        94 kB
        tom liu
      7. solr-1395-katta-0.6.2-3.patch
        108 kB
        tom liu
      8. solr-1395-katta-0.6.2-2.patch
        99 kB
        tom liu
      9. solr-1395-katta-0.6.2-1.patch
        99 kB
        tom liu
      10. solr-1395-katta-0.6.2.patch
        89 kB
        tom liu
      11. back-end.log
        12 kB
        Mathias Walter
      12. front-end.log
        25 kB
        Mathias Walter
      13. solr-1395-1431-katta0.6.patch
        251 kB
        Thomas Koch
      14. solr-1395-1431-katta0.6.patch
        253 kB
        Thomas Koch
      15. solr-1395-1431-4.patch
        258 kB
        Jason Venner (www.prohadoop.com)
      16. katta.zk.properties
        1 kB
        Jason Venner (www.prohadoop.com)
      17. katta.node.properties
        0.2 kB
        Jason Venner (www.prohadoop.com)
      18. solr-1395-1431-3.patch
        255 kB
        Jason Venner (www.prohadoop.com)
      19. solr-1395-1431.patch
        263 kB
        Jason Venner (www.prohadoop.com)
      20. SOLR-1395.patch
        221 kB
        Jason Rutherglen
      21. zkclient-0.1-dev.jar
        54 kB
        Jason Rutherglen
      22. katta-core-0.6-dev.jar
        242 kB
        Jason Rutherglen
      23. test-katta-core-0.6-dev.jar
        162 kB
        Jason Rutherglen
      24. SOLR-1395.patch
        222 kB
        Jason Rutherglen
      25. hadoop-core-0.19.0.jar
        2.26 MB
        Jason Rutherglen
      26. log4j-1.2.13.jar
        350 kB
        Jason Rutherglen
      27. zookeeper-3.2.1.jar
        892 kB
        Jason Rutherglen
      28. SOLR-1395.patch
        252 kB
        Jason Rutherglen

      Activity

          Steve Rowe added a comment -

          Bulk close resolved 4.4 issues

          Otis Gospodnetic added a comment -

          No response to my question from N months ago. With Cloudera Search and Blur available, plus SolrCloud, I think we can close this as Won't Fix.

          Otis Gospodnetic added a comment -

          Does anyone really need this? If so, I'm curious why?
          Or should we close this?

          Hoss Man added a comment -

          Bulk move of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          JohnWu added a comment -

          tom:

          Could you merge the code from my comment of 31/Oct/11 08:2 into yours, and supply the multi-core function for the sub-proxy?
          Then, if the proxy and sub-proxy contain many schemas and support multi-core queries, with all the schemas independent, we can attach different query and business logic to each core, and hot deployment will give Solr real features that Katta alone cannot achieve.
          Multi-core support requires changing this code in your patch:

              public SolrKattaServer(String defaultCoreName, CoreContainer coreContainer) {
                this.coreContainer = coreContainer;
                handler = new MultiEmbeddedSearchHandler(coreContainer);
                defaultCore = coreContainer.getCore(defaultCoreName);

          That only loads one Solr conf folder, so it only loads one SolrCore.
          Thanks!

          JohnWu

          tom liu added a comment -

          Fixed the bug where a SolrCore was not closed.

          tom liu added a comment -

          I found a bug in shardSize of SolrKattaServer.java:
          when coreContainer.getCore(name) is used, the core's refcount is incremented by one, so core.close() does not actually close the core.
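
          A minimal sketch of the acquire/release pattern this fix implies (assuming the Solr 1.4-era CoreContainer/SolrCore API; the variable names are illustrative):

              SolrCore core = coreContainer.getCore(name); // increments the core's refcount
              try {
                // ... use the core ...
              } finally {
                if (core != null) {
                  core.close(); // releases the reference taken by getCore()
                }
              }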

          tom liu added a comment -

          Based on the current trunk version, with some improvements:

          1. a new class that reopens the IndexWriter to overwrite Solr indexes
          2. a new component class used only for update/reopenIndexWriter, so we do not involve the other query components
          3. merging of results from the other shards

          sample config:

          1. top-level solrconfig
              <searchComponent name="kattaUpdate" class="solr.KattaUpdateTransferComponent" />
              <requestHandler name="update" class="solr.KattaRequestHandler" >
                <lst name="defaults">
                  <str name="shards">*</str>
                  <str name="shards.qt">update</str>
                </lst>
                <arr name="components">
                  <str>kattaUpdate</str>
                </arr>
              </requestHandler>
              <requestHandler name="kattaUpdate" class="solr.KattaRequestHandler" >
                <lst name="defaults">
                  <str name="shards">*</str>
                  <str name="shards.qt">kattaUpdate</str>
                </lst>
                <arr name="components">
                  <str>kattaUpdate</str>
                </arr>
              </requestHandler>
            
          2. middle-level solrconfig
              <searchComponent name="kattaUpdate" class="solr.KattaUpdateTransferComponent" />
              <requestHandler name="update" class="solr.MultiEmbeddedSearchHandler">
                <arr name="components">
                  <str>kattaUpdate</str>
                </arr>
              </requestHandler>
              <requestHandler name="kattaUpdate" class="solr.MultiEmbeddedSearchHandler">
                <arr name="components">
                  <str>kattaUpdate</str>
                </arr>
              </requestHandler>
            
          3. lower-level solrconfig
              <requestHandler name="update" class="solr.XmlUpdateRequestHandler" >
                <arr name="components" />
              </requestHandler>
              <requestHandler name="kattaUpdate" class="solr.KattaUpdateHandler" >
                <arr name="components" />
              </requestHandler>
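
          As a purely hypothetical illustration of driving this chain from a client (a SolrJ sketch assuming a Solr 1.4-era API; the URL and field values are invented, not part of the patch):

              import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
              import org.apache.solr.client.solrj.request.UpdateRequest;
              import org.apache.solr.common.SolrInputDocument;

              public class KattaUpdateExample {
                public static void main(String[] args) throws Exception {
                  // Post to the top-level Solr; its "update" handler is the
                  // solr.KattaRequestHandler configured above, which fans the request
                  // out to every shard (shards=*) with shards.qt=update.
                  CommonsHttpSolrServer server =
                      new CommonsHttpSolrServer("http://localhost:8983/solr");
                  SolrInputDocument doc = new SolrInputDocument();
                  doc.addField("id", "doc-1");
                  UpdateRequest req = new UpdateRequest(); // posts to the /update path
                  req.add(doc);
                  req.process(server);
                }
              }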
            
          tom liu added a comment -

          JohnWu:
          you can add ordering by score.

          JohnWu added a comment -

          Tom:
          Now I need to confirm one thing about the global score in the 1395 patch. We know the first query's response carries <id,score> pairs; so if a big document set is built into 3 index parts distributed over 3 shards, is the score only a partial score within each shard?
          Can we not get the total score over the 3 shards the way Katta does, fetching the docFreq the first time and the total score the second time?
          Thanks!
          JohnWu
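
          To make the two-pass idea concrete, here is a purely hypothetical sketch of merging per-shard statistics into a global idf; ShardStats and its methods are invented for illustration, and the formula is Lucene's classic idf:

              // Pass 1: each shard reports its docFreq and maxDoc for the query term.
              long globalDocFreq = 0;
              long globalMaxDoc = 0;
              for (ShardStats s : shardStats) { // one entry per shard
                globalDocFreq += s.docFreq("title", "apple");
                globalMaxDoc += s.maxDoc();
              }
              // Pass 2: score with the merged, collection-wide statistic.
              double idf = 1 + Math.log(globalMaxDoc / (double) (globalDocFreq + 1));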

          JohnWu added a comment -

          Due to the formatting error, we resubmit the code of DeployableSolrKattaServer:

              public DeployableSolrKattaServer() throws ParserConfigurationException,
                  IOException, SAXException {
                // super(getServerName(), new CoreContainer(getSolrHome()
                //     .getAbsolutePath(), getConfigFile()));
                //super(getServerName(), new CoreContainer());
                super(getServerName(), new CoreContainer.Initializer().initialize());
              }
          JohnWu added a comment - edited

          tom:
          For the multi-schema case, I tried to add multi-core support in the proxy and sub-proxy, but I ran into a connection-close issue.
          Please help me analyze the case.

          In the proxy, I set the Solr home to the proxymulticore folder, which has the following structure:
          -proxymulticore
          --customer
          ----conf
          ----data
          --part
          ----conf
          ----data
          --solr.xml

          solr.xml is as follows:

              <cores adminPath="/admin/cores">
                <core name="part" instanceDir="part" />
                <core name="customer" instanceDir="customer" />
              </cores>

          In the part folder, solrconfig.xml is set as follows:

              <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
                <lst name="defaults">
                  <str name="echoParams">explicit</str>
                  <str name="shards">part-00000,part-00001,part-00002</str>
                </lst>
              </requestHandler>

          So in the proxy, we can use the request:
          http://localhost:8080/solr-1395-katta-0.6.2-2patch/part/select/?q=a*&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true&core=part

          The proxy will use the KattaRequestHandler to dispatch the query to the Katta data nodes (sub-proxies).

          In the sub-proxy, we start the embedded Solr with the subproxymulticore folder, whose structure is as follows:
          -subproxymulticore
          --customer
          ----conf
          ----data
          --part
          ----conf
          ----data
          --solr.xml

          Except for solrconfig.xml in the part folder, shown below, the others are the same.

              <requestHandler name="standard" class="solr.SearchHandler" default="true">
                <!-- default values for query parameters -->
                <lst name="defaults">
                  <str name="echoParams">explicit</str>
                </lst>
              </requestHandler>
          Now I corrected some code of DeployableSolrKattaServer.java in the patched Solr as follows:

              public DeployableSolrKattaServer() throws ParserConfigurationException,
                  IOException, SAXException {
                // super(getServerName(), new CoreContainer(getSolrHome()
                //     .getAbsolutePath(), getConfigFile()));
                //super(getServerName(), new CoreContainer());
                // By JohnWu: we do not directly find the conf folder of a core;
                // we find solr.xml to add the cores.
                super(getServerName(), new CoreContainer.Initializer().initialize());
              }

          I also corrected some code in SolrKattaServer.java:

          -------------------
              public SolrKattaServer(String defaultCoreName, CoreContainer coreContainer) {
                this.coreContainer = coreContainer;
                handler = new MultiEmbeddedSearchHandler(coreContainer);
                handler.init(new NamedList());
                //defaultCore = coreContainer.getCore(defaultCoreName);
                defaultCores = coreContainer.getCores();
                // if (defaultCore == null)
                //   throw new SolrException(ErrorCode.UNKNOWN, "defaultCore:"
                //       + defaultCoreName + " could not be found");
                if (defaultCores == null)
                  throw new SolrException(ErrorCode.UNKNOWN, "defaultCore:"
                      + defaultCoreName + " could not be found");
                // JohnWu: added for multi-core
                Iterator it = defaultCores.iterator();
                while (it.hasNext()) {
                  handler.inform((SolrCore) it.next());
                }
                //handler.inform(defaultCore);
              }

              /**
               * The main method that executes requests from a KattaClient
               */
              @Override
              public KattaResponse request(String[] shards, KattaRequest request)
                  throws Exception {
                // JohnWu: for multi-core, get the suitable core according to the request
                SolrParams params = request.getParams();
                SolrParams required = params.required();
                String cname = required.get(CoreAdminParams.CORE);
                SolrCore core = coreContainer.getCore(cname);

                // need to add some code to avoid the core being null
                if (core != null) {
                  ModifiableSolrParams sp = new ModifiableSolrParams(request.getParams());
                  String shardsStr = StringUtils.arrayToString(shards);
                  sp.set(ShardParams.SHARDS, shardsStr);

                  if (log.isDebugEnabled()) {
                    log.debug("SolrServer.request: " + nodeName + " shards:"
                        + Arrays.asList(shards) + " request params:" + sp);
                  }

                  // removed by John
                  //SolrQueryRequestBase req = new LocalSolrQueryRequest(defaultCore, sp);
                  SolrQueryRequestBase req = new LocalSolrQueryRequest(core, sp);
                  SolrQueryResponse resp = new SolrQueryResponse();
                  // Added by tom liu: an exception would stop RPC,
                  // so we must handle the exception here
                  try {
                    getRequestHandler(req).handleRequest(req, resp);
                  } catch (SolrException ex) {
                    log.error(ex.getMessage(), ex);
                  }
                  // add end
                  NamedList nl = resp.getValues();
                  nl.add("QueriedShards", shards);
                  // Added by tom liu
                  SolrDocumentList sdl = (SolrDocumentList) nl.get("response");
                  if (sdl == null) {
                    nl.add("response", new SolrDocumentList());
                    if (log.isWarnEnabled())
                      log.warn("SolrServer.SolrResponse: no response");
                  }
                  // add end
                  SolrResponse rsp = new SolrResponseBase();
                  rsp.setResponse(nl);
                  if (log.isDebugEnabled()) {
                    if (null != sdl) {
                      log.debug("SolrServer.SolrResponse: numFound=" + sdl.getNumFound()
                          + ",start=" + sdl.getStart() + ",docs=" + sdl.size());
                    }
                    log.debug("termVectors=" + nl.get("termVectors"));
                  }
                  // By using shards[0] we guarantee that this response is tied to a known
                  // shard in the originator, so that the results can be merged.
                  // The name (and only one is allowed) has to be one of the original query
                  // shards.
                  return new KattaResponse(shards[0], "", 0, rsp);
                } else {
                  // maybe returning null is bad!
                  System.out.println("------the core is null!!!!!!");
                  return null;
                }
              }

              // Added by tom liu, for supporting qt=...
              private MultiEmbeddedSearchHandler getRequestHandler(SolrQueryRequest request) {
                SolrParams params = request.getParams();
                if (params == null) {
                  params = new ModifiableSolrParams();
                }
                String qt = params.get(CommonParams.QT);
                if (qt != null) {
                  // JohnWu: removed the following for multi-core
                  //MultiEmbeddedSearchHandler myhandler =
                  //    (MultiEmbeddedSearchHandler) defaultCore.getRequestHandler(qt);
                  //if (myhandler == null) {
                  //  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                  //      "unknown handler: " + qt);
                  //}
                  //myhandler.setCoreContainer(coreContainer);
                  //return myhandler;

                  // JohnWu: added for multi-core
                  Iterator it = defaultCores.iterator();
                  while (it.hasNext()) {
                    MultiEmbeddedSearchHandler myhandler =
                        (MultiEmbeddedSearchHandler) ((SolrCore) it.next()).getRequestHandler(qt);
                    if (myhandler == null) {
                      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                          "unknown handler: " + qt);
                    }
                    myhandler.setCoreContainer(coreContainer);
                    return myhandler;
                  }
                }
                return handler;
              }
          ------------------

          But now, when Katta sends the second query with ids to the data node, the connection is closed.

          INFO: pSearch-00000#pSearch-00000 webapp=null path=/select params=

          {start=0&ids=28aaa%2Caaa%2Cbbb&q=a*&core=part&isShard=true&rows=10}

          hits=3 status=0 QTime=12

          Please help me, and tell me why the request cannot go through the whole process.

          Thanks!

          JohnWu

          tom liu added a comment -

          Fixed a bug in grouping.
          Update on cores is not supported.

          tom liu added a comment -

          JohnWu:
          I use Solr's regex rules to select which core should be used.
          First, in the sub-proxy and the top-level Solr, I merge the two schemas into one, e.g. merging the book and address schemas into bookaddress, then put the bookaddress schema into the top-level Solr and the sub-proxy, so that Solr can find every field in them.
          Then, on the top-level Solr machine, I install an nginx proxy, which acts as a cache and proxy.
          Last, I set up nginx to direct requests to the correct cores. For example:
          1. in the Solr parameters, I add something such as: target=book [or address]
          2. in nginx, I set up:

              if ($args ~* ^(.*)target%3a(\\w*)(.*)$){
                    set $shard $2;
                    rewrite (.*) $1?shards=$shard\\\\w* break;
              }
              if ($args ~* ^(.*)target:(\\w*)(.*)$){
                    set $shard $2;
                    rewrite (.*) $1?shards=$shard\\\\w* break;
              }

          3. the request URL is then rewritten to URL&shards=book\w*

          JohnWu added a comment -

          Tom:
          Does the sub-proxy support multi-core?
          I have indexes with different schemas, like "book" and "address",
          and I created 2 cores named "book" and "address".
          I can dispatch the query to the sub-proxy, but only the proxy with the book schema can facet the result.

          So can you tell me how to set up multi-core in the sub-proxy, which is an embedded Solr server? Tracing the code, it looks up the sub-proxy conf folder, which means the sub-proxy is a single-core server.

          Thanks!
          Or, as another solution for me: could the sub-proxy keep a minimized schema that fits every Katta node?

          JohnWu

          JohnWu added a comment -

          Tom:
          Yes, you added the modifyRequest method in:
          1) DebugComponent
          2) FacetComponent
          3) HighlightComponent
          4) SearchComponent
          5) SpellCheckComponent
          6) TermVectorComponent

          They are all called via "component.modifyRequest(this, me, sreq)" in the ResponseBuilder, so the components above work.

          Thanks!

          JohnWu

          laigood added a comment - edited

          Hello everyone, I'm very interested in integrating Solr with Katta, but I have some questions.
          1. Does it only apply to Solr 1.4? Has anyone successfully integrated it with Solr 3.x?
          2. Efficiency: after two levels of proxying, will it slow down query times, and by how much?
          3. The figure solr1395.jpg shows that the clusters have a master. Does this mean that indexing
          or searching through Solr must go through this master? How do we load-balance?
          thanks

          JohnWu added a comment -

          Tom:
          1) Yes, the code updates the index segment by segment, which is great, but it cannot sync the index when one node crashes. So I create the index by renewing the whole index.

          2) Still on the facet issue: can you give me a facet example with the detailed steps? Someone told me the facet data is stored on the front end, so it cannot get the global info for our distributed system.

          Thanks!
          JohnWu

          Robert Muir added a comment -

          3.4 -> 3.5

          tom liu added a comment -

          JohnWu:
          Sure, I did test the patch with:
          stats, terms, termvector, hl, facet, debug

          PS: I changed the Katta code to manage shards for updating.
          If you do not need this, please comment out the following code, including the client.broadcastToNodes(...) call:

          KattaClient.java
          	public ClientResult<KattaResponse> request(long timeout,
          			String[] indexNames, KattaRequest request) throws KattaException {
          		ClientResult<KattaResponse> results = null;
          		String path = request.getParams().get(CommonParams.QT);
          		if (path!=null && path.equals("update")) {
          			// only for qt=update
          			results = client.broadcastToNodes(
          					timeout, true, REQUEST_METHOD, 0, indexNames, null, request);
          		} else {
          			results = client.broadcastToIndices(
          				    timeout, true, REQUEST_METHOD, 0, indexNames, null, request);
          		}
          		return results;
          	}
          
          JohnWu added a comment -

          Tom:
          Thanks!
          Did you test the patch with facet and highlight? If you tested them, please tell us.
          Thanks again.
          JohnWu

          tom liu added a comment - edited

          I uploaded a patch based on the current trunk version.
          my env:
          hadoop: 0.20.2
          zookeeper: 3.3.3
          katta: 0.6.3

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Jason Rutherglen added a comment -

          I think John Wu brings up excellent points. I don't think Solr Cloud
          offers the same thing as this issue, and/or it's not articulated well on
          the wiki. Lucene out of the box doesn't offer facets and other search
          component features. These are things Solr provides but could/should be
          modularized out, as already proposed. Solr is currently too tightly
          interwoven; this is perhaps why this patch is challenging to operate.
          Integrating alternative systems into Solr seems to be political from my
          point of view, e.g., <political>Solr + Katta</political>

          JohnWu added a comment -

          Eric Pugh:
          Tom Liu made the patch from the SVN code, but the code has since changed, so I think you need to find the old code from around October 2010; with that, the system can work.

          The code just adds some Katta classes to Solr and corrects Solr's QueryComponent. But the Lucene code is the biggest problem for us; it is a development version.

          I have a comment covering the whole config process. Please tell us your problem.
          JohnWu

          JohnWu added a comment - edited

          Stefan Groschupf:

          Thanks for your contribution to Katta, but do you think Lucene alone can supply functions like faceting and multi-core the way Solr does?

          We all know the patch uses Hadoop RPC to connect the slave nodes, just like the Katta you built. We want to build a cloud prototype with web app server features, but we don't need advertising for Katta here. Is there zero value in this patch? The cost of network transmission? If you choose distributed search, none of those issues can be avoided.

          So please give us some valuable suggestions or code; do not just promote your Katta.

          JohnWu

          JohnWu added a comment - edited

          Stefan Groschupf, Eric Pugh, Pulkit Agrawal, and others:

          Today I uploaded a figure for the 1395 patch and will describe why we use this patch (solr1395.jpg 27/May/11 01:00 226 kB).

          The whole architecture contains 3 levels:
          1) Solr level
          2) Katta slave node level
          3) Hadoop level

          Together, these frameworks give Lucene a web app server, distributed search, data storage, and MapReduce features.

          I will give you a detailed document about how to use this patch; for now, here is the coarse-grained process.

          a) build the environment

          1) install Hadoop (ensure you can browse your files in HDFS)
          2) install ZooKeeper (ensure zkCli.sh can connect to the server)
          3) install Katta (ensure the master node and data nodes run; use "katta check" to check the shards, and note that "start Master -ne" starts the Katta master in unembedded mode)

          b) patch the solr

          1) check out the code from http://svn.apache.org/repos/asf/lucene/dev/trunk
          (tom liu added a comment - 20/Oct/10 06:19)

          2) apply the patch, and manually patch any code that is rejected
          (solr-1395-katta-0.6.2-3.patch 10/Nov/10 02:12 108 kB)

          3) correct the code of QueryComponent (Solr):

              //JohnWu: corrected the && to ||; we need to decide whether shards is null
              if (shards == null) {
                hasShardURL = false;
              } else {
                hasShardURL = shards != null || shards.indexOf('/') > 0;
              }

          c) query and config the system (katta-solrcores.jpg 03/Dec/10 03:44 94 kB)

          1) the web container (Tomcat) starts the Solr server as the proxy shown in the figure; you need to correct solrconfig.xml:

              <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
                <lst name="defaults">
                  <str name="echoParams">explicit</str>
                  <str name="shards">*</str>
                </lst>
              </requestHandler>

          Solr will use the KattaClient to dispatch the query to the sub-proxy nodes (Katta data nodes).

          2) the Katta data node starts with embedded Solr

          Correct the katta shell script as follows:
          KATTA_OPTS="$KATTA_OPTS -Dsolr.home=/var/data/solr -Dsolr.directoryFactory=solr.MMapDirectoryFactory"

          Add "zookeeper.servers=localhost:2181" and "zookeeper.embedded=false" to katta.zk.properties, and put this file on your classpath.

          For the proxy's Solr config, correct solrconfig.xml as follows:
          <requestHandler name="standard" class="solr.SearchHandler" default="true">...</requestHandler>

          3) deploy your queryCore.zip (for the folder hierarchy, see Tom Liu's comments) with "katta addIndex queryCore*.zip hdfs://*****"
          The deployed queryCore has a solrconfig.xml as follows:
          <requestHandler name="standard" class="solr.MultiEmbeddedSearchHandler" default="true">...</requestHandler>

          4) use the following query:
          http://localhost:8080/solr-1395-katta-0.6.2-2patch/select/?q=apple&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true

          For the index, you can use the Solr 1.4 example.

          OK, the hits return:

              <result name="response" numFound="1" start="0">
                <doc>
                  <str name="id">MA147LL/A</str>
                  <str name="name">Apple 60 GB iPod with Video Playback Black</str>
                  <str name="manu">Apple Computer Inc.</str>
                  ...

          d) some amazing things

          If one node crashes, the other nodes keep running. The system redeploys a set of indexes to a new node, keeping the system stable on the fly with a fixed replication count.

          Summary

          If you use the patch, you need to read all the comments in date order.
          If you want a flexible structure, please use this patch.
          If you want to use Solr multicore, please use this patch.

          Thanks to Tom Liu, Jason Rutherglen and Jason Venner.

          Thanks a lot!

          JohnWu

          JohnWu added a comment -

          why we use this patch

          Stefan Groschupf added a comment -

          Thanks Eric,
          I would argue you don't need Solr if you want to build a serious distributed indexing and searching platform; the advantages of Solr's HTTP API and its overhead are just in your way.
          But take that with a grain of salt, since I'm the guy who founded Katta.

          Eric Pugh added a comment -

          I wanted to somewhat second Stefan's comment. There are some advantages to using Katta, but what we found is that this patch is very NOT ready for use. If you want to integrate Solr into a Katta type world, you are going to be writing some serious code, so don't just budget a week to hook Katta into Solr!

          We ended up going down the Solr Cloud route on a recent project, and hooking in some distributed support for durable indexes. Inspired by this patch, but certainly not using this patch!

          Stefan Groschupf added a comment -

          Hey People,
          I would love to make sure everybody understands that it is technically possible to embed Solr into Katta, but it does not make sense.
          There is zero value add in this! In fact you just slow down your searches.

          Stefan

          JohnWu added a comment -

          Pulkit Agrawal:

          To start Katta with the embedded Solr, configure the class (DeployableSolrKattaServer) in the Katta config file.

          I think your Lucene version is not suitable for this patched Solr. "LUCENE_40" means the 1395 Solr uses lucene-4.0-snapshot, which is a development version of Lucene; the index of Lucene 3.0 and Solr 1.4.0 is matched to it.

          I think you have spent a long time on this: you didn't build the Lucene code from the Solr source (as Tom Liu said), so you can dispatch the query to the proxy, but the proxy cannot parse it.
          Pay attention to the config and the query core; the Solr home and config file need to be the same as each other.
          Thanks!

          Pulkit Agrawal added a comment -

          I am able to run the proxy but am still stuck with the subproxy:

          katta startNode -c org.apache.solr.katta.DeployableSolrKattaServer -s

          I got the following error:

          INFO: New CoreContainer 26210109
          May 25, 2011 8:10:28 PM org.apache.solr.core.SolrResourceLoader <init>
          INFO: Solr home set to '/home/ec2-user/pulkit/example/solr/'
          May 25, 2011 8:10:28 PM org.apache.solr.core.SolrResourceLoader <init>
          INFO: Solr home set to '/home/ec2-user/pulkit/example/solr/./'
          May 25, 2011 8:10:28 PM org.apache.solr.core.SolrConfig initLibs
          INFO: Adding specified lib dirs to ClassLoader
          May 25, 2011 8:10:28 PM org.apache.solr.common.SolrException log
          SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_CURRENT] or a string in format 'V.V'
          at org.apache.solr.core.Config.parseLuceneVersionString(Config.java:306)
          at org.apache.solr.core.Config.getLuceneVersion(Config.java:286)
          at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:132)
          at org.apache.solr.core.CoreContainer.create(CoreContainer.java:578)
          at org.apache.solr.core.CoreContainer.load(CoreContainer.java:406)
          at org.apache.solr.core.CoreContainer.load(CoreContainer.java:291)
          at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:109)
          at org.apache.solr.katta.DeployableSolrKattaServer.<init>(DeployableSolrKattaServer.java:62)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
          at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
          at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
          at java.lang.Class.newInstance0(Class.java:355)
          at java.lang.Class.newInstance(Class.java:308)
          at net.sf.katta.util.ClassUtil.newInstance(ClassUtil.java:51)
          at net.sf.katta.Katta$3.parseArguments(Katta.java:303)
          at net.sf.katta.Katta$Command.parseArguments(Katta.java:958)
          at net.sf.katta.Katta.main(Katta.java:95)
          Caused by: java.lang.IllegalArgumentException: No enum const class org.apache.lucene.util.Version.LUCENE_40
          at java.lang.Enum.valueOf(Enum.java:196)
          at org.apache.lucene.util.Version.valueOf(Version.java:30)
          at org.apache.solr.core.Config.parseLuceneVersionString(Config.java:304)
          ... 17 more

          ERROR: could not instantiate class org.apache.solr.katta.DeployableSolrKattaServer
          java.lang.RuntimeException: could not instantiate class org.apache.solr.katta.DeployableSolrKattaServer
          at net.sf.katta.util.ClassUtil.newInstance(ClassUtil.java:53)
          at net.sf.katta.Katta$3.parseArguments(Katta.java:303)
          at net.sf.katta.Katta$Command.parseArguments(Katta.java:958)
          at net.sf.katta.Katta.main(Katta.java:95)
          Caused by: org.apache.solr.common.SolrException: defaultCore:proxy could not be found
          at org.apache.solr.katta.SolrKattaServer.<init>(SolrKattaServer.java:54)
          at org.apache.solr.katta.DeployableSolrKattaServer.<init>(DeployableSolrKattaServer.java:62)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
          at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
          at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
          at java.lang.Class.newInstance0(Class.java:355)
          at java.lang.Class.newInstance(Class.java:308)
          at net.sf.katta.util.ClassUtil.newInstance(ClassUtil.java:51)
          ... 3 more
          Usage:
          startNode [-c <serverClass>] [-p <port number>] Starts a local node

          Invalid luceneMatchVersion 'LUCENE_40' ----------- What does that mean?

          Any pointer?
          Thanks

          Pulkit Agrawal added a comment -

I am pretty close to getting it done.
I just need a little help.

I have set the whole thing up on localhost.
Can you please guide me on where I should put my configuration files?

I have two separate directories for Solr and Katta.
Where does the proxy live, and where do the various configuration files go?

Please help.

Thanks in advance

          Pulkit Agrawal added a comment -

          Hi All,

I am new to Solr and Katta.
I really want to integrate the two, but I haven't found a guided path.
Can you please help me with how to integrate them?

Thanks in advance.

          Jamie Johnson added a comment - - edited

          I think I have most of this running, but I still have a disconnect. I've done the following:
          1. Patched
          2. Compiled
3. Run web application with additional request handler added to solrconfig.xml
          4. Started katta
          5. Started DeployableSolrKattaServer

Now if I execute a query (http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&distrib=true) I get net.sf.katta.util.KattaException: No shards for indices: [*], which makes perfect sense since I have no indices deployed. As a simple test I deployed an index that comes stock with katta (bin/katta addIndex testIndex src/test/testIndexA 2), executed my query again, and got no results (which also makes sense since that index does not match my solr config).

All of that being said, what is the process for publishing a core to katta? Is there a way to use the standard HTTP methods to add to the index (using something like java -jar post.jar *.xml)? If not, how is it done? Any insight into this would be greatly appreciated.

          JohnWu added a comment -

OK, all:

we have the correct result back (from slave02 to master):
<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">MA147LL/A</str>
    <str name="name">Apple 60 GB iPod with Video Playback Black</str>
    <str name="manu">Apple Computer Inc.</str>
    ...

note:
if you use Tom Liu's patch, please correct this code in QueryComponent:

// JohnWu: corrected the && to ||; we also need to decide whether shards is null
if (shards == null) {
  hasShardURL = false;
} else {
  hasShardURL = shards != null || shards.indexOf('/') > 0;
}

so the queryCore can enter the distributed process and get the hits, and the DocSlice is cast to a SolrDocumentList

If you have any problems, please ask me and we can discuss them together

          Thanks!

          johnWu

          JohnWu added a comment -

          Tomliu:
DeployableSolrKattaServer starts an EmbeddedSolrServer. We think the query needs to be sharded, but the core descriptor shows the core is the subproxy, so how do we set the subproxy's config to find the seo0 shard? If distrib=false, the sharded query/scoring process cannot be triggered.

          JohnWu added a comment -

          Tomliu:
yeah, I traced the process:
1) proxy: shards=*;
2) subproxy receives the request, param: shards=seo0;

but I don't know how to set the subproxy to dispatch the query to seo0; you know, solr is not started as a service in Tomcat (the subproxy is started by katta), so the solrconfig.xml of the subproxy cannot set the shard as http://localhost:8080/seo0.

we traced the code and found that the query in the proxy, with param isdistribute=true, can call the distributed process, but the query in the subproxy has param isdistribute=false,

so can you tell me how to set the shard info in the subproxy?
          JohnWu

          tom liu added a comment -

you can debug or trace the process (see the request sketch after this list):

1. webapp's param: shards=*
2. kattaclient's process: shards=seo0,seo1,...
3. sub-proxy's param: shards=seo0 [maybe many requests, so the param is not the same]
4. sub-proxy then dispatches the request to the seo0 queryCore

the other flow, if you put shards=seo0:

1. webapp's param: shards=seo0
2. kattaclient's process: shards=seo0
3. sub-proxy's param: shards=seo0 [one request]
4. sub-proxy then dispatches the request to the seo0 queryCore
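
To make the two traced flows concrete, here is a minimal sketch of the two request forms against the proxy webapp; the host, port, and webapp path are placeholders modeled on the query URLs quoted elsewhere in this thread:

  # all shards: the katta client fans the query out to seo0, seo1, ...
  curl 'http://localhost:8080/solr/select?q=*:*&distrib=true&shards=*'
  # one named shard: only seo0 receives the request
  curl 'http://localhost:8080/solr/select?q=*:*&distrib=true&shards=seo0'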
          JohnWu added a comment -

          TomLiu:
just one question: how does the subproxy dispatch the query to the queryCore?
          JohnWu

          JohnWu added a comment -

          TomLiu:
ok, as you said, I used katta addIndex to add the querycore to katta (in the node properties).
Now I ask you another question:
how does the sub-proxy dispatch the query to the queryCore? (by setting the shard in solrconfig.xml?)
we know the query in the sub-proxy has param isShard=true; (only then can the query select the queryCore as seo0#seo0)?
          JohnWu

          tom liu added a comment -

please see the conf file katta.node.properties.
the node.shard.folder property defines the folder of the queryCore
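
A minimal sketch of the relevant katta.node.properties entries, using only property names that appear in this thread; the folder path is a placeholder:

  # serve shards through the patched Solr server instead of the plain Lucene one
  node.server.class=org.apache.solr.katta.DeployableSolrKattaServer
  # the folder where deployed shards (the query cores) live on this node
  node.shard.folder=/var/data/katta/shards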

          JohnWu added a comment -

          Tomliu:
how do I katta addIndex hdfs://***.zip into the queryCore's /data folder?
you know the katta addIndex target folder is a tmp folder?
Do you set the folder with -Dsolr.directoryFactory=solr.MMapDirectoryFactory?
how do I set the config for MMapDirectoryFactory?
          Thanks! JohnWu

          tom liu added a comment -

i just zip /var/data/hdfsfile into seo.zip:
cd /var/data/hdfsfile && zip -r seo.zip .

          JohnWu added a comment -

          TomLiu:
can you tell me the folder hierarchy that seo.zip contains?
I used katta addIndex only to deploy the index into the katta cluster, so the core is pre-loaded on each node, and the MultiEmbeddedSearchHandler returns the DocSlice,
so tell me why, or I will try to zip a patched solr with the conf as you said; please tell me your seo.zip contents. Thanks!
          JohnWu

          Sanjay:
          patch -p0 -i solr-1395-katta-0.6.3.patch is right.
Maybe your trunk code is too new? Tom can reply to this issue.
          JohnWu

          tom liu added a comment -

use katta addIndex with seo.zip to deploy the querycore to the katta slave node.

seo.zip is the patched solr.
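
Putting those two sentences together, a hedged sketch of the deploy step, using the addIndex syntax quoted later in this thread; the index name and the HDFS path are placeholders taken from other comments here:

  # deploy the zipped, patched-solr query core to the katta slave nodes
  bin/katta addIndex seo hdfs://hdfsname:9000/seo/seo0.zip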

          Sanjay added a comment -

          Hi John,

I'm facing a problem patching the solr trunk with solr-1395-katta-0.6.2-3.patch. I'm sure I'm missing something; should I patch the katta trunk with the same patch first?
The install instructions above are old and it looks like they no longer apply.

          Here are the steps I followed:

          1. svn co http://svn.apache.org/repos/asf/lucene/dev/trunk lucene
          2. cd lucene/solr/src/java
          3. wget http://xyz.com/solr-1395-katta-0.6.2-3.patch
          4. patch -p0 -i solr-1395-katta-0.6.2-3.patch

          patching file java/org/apache/solr/katta/SolrIndexer.java
          patching file java/org/apache/solr/handler/KattaRequestHandler.java
          can't find file to patch at input line 1663
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------

          Property changes on: java/org/apache/solr/handler/KattaRequestHandler.java
          ___________________________________________________________________
          Added: svn:mime-type
          + text/plain
          Index: java/org/apache/solr/handler/component/SearchHandler.java
          ===================================================================
--- java/org/apache/solr/handler/component/SearchHandler.java (revision 1003107)
          +++ java/org/apache/solr/handler/component/SearchHandler.java (working copy)

          Thanks!,
          Sanjay

          JohnWu added a comment -

          TomLiu:
you mean use katta addIndex with seo.zip to deploy the querycore to the katta slave node?
seo.zip is the patched solr? or do we use the normal solr?
          JohnWu

          tom liu added a comment -

On the Katta slave node, my folder hierarchy is:

/var/data root
/var/data/hadoop stores hadoop data
/var/data/hdfszips stores zip tmp data, which is fetched from hdfs, then moved to katta's shards
/var/data/solr root that stores solr core configs
/var/data/solr/seoproxy stores seoproxy's solr config, which is used by the sub-proxy
/var/data/katta/shards/nodename_20000/seo0#seo0 stores the seo0 shard, which is deployed from the master node
/var/data/zkdata stores zkserver data, which is zk logs and snapshots

On the Katta master node, my folder hierarchy is:

/var/data root
/var/data/hadoop stores hadoop data
/var/data/hdfsfile stores solr tmp data, which is produced by the solr dataimporter, then zipped and put to hdfs
/var/data/solr root that stores solr core configs
/var/data/solr/seo stores seo's solr config, which is used by tomcat's webapp
/var/data/zkdata stores zkserver data, which is zk logs and snapshots

so, my config comes from five folders:

Master /var/data/solr/seo tomcat webapp's solrcore config
Slave /var/data/solr/seoproxy sub-proxy's solrcore config
Master /var/data/hdfsfile query-core's config, which is the config template
HDFS http://hdfsname:9000/seo/seo0.zip query-core seo0's zip file, which holds the conf
Slave /var/data/katta/shards/nodename_20000/seo0#seo0/conf query-core seo0's config, which is unzipped from seo0.zip on HDFS

and the /var/data/hdfsfile structure is:

          seo@seo-solr1:/var/data/hdfsfile$ ll
          total 28
          drwxr-xr-x 6 seo seo 4096 Oct 21 15:21 ./
          drwxr-xr-x 4 seo seo 4096 Feb 16 15:49 ../
          drwxr-xr-x 2 seo seo 4096 Oct  8 09:17 bin/
          drwxr-xr-x 4 seo seo 4096 Jan 21 18:22 conf/
          drwxr-xr-x 3 seo seo 4096 Oct 21 15:21 data/
          drwxr-xr-x 2 seo seo 4096 Sep 29 14:01 lib/
          -rw-r--r-- 1 seo seo 1320 Oct  8 09:20 solr.xml
          
          JohnWu added a comment -

          tomliu:
so the solrHome of the ISolrServer needs to be configured in multi-core style?
          in solr.xml
          <solr persistent="false">
          <cores adminPath="/admin/cores">

          <core name="queryCore" instanceDir="queryCore"/>

          </cores>
          </solr>
but how do we set the handler for each role of the katta slave?

can you show the solr home folder hierarchy and the config contents of a katta slave node?

          tom liu added a comment -

The ISolrServer's config is set by the katta script. The QueryCore's config will be set automatically.

The sub-proxy solr is just a proxy, which does not process any request.
so, the sub-proxy dispatches the request to the querycore, and the querycore processes the request and returns SolrDocumentLists.

but you get the exception saying the object type cannot be cast; I think the querycore is wrong.

          JohnWu added a comment -

          Tomliu:
Do you mean the ISolrServer uses the katta script to point to the query core's home and index directory?

now I use a katta with the patched solr jar to start the subproxy; the solr home is set in the katta script:

          -Dsolr.home=/home/hadoop/workspace/kattaNoZK/solrHome

in the conf folder of solrHome, the solrconfig.xml is:

          <requestHandler name="standard" class="solr.SearchHandler" default="true">
          <!-- default values for query parameters -->
          <lst name="defaults">
          <str name="echoParams">explicit</str>
          <!--<str name="shards.qt">tvrh</str> -->
          </lst>
          </requestHandler>

so solr sends the query to this core of solrHome with the SearchHandler (which uses QueryComponent and returns the DocSlice)

As you said yesterday, I need to correct the solrHome in the katta script (slave node) to point it to the query core? but how do I configure the sub-proxy solr home with solr.SearchHandler?

          Thanks!

          JohnWu

          tom liu added a comment -

the ISolrServer is handled by the Katta node; its configuration comes from:

          1. solrconfig.xml: which is used by ISolrServer's Default SolrCore
2. the katta script: which is used to tell the ISolrServer its SolrHome.

Katta's script: [on the katta node, but not on the katta master]

          KATTA_OPTS="$KATTA_OPTS -Dsolr.home=/var/data/solr -Dsolr.directoryFactory=solr.MMapDirectoryFactory"
          

When Katta starts up the node, the ISolrServer gets solr.home and solr.directoryFactory,
and then the ISolrServer's default SolrCore uses those env settings to hold the solrcore.

          JohnWu added a comment -

          TomLiu:
how do I configure the ISolrServer so it dispatches requests to the SolrCore? only in solrconfig.xml (SearchHandler) and the properties (DeployableSolrKattaServer)?

          JohnWu

          tom liu added a comment -

          JohnWu:
the request and response are transferred between these components:

1. one tomcat webapp, which is the front server in front of the Katta-integrated Solr
2. kattaclient, which runs in the tomcat webapp and dispatches requests to the Katta cluster with RPC
3. Katta cluster node, which hosts the content server (ISolrServer) and the RPC server
4. ISolrServer, which receives the request and dispatches it to the SolrCore (a sketch of this RPC contract follows below)
5. SolrCore, which is an EmbeddedSolrServer and returns the response for the request

so, QueryComponent returns the DocSlice, but with the EmbeddedSolrServer the DocSlice is cast to SolrDocumentList.

i attached one JPG file, which shows how the components are deployed.

with katta, there are two deployments:

1. app deployment: deploy the components and start up
2. data deployment: use the katta script to deploy data, for example addIndex/removeIndex/redeployIndex
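
For reference, here is a sketch of the RPC contract implied by the stack traces quoted in this thread; the method signature matches those logs, while the surrounding layout is an assumption rather than the patch's actual source:

  package org.apache.solr.katta;

  // reconstructed from "public abstract org.apache.solr.katta.KattaResponse
  // org.apache.solr.katta.ISolrServer.request(java.lang.String[],
  // org.apache.solr.katta.KattaRequest) throws java.lang.Exception" in the logs
  public interface ISolrServer {
    KattaResponse request(String[] shards, KattaRequest request) throws Exception;
  }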
          JohnWu added a comment -

          TomLiu:

as you said: QueryComponent returns a DocSlice, but the XMLWriter or EmbeddedServer returns a SolrDocumentList from the DocList.

I set the requestHandler to solr.MultiEmbeddedSearchHandler but the QueryComponent still returns the DocSlice.

Can you give me some advice?

          Thanks!

          JohnWu

          Jerry Mindek added a comment -

          Hello! I am very interested in integrating Katta into Solr.

Unfortunately, I have been unsuccessful at compiling or building Katta into Solr.
          I have tried to integrate Katta into both branch-1.4 and the current Solr trunk.

          Could someone please post an up-to-date integration guide?

          JohnWu added a comment -

do you mean that if we do not use the QueryComponent, the cast exception cannot be thrown?
so we need to correct the solrconfig with

          <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>

and use the XMLResponseWriter to write the response?

          JohnWu added a comment -

          oh, TomLiu:

I use the MultiEmbeddedSearchHandler in the query core, but the request still cannot get through this code:

// add end
NamedList nl = resp.getValues();
nl.add("QueriedShards", shards);

// Added by tom liu
SolrDocumentList sdl = (SolrDocumentList) nl.get("response");
if (sdl == null) {
  nl.add("response", new SolrDocumentList());
  if (log.isWarnEnabled())
    log.warn("SolrServer.SolrResponse: no response");
}

if you use katta to start the embedded solr, the QueryComponent is a necessary component for solr, so the above code will throw the exception about casting DocSlice to SolrDocumentList.
Did I miss some step?

what does "XMLWriter or EmbeddedServer returns SolrDocumentList from DocList" mean?

          tom liu added a comment -

          sorry, the above comments have errors:
          in querycore(shards)/solrconfig.xml, requestHandler must be solr.MultiEmbeddedSearchHandler.

          querycore(shards)/solrconfig.xml
            <requestHandler name="standard" class="solr.MultiEmbeddedSearchHandler" default="true">
              <!-- default values for query parameters -->
               <lst name="defaults">
                 <str name="echoParams">explicit</str>
              </lst>
            </requestHandler>
          

QueryComponent returns a DocSlice, but the XMLWriter or EmbeddedServer returns a SolrDocumentList built from the DocList.

          JohnWu added a comment -

          TomLiu:
Now in the query core, the exception is: org.apache.solr.search.DocSlice cannot be cast to org.apache.solr.common.SolrDocumentList

the exception is thrown by:

          // Added by tom liu
          SolrDocumentList sdl = (SolrDocumentList)nl.get("response");

which is in the public KattaResponse request(String[] shards, KattaRequest request) method of SolrKattaServer.java

nl is a NamedList, and the response comes back as a DocSlice. DocSlice extends DocSetBase and implements DocList, but the class SolrDocumentList extends ArrayList<SolrDocument>; that is why it cannot be cast!

          what can I do?
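
The fix this thread converges on (see Tom Liu's comments below) is to use MultiEmbeddedSearchHandler so the embedded server itself produces a SolrDocumentList. As background, here is a hedged sketch of converting a DocSlice by hand; the "searcher" variable (the core's SolrIndexSearcher) and the stored-field copying are assumptions, not the patch's code, and the Fieldable API matches the Lucene 3.x era discussed here:

  // assumes org.apache.solr.search.DocList/DocIterator and
  // org.apache.lucene.document.Document/Fieldable are imported
  DocList docs = (DocList) nl.get("response");
  SolrDocumentList sdl = new SolrDocumentList();
  sdl.setNumFound(docs.matches());
  sdl.setStart(docs.offset());
  DocIterator iter = docs.iterator();
  while (iter.hasNext()) {
    int id = iter.nextDoc();
    Document luceneDoc = searcher.doc(id);        // load the stored fields
    SolrDocument sd = new SolrDocument();
    for (Fieldable f : luceneDoc.getFields()) {   // copy each stored value
      sd.addField(f.name(), f.stringValue());
    }
    sdl.add(sd);
  }
  nl.setVal(nl.indexOf("response", 0), sdl);      // swap the DocSlice for the list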

          JohnWu added a comment -

          Tomliu:
Maybe this is the last step for me, but it has taken so long!
katta uses lucene version 3.0, but solr-1395 uses a lucene 4.0 snapshot. I packaged solr-1395 into a jar and put it on the katta classpath, but the lucene versions are different,

so if I run katta search SPIndex02 content:lovealice 1

the slave returns a lucene exception.

how do you make the lucene versions the same? add the keywordAnalyzer.class to the lucene-4.0-snapshot jar?

          Thanks!

          JohnWu

          tom liu added a comment -

In the Katta-integrated environment, solr is embedded.

Katta acts as the distributed compute manager, which manages:

1. node startup/shutdown
2. shard deploy/undeploy
3. rpc invocation to the application/Solr

and Solr acts as the application on the distributed compute environment.

in the master box, the query handler must be solr.KattaRequestHandler in solrconfig.xml,
so that the kattaclient will be invoked by the solr app, which then invokes the rpc to the slave.

in the slave box, Katta will start up the embedded solr, which is the subproxy.

the shard, that is, the query solrcore, will be deployed by katta's script (see the sketch below):
          bin/katta addIndex <indexName> <indexPath>
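
As a concrete sketch of starting a slave node and then deploying a shard, built only from commands quoted in this thread; the index name and path are placeholders:

  # start a katta node whose content server is the patched embedded Solr
  bin/katta startNode -c org.apache.solr.katta.DeployableSolrKattaServer
  # deploy a query solrcore as a shard onto the running nodes
  bin/katta addIndex seo0 hdfs://hdfsname:9000/seo/seo0.zip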

          JohnWu added a comment - - edited

          TomLiu:

in katta's lib there are so many jars, but some jars must be there; you know, Solr must include Lucene's jar.

I added some libs to katta.

Do you mean the solr embedded in katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (katta with the solr 1395 patch):

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the solr home is in the katta sh; in the solr home:

the solrconfig handler is solr.SearchHandler

but I do not know how katta can dispatch the query to the query core; the solr jar in katta will search for the query in its data directory. how about the shard configuration in the subproxy?

can you give me a detailed reply?

Thanks a lot!

          JohnWu added a comment -

          TomLiu:

on the slave node, is katta.node.properties also set as follows?

          #node.server.class=net.sf.katta.lib.lucene.LuceneServer
          node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

          tom liu added a comment -

          Eric:
          please put katta.zk.properties and katta.node.properties into your webapp's WEB-INF/classes

          tom liu added a comment -

          Eric:
          please put katta.zk.properties and katta.node.properties into your webapp's WEB-INF/lib.

          JohnWu:
in katta's lib there are so many jars, but some jars must be there; you know, Solr must include Lucene's jar.

with your problem that it can't find pc-slavo2:20000: katta must connect to pc-slavo2:20000 through a tcp socket.
what happens if you ping pc-slavo2 and telnet pc-slavo2 20000?
you can try adding pc-slavo2 with its ip address to the hosts file.

          JohnWu added a comment -

          Eric:
if you use eclipse to debug the project,
katta.zk.properties goes in the src folder

          JohnWu

          Eric Pugh added a comment -

          Tom, John,

          Just wanted to comment that having your conversation on this ticket in public has been great! I am a couple steps behind you, having started up Katta, and started Solr with the patch, but not having success on searching.

My current error is that Solr can't find the katta.zk.properties file; where did you put it so it would be found on the classpath?

          Eric

          JohnWu added a comment -

          Tomliu:
so in the proxy, not in the sub-proxy, katta startNode needs to add the class org.apache.solr.katta.DeployableSolrKattaServer?

in katta's lib there are too many version differences: solr, lucene, zookeeper; the worst is lucene!

can you give me an email address? I can contact you directly (mine is panglaohu@gmail.com).

now, in katta's work queue, at NodeInteraction line 135:
T result = (T) _method.invoke(proxy, _args);

proxy is a hadoop IPC proxy, and it cannot find pc-slavo2:20000.
am I missing some config in hadoop? or do I need to patch Hadoop with your https://issues.apache.org/jira/browse/HADOOP-7017?

          please reply, thanks!

          JohnWu

          tom liu added a comment -

          in proxy:
          katta.node.properties:
          #node.server.class=net.sf.katta.lib.lucene.LuceneServer
          node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

you must put apache-solr-core-XXX.jar into katta's lib, along with some related jars.

          JohnWu added a comment -

          TomLiu:

I am still stuck on the query dispatch to the subproxy!

          SEVERE: Error calling public abstract org.apache.solr.katta.KattaResponse org.apache.solr.katta.ISolrServer.request(java.lang.String[],org.apache.solr.katta.KattaRequest) throws java.lang.Exception on pc-slave02:20000 (try # 1 of 3) (id=0)
          java.lang.reflect.InvocationTargetException

so here is my config in the proxy; please review it:

          in proxy

          1)
          solrHome-> solrconfig.xml
          <config>

          <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">

          <lst name="defaults">
          <str name="echoParams">explicit</str>
          <str name="shards">*</str>
          </lst>
          </requestHandler>
          </config>

ok, all the shards are watched and held in ZooKeeper, as seen through zookeeper's zkCli.sh:

          [zk: pc-master(CONNECTED) 11] ls /katta/shard-to-nodes
          SPIndex05#1287138886138-99384445, SPIndex04#1287138886138-99384445

          2)
          In proxy
          katta.node.properties:
          node.server.class=net.sf.katta.lib.lucene.LuceneServer

          3)
          query: http://localhost:8080/solr-1395-katta-0.6.2-2patch/select/?q=lovealice&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true

          is that right?
especially in step 2.

          Thanks!

          JohnWu

          tom liu added a comment -

please see the katta-solrcores picture.

the proxy and subproxy are different processes, and sometimes are on different nodes.

the process for a query is:

1. the proxy, which is the solr app on tomcat, receives the query that you send from IE or Firefox
2. then the proxy dispatches the query to the subproxy
3. the subproxy is the katta node that holds the DeployableSolrKattaServer
4. so, after splitting the shards, the subproxy redispatches the query to each shard
5. every shard is an EmbeddedSolrServer that holds one solrcore

the solrhome in the proxy is set from web.xml or java -DsolrHome=......
but in the subproxy, it must be set in the katta script;
in the EmbeddedSolrServer, the solrhome is set automatically.

          JohnWu added a comment -

          TomLiu:
Do you mean that the patched solr has three cores?

          <solr persistent="false">

          <cores adminPath="/admin/cores">
          <core name="proxy" instanceDir="proxy"/>
          <core name="subproxy" instanceDir="subproxy"/>
          <core name="queryCore" instanceDir="queryCore"/>
          </cores>
          </solr>

          is that right?

          JohnWu added a comment -

          TomLiu:
yeah, today I used the katta Git code (maybe katta 0.6.3), added the lib solr-1395-katta-0.6.2-3patch.jar, and then ran
          bin/katta startNode org.apache.solr.katta.DeployableSolrKattaServer

but it came back with the following error:

          INFO: Solr home set to '/home/hadoop/workspace/kattaNoZK/solrHome/'
          ERROR: could not create instance of class 'org.apache.solr.katta.DeployableSolrKattaServer': defaultCore:proxy could not be found

I have a question:
DeployableSolrKattaServer is for dispatching the kattaRequest to the subproxy, right? and it will use
T result = (T) _method.invoke(proxy, _args); to reflectively invoke kattaRequests on the nodes?

          in proxy, Myquery is :
          http://localhost:8080/solr-1395-katta-0.6.2-3patch/select?q=lovealice&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true
          is that right?

Thank you for your patience in answering me.

today I received an email from aladeck; he asked me to describe the whole process step by step, but I have met so many problems that I recommend you answer this question for him.

If you are too busy, please just give me some key advice; I will first send a report to you for review, and then send it to them.

          Thanks!

          JohnWu
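
A hedged guess at the 'defaultCore:proxy could not be found' error: SolrKattaServer appears to look up a core literally named "proxy" in the solr home, so the solr.xml under the solr home that the katta script points at would need at least an entry like the three-core example shown earlier in this thread; the instanceDir value is a placeholder:

  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="proxy" instanceDir="proxy"/>
    </cores>
  </solr>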

          tom liu added a comment -

solrHome is set in three places:

1. webapps/yourapp/web.xml
  this is the conf of the front server or proxy (see the web.xml sketch below).
2. the katta script
  this is the conf of the subproxy. such as:
  ...
  KATTA_OPTS="$KATTA_OPTS -Dsolr.home=/var/data/solr/kattaproxy"
  ...

3. solrcore
  this conf is set by Katta/Solr automatically.

in the katta integration, tomcat (or jetty) is the trigger point, which connects to the zkserver and the katta nodes.
the katta nodes with deployed solrcores are waiting for queries from tomcat.
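
As a sketch of item 1, the standard Solr webapp mechanism for setting solr home from web.xml is a JNDI env-entry; whether this patch relies on exactly this mechanism is an assumption, and the path is a placeholder from earlier comments:

  <env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/var/data/solr/seo</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
  </env-entry>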

          JohnWu added a comment -

          TomLiu:

oh, thanks!
for this patch, we overcame some obstacles; I will share them with everyone focused on this feature:

1) if you want to debug the program with the katta code, please do not use the katta code from SVN, whose libs are so old (lucene 2.4, hadoop 0.19, zookeeper 3.1.1); you need to use the Git trunk code and build an eclipse project.
2) build a WTP project with Tom Liu's patched solr in eclipse, and reference the above katta project.
3) copy the katta project's zkclient-0.2.dev.jar into the solr project's lib (it's terrible: the exception reports IOItech, which really means 101tech)

debug run!
ok!

Tom:
Can you tell me the conf of solrHome?
we want to know: is the solr core deployed while building the proxy (if so, the process is like the SOLR-1301 patch), or does the solr core already exist there, waiting for queries?

          Thanks!

          tom liu added a comment - - edited

          JohnWu:

          my conf is:

          proxy/solrconfig.xml
          <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
          	<lst name="defaults">
          	<str name="echoParams">explicit</str>
          	<str name="shards">*</str>
          	</lst>
            </requestHandler>
          
          subproxy/solrconfig.xml
          <requestHandler name="standard" class="solr.SearchHandler" default="true">
              <!-- default values for query parameters -->
               <lst name="defaults">
                 <str name="echoParams">explicit</str>
               </lst>
            </requestHandler>
          
          querycore(shards)/solrconfig.xml
          <requestHandler name="standard" class="solr.MultiEmbeddedSearchHandler" default="true">
              <!-- default values for query parameters -->
               <lst name="defaults">
                 <str name="echoParams">explicit</str>
               </lst>
            </requestHandler>
          
          zoo.cfg
          clientPort=2181
          ...
          

          in Katta/conf and Shards/WEB-INF/classes

          katta.zk.properties
          zookeeper.embedded=false
          zookeeper.servers=localhost:2181
          ...
          
          JohnWu added a comment -

          tom liu:

in the proxy's solrconfig.xml,
do we need to add a "search" handler segment as follows?

          <requestHandler name="search" class="solr.KattaRequestHandler" default="true">

          <lst name="defaults">
          <!-- <str name="shards">0,1,2,3,4,5,6,7,8</str> -->

          <str name="shards">
          n-12-0,n-12-1,n-12-2,n-12-3,n-12-4,n-12-5,n-12-6,n-12-7,n-12-8,n-12-9,n-12-10,n-12-11
          </str>
          ...
          </requestHandler>

          it's old configuration of frontend solr, use this configure fileand the zookeeper is unembeded, but the zkclient of 1395patched solr can not find the index deployed by the outter katta!

In the old configuration files,
zoo.cfg has:
clientPort=2181

but katta.zk.properties has:
zookeeper.clientPort=2182

Is this port mismatch the reason why our client cannot connect to the index of the unembedded ZooKeeper?
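For reference, a minimal sketch in which the two files agree on the port (assuming a standalone ZooKeeper listening on localhost:2181):

zoo.cfg
clientPort=2181
...

katta.zk.properties
zookeeper.embedded=false
zookeeper.servers=localhost:2181
...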

Could you give us a reply?

          Thanks!
          JohnWu

          tom liu added a comment -

          fixed some bugs:

1. select?qt=qtname is not supported in the SolrKattaServer of the subproxy
  In SolrKattaServer, every query handler must be a MultiEmbeddedSearchHandler,
  so in solrconfig.xml we must change solr.SearchHandler to MultiEmbeddedSearchHandler.
            for example:
              <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
              <!-- A Req Handler for working with the tvComponent.  This is purely as an example.
              You will likely want to add the component to your already specified request handlers. -->
              <requestHandler name="tvrh" class="solr.MultiEmbeddedSearchHandler">
                <lst name="defaults">
                  <bool name="tv">true</bool>
                </lst>
                <arr name="last-components">
                  <str>tvComponent</str>
                </arr>
              </requestHandler>
            
2. TermVectorComponent does not return results
            see https://issues.apache.org/jira/browse/SOLR-2224
          tom liu added a comment - - edited

JohnWu, Huang:

in the Katta integration, a Solr core plays one of three roles:

1. proxy, i.e. the query dispatcher or front-end server.
  All queries are sent to this proxy and then dispatched to the subproxies on the Katta cluster nodes.
  In this proxy, QueryComponent's distributedProcess is executed, with the param isShard=false.
2. subproxy, i.e. the proxy on a Katta cluster node.
  Because each node may host more than one core, the subproxy receives the query from the proxy and forwards it to each core.
  In this subproxy, QueryComponent's distributedProcess is executed, with the param isShard=true.
3. queryCore, i.e. the Solr core that actually executes the query.
  Queries are sent to the querycore, and the querycore executes QueryComponent's process method.

So, to run Solr as a distributed cluster, we set up three environments.

          1. proxy's solrconfig.xml
            <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
                <lst name="defaults">
                    <str name="echoParams">explicit</str>
                    <str name="shards">*</str>
                 </lst>
            </requestHandler>
            
          2. subproxy's solrconfig.xml
            <requestHandler name="standard" class="solr.SearchHandler" default="true">...</requestHandler>
          3. querycore's solrconfig.xml
            <requestHandler name="standard" class="solr.MultiEmbeddedSearchHandler" default="true">...</requestHandler>

in Katta's katta.node.properties:
          node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

And in the classes dir of the proxy's Solr webapp, please add two files:

          1. katta.zk.properties
          2. katta.node.properties
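
As a usage illustration: with this setup a client only ever talks to the proxy core, since the KattaRequestHandler does the fan-out server-side. A minimal SolrJ sketch, with a hypothetical proxy URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ProxyQueryExample {
  public static void main(String[] args) throws Exception {
    // Talk only to the front-end (proxy) core; it dispatches to the
    // subproxies on the Katta nodes and merges the shard responses.
    SolrServer proxy = new CommonsHttpSolrServer("http://proxyhost:8080/solr");
    SolrQuery q = new SolrQuery("solr");
    q.set("shards", "*"); // same default the KattaRequestHandler config sets
    QueryResponse rsp = proxy.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}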
          JohnWu added a comment -

TomLiu:
I corrected the code after patching:
1) package org.apache.solr.client.solrj.request -> QueryRequest -> query (field):
          protected SolrParams query;

2) package org.apache.solr.handler.component -> SearchHandler -> submit (method):
void submit(final ShardRequest sreq, final String shard, final ModifiableSolrParams params) {

  Callable<ShardResponse> task = new Callable<ShardResponse>() {
    public ShardResponse call() throws Exception {

      ShardResponse srsp = new ShardResponse();
      srsp.setShardRequest(sreq);
      srsp.setShard(shard);
      SimpleSolrResponse ssr = new SimpleSolrResponse();
      srsp.setSolrResponse(ssr);
      long startTime = System.currentTimeMillis();

      try {
        // String url = "http://" + shard + "/select";
        String url = SearchHandler.scheme + shard;
        params.remove(CommonParams.WT);      // use default (currently javabin)
        params.remove(CommonParams.VERSION);
        SolrServer server = new CommonsHttpSolrServer(url, client);
        // SolrRequest req = new QueryRequest(SolrRequest.METHOD.POST, "/select");
        // use generic request to avoid extra processing of queries
        QueryRequest req = new QueryRequest(params);
        req.setMethod(SolrRequest.METHOD.POST);
        // no need to set the response parser as binary is the default
        // req.setResponseParser(new BinaryResponseParser());
        // srsp.rsp = server.request(req);
        // srsp.rsp = server.query(sreq.params);
        ssr.nl = server.request(req);
      } catch (Throwable th) {
        srsp.setException(th);
        if (th instanceof SolrException) {
          srsp.setResponseCode(((SolrException) th).code());
        } else {
          srsp.setResponseCode(-1);
        }
      }

      ssr.elapsedTime = System.currentTimeMillis() - startTime;

      return srsp;
    }
  };

  pending.add(completionService.submit(task));
}

After
ant compile
in Solr/dist:

apache-solr-4.0-SNAPSHOT.war
apache-solr-analysis-extras-4.0-SNAPSHOT.jar
apache-solr-cell-4.0-SNAPSHOT.jar
apache-solr-clustering-4.0-SNAPSHOT.jar
apache-solr-core-4.0-SNAPSHOT.jar
apache-solr-dataimporthandler-4.0-SNAPSHOT.jar
apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
apache-solr-solrj-4.0-SNAPSHOT.jar
solrj-lib

Now I forget which patch creates the katta and kattaproxy folders in example:

          drwxrwxr-x 3 hadoop hadoop 4096 Nov 3 22:53 kattaproxy
          drwxrwxr-x 3 hadoop hadoop 4096 Nov 3 22:53 katta
          drwxrwxr-x 3 hadoop hadoop 4096 Oct 16 03:28 exampledocs
          drwxrwxr-x 5 hadoop hadoop 4096 Oct 16 03:31 example-DIH

Which folder is for Katta? How do we set up the SolrKattaServer in the core container?
Is there some test program we can refer to?

          Thanks for your help!

          wilson huang added a comment -

Tom:
I have some questions:
1. How do we configure the patched Solr?
2. How do we start up Katta after Solr is patched?
a) Do we start the Katta master and nodes from the command line, or does Katta run as a component inside Solr?

3. How do we use indices created on Katta in your Solr cluster?

          Thanks!

          Mathias Walter added a comment -

          Hi John,

          why don't you just compare SearchHandler.java.rej with SearchHandler.java and merge them manually? It should be very easy.
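
For example, a minimal sketch of that manual merge (file paths taken from the patch output in the comments below):

# inspect what the rejected hunk wanted to change
cat org/apache/solr/handler/component/SearchHandler.java.rej

# then open the target file and apply the hunk by hand
vi org/apache/solr/handler/component/SearchHandler.java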

          JohnWu added a comment -

Tomliu:

Thanks for your answer: "solr-1395-katta-0.6.2*.patch included solr-1395-1431.patch"

I checked out the trunk code from http://svn.apache.org/repos/asf/lucene/dev/trunk

          the folder now is:

          [hadoop@pc-master solrNewTrunkForPatch-solr-1395-katta-0.6.2-1patch]$ ls
          build.xml lucene modules solr

          cd solr/src/java

          patch -p0 -i solr-1395-katta-0.6.2-2.patch

          patching file net/sf/katta/util/IndexConfiguration.java
          patching file org/apache/solr/katta/ISolrDocumentFactory.java
          patching file org/apache/solr/katta/ISolrServer.java
          patching file org/apache/solr/katta/KattaClient.java
          patching file org/apache/solr/katta/KattaResponse.java
          patching file org/apache/solr/katta/ZipService.java
          patching file org/apache/solr/katta/KattaMultiServer.java
          patching file org/apache/solr/katta/KattaComponent.java
          patching file org/apache/solr/katta/DocumentWritable.java
          patching file org/apache/solr/katta/KattaSearchHandler.java
          patching file org/apache/solr/katta/SolrKattaServer.java
          patching file org/apache/solr/katta/DeployableSolrKattaServer.java
          patching file org/apache/solr/katta/KattaRequest.java
          patching file org/apache/solr/katta/SolrIndexer.java
          patching file org/apache/solr/handler/KattaRequestHandler.java
          patching file org/apache/solr/handler/component/SearchHandler.java
          Hunk #1 succeeded at 251 (offset 21 lines).
          Hunk #3 succeeded at 337 (offset 22 lines).
          Hunk #4 FAILED at 379.
          1 out of 4 hunks FAILED – saving rejects to file org/apache/solr/handler/component/SearchHandler.java.rej
          patching file org/apache/solr/handler/component/FacetComponent.java
          Hunk #2 succeeded at 203 (offset 19 lines).
          patching file org/apache/solr/handler/component/SimpleSolrResponse.java
          patching file org/apache/solr/handler/component/DebugComponent.java
          patching file org/apache/solr/handler/component/MultiShardHandler.java
          patching file org/apache/solr/handler/component/HttpMultiShardHandler.java
          patching file org/apache/solr/handler/component/StatsComponent.java
          patching file org/apache/solr/handler/component/EmbeddedMultiShardHandler.java
          patching file org/apache/solr/handler/component/QueryComponent.java
          Hunk #1 succeeded at 287 (offset 112 lines).
          Hunk #2 succeeded at 387 (offset 4 lines).
          Hunk #3 succeeded at 543 (offset 112 lines).
          Hunk #4 succeeded at 480 (offset 4 lines).
          Hunk #5 succeeded at 702 (offset 112 lines).
          Hunk #6 succeeded at 644 (offset 4 lines).
          Hunk #7 succeeded at 808 (offset 112 lines).
          Hunk #8 succeeded at 720 (offset 4 lines).
          patching file org/apache/solr/handler/component/MultiEmbeddedSearchHandler.java
          patching file org/apache/solr/handler/component/HighlightComponent.java
          patching file org/apache/solr/handler/component/AbstractMultiShardHandler.java

How do we solve the problem shown above?
"Hunk #4 FAILED at 379.
1 out of 4 hunks FAILED – saving rejects to file org/apache/solr/handler/component/SearchHandler.java.rej
"

We are missing just this one step to run the cluster as you do.

Waiting for your reply!

          tom liu added a comment -

JohnWu:

Please use solr-1395-katta-0.6.2.patch.

I did not know how to make a patch from solr-1395-1431.patch;
solr-1395-katta-0.6.2*.patch includes solr-1395-1431.patch.

          tom liu added a comment -

I fixed the bugs below:

1. RPC server stopping
2. RPC client receiving null docs
3. RPC client request timeout, causing Solr to receive null docs

BTW:
Walter, I found that if the server/client communication is changed so that the client sends the request to the server, the NPE is not thrown.
See: https://issues.apache.org/jira/browse/HADOOP-7017

          JohnWu added a comment -

Tom Liu:

I checked out the code from http://svn.apache.org/repos/asf/lucene/dev/trunk into lucene.
When patching it with solr-1395-1431-4.patch, something goes wrong, as follows:

Hunk # FAILED

Should I patch the code with solr-1395-1431.patch first?

If I patch the code with solr-1395-katta-0.6.2.patch and Katta (also patched), the ant build is successful!

But you have now uploaded solr-1395-katta-0.6.2-1.patch. How do we keep our code in sync with yours?
Please tell us the whole patch process. Let's work together to solve this problem!

Thanks!

          tom liu added a comment -

Walter, thanks.
I reviewed the code and found that the org.apache.hadoop.ipc.Client class holds the connection to the ShardNode, but there is only one socket/connection per node.
So, when many requests are sent, they block waiting on the synchronized connection.

I think each node should have several connections.
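
A minimal sketch of that idea, pooling several RPC proxies per node and handing them out round-robin (the class and all names are hypothetical, assuming a non-empty pool; this is not part of the patch):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin pool of RPC proxies for one shard node, so that concurrent
// requests are not serialized on a single synchronized connection.
public class NodeProxyPool<T> {
  private final List<T> proxies = new CopyOnWriteArrayList<T>();
  private final AtomicInteger next = new AtomicInteger();

  public NodeProxyPool(List<T> initialProxies) {
    proxies.addAll(initialProxies);
  }

  // Pick the next proxy; each proxy owns its own socket/connection.
  public T get() {
    int i = Math.abs(next.getAndIncrement() % proxies.size());
    return proxies.get(i);
  }
}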

          Mathias Walter added a comment -

Tom, that's what I reported on 18th of August and why I switched to the Katta distribution system.

          tom liu added a comment -

Concurrent requests throw an NPE.
For example:

          ab -n 10000 -c 5 http://solr01:8080/solr/select?q=solr&...
          

the following NPE is thrown:

          10/10/26 17:36:03 TRACE client.WorkQueue:261 - Done waiting, results = ClientResult: 0 results, 0 errors, 0/2 shards (id=2359:0)
          10/10/26 17:36:03 TRACE client.WorkQueue:270 - Shutting down work queue, results = ClientResult: 0 results, 0 errors, 0/2 shards (id=2359:0)
          10/10/26 17:36:03 TRACE client.ClientResult:286 - close() called.
          10/10/26 17:36:03 TRACE client.ClientResult:290 - Notifying closed listener.
          10/10/26 17:36:03 TRACE client.WorkQueue:136 - Shut down via ClientRequest.close()
          10/10/26 17:36:03 TRACE client.WorkQueue:188 - Shutdown() called (id=2359)
          10/10/26 17:36:03 TRACE client.WorkQueue:277 - Returning results = ClientResult: 0 results, 0 errors, 0/2 shards (closed), took 10003 ms (id=2359:0)
          10/10/26 17:36:03 DEBUG client.Client:427 - broadcast(request([Ljava.lang.Object;@7cf02bee), {solr03:20000=[solrhome01#solrhome01, solrhome02#solrhome02]}) took 10004 msec for ClientResult: 0 results, 0 errors, 0/2 shards (closed)
          10/10/26 17:36:03 INFO component.SearchHandler:89 - KattaCommComponent results.size: 0
          10/10/26 17:36:03 WARN component.SearchHandler:93 - Received 0 responses for query [], not 1
          10/10/26 17:36:03 ERROR core.SolrCore:151 - java.lang.NullPointerException
                  at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:553)
                  at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:435)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:304)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          

But using

ab -n 10000 -c 1 http://solr01:8080/solr/select?q=solr&... 

does not throw it.

BTW:
The NPE stops RPC communication in the request method of SolrKattaServer.java.
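
A minimal sketch of a guard for such a request method, converting any Throwable into an error payload instead of letting it escape and tear down the RPC handler (all names here are illustrative, not the patch's actual code):

// Hypothetical wrapper around an RPC-style request method.
public class GuardedRequestExample {
  interface QueryExecutor {
    byte[] execute(byte[] query) throws Exception;
  }

  private final QueryExecutor executor;

  public GuardedRequestExample(QueryExecutor executor) {
    this.executor = executor;
  }

  public byte[] request(byte[] query) {
    try {
      return executor.execute(query);
    } catch (Throwable th) {
      // Return a marker the caller can detect, instead of killing the
      // RPC connection with an unhandled NullPointerException.
      return ("ERROR:" + th).getBytes();
    }
  }
}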

          tom liu added a comment -

          fixed some bugs:

          1. NPE throws in QueryComponent's mergeIds method
            • add: shardDoc.score = 0f;
          2. Dups query in sub shards
            • changes queryComponent's distributedProcess
            • changes queryComponent's handleResponses
            • changes queryComponent's createMainQuery
            • changes queryComponent's returnFields
          3. No results when start!=0
            • changes queryComponent's createRetrieveDocs
          4. more results returned in KattaMultiShardHandler
            • changes KattaMultiShardHandler's execute
          5. IndexConfiguration
          6. Components throw NPE
            • DebugComponent
            • FacetComponent
            • HighlightComponent
            • StatsComponent

My patch is based on solr-1395-1431-4.patch (spelled out as shell commands below):

          1. svn co http://svn.apache.org/repos/asf/lucene/dev/trunk lucene
          2. patch solr-1395-1431-4.patch
          3. ...... (do some changes, such as fixed above bugs)
          4. svn di > solr-1395-katta-0.6.2-1.patch
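
A sketch of those steps as shell commands (step 3 stands for whatever local edits were made):

svn co http://svn.apache.org/repos/asf/lucene/dev/trunk lucene
cd lucene/solr/src/java
patch -p0 -i solr-1395-1431-4.patch
# ...edit the sources to fix the bugs above...
svn diff > solr-1395-katta-0.6.2-1.patch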
          Mathias Walter added a comment -

Why did you include SolrIndexer.java, ZipService.java, DocumentWritable.java, KattaComponent.java and ISolrDocumentFactory.java? Did you merge in another patch? Please keep them separate.

          BTW: SolrIndexer uses IndexConfiguration from Katta, which is no longer available.

          Mathias Walter added a comment - - edited

You should not include your modified web.xml in the patch.

          tom liu added a comment -

When I query Solr with debugQuery=true, an NPE is thrown.
I found that all subclasses of SearchComponent have defects, because I changed QueryComponent so that it does not query twice in the middle Solrs, while every component depends on the two-phase query.

So I will change the SearchComponent subclasses, such as DebugComponent.

          tom liu added a comment -

I am sorry,
The solr-1395-katta-0.6.2.patch is based on:

1. solr's newest trunk (svn co http://svn.apache.org/repos/asf/lucene/dev/trunk lucene)
2. hadoop-0.20.2
3. katta-0.6.2
4. zookeeper-3.3.1
          Dhruv Bansal added a comment -

          tom liu,

I'm unable to apply the last patch file you uploaded (solr-1395-katta-0.6.2.patch, 2010-10-11) to apache-solr-1.4.1.

          patching_solr-1.4.1
          $ wget http://mirror.cloudera.com/apache//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
          ...
          $ tar -xzf apache-solr-1.4.1.tgz
          $ cd apache-solr-1.4.1/src
          apache-solr-1.4.1/src$ wget https://issues.apache.org/jira/secure/attachment/12456924/solr-1395-katta-0.6.2.patch
          ...
          apache-solr-1.4.1/src$ patch -p0 -i solr-1395-katta-0.6.2.patch
          
          patching file java/org/apache/solr/katta/ISolrDocumentFactory.java
          patching file java/org/apache/solr/katta/ISolrServer.java
          patching file java/org/apache/solr/katta/KattaClient.java
          patching file java/org/apache/solr/katta/KattaResponse.java
          patching file java/org/apache/solr/katta/ZipService.java
          patching file java/org/apache/solr/katta/KattaMultiServer.java
          patching file java/org/apache/solr/katta/KattaComponent.java
          patching file java/org/apache/solr/katta/DocumentWritable.java
patching file java/org/apache/solr/katta/KattaSearchHandler.java
          patching file java/org/apache/solr/katta/SolrKattaServer.java
          patching file java/org/apache/solr/katta/DeployableSolrKattaServer.java
          patching file java/org/apache/solr/katta/KattaRequest.java
          patching file java/org/apache/solr/katta/SolrIndexer.java
          patching file java/org/apache/solr/handler/KattaRequestHandler.java
          patching file java/org/apache/solr/handler/component/SearchHandler.java
          Hunk #1 succeeded at 216 (offset -14 lines).
          Hunk #2 succeeded at 243 (offset -14 lines).
          Hunk #3 succeeded at 301 (offset -14 lines).
          Hunk #4 FAILED at 357.
          1 out of 4 hunks FAILED -- saving rejects to file java/org/apache/solr/handler/component/SearchHandler.java.rej
          patching file java/org/apache/solr/handler/component/SimpleSolrResponse.java
          patching file java/org/apache/solr/handler/component/MultiShardHandler.java
          patching file java/org/apache/solr/handler/component/HttpMultiShardHandler.java
          patching file java/org/apache/solr/handler/component/EmbeddedMultiShardHandler.java
          patching file java/org/apache/solr/handler/component/QueryComponent.java
          Hunk #1 succeeded at 42 with fuzz 2 (offset 6 lines).
          Hunk #2 succeeded at 174 (offset -2 lines).
          Hunk #3 succeeded at 274 (offset -110 lines).
          Hunk #4 FAILED at 432.
          Hunk #5 succeeded at 361 (offset -109 lines).
          Hunk #6 succeeded at 474 (offset -110 lines).
          Hunk #7 succeeded at 525 (offset -110 lines).
          Hunk #8 succeeded at 576 (offset -115 lines).
          Hunk #9 FAILED at 711.
          2 out of 9 hunks FAILED -- saving rejects to file java/org/apache/solr/handler/component/QueryComponent.java.rej
          patching file java/org/apache/solr/handler/component/MultiEmbeddedSearchHandler.java
          patching file java/org/apache/solr/handler/component/AbstractMultiShardHandler.java
          patching file solrj/org/apache/solr/client/solrj/request/QueryRequest.java
          patching file webapp/web/WEB-INF/web.xml
          

          Did I do something incorrectly?

          tom liu added a comment -

          fixed some bugs:

          1. NPE throws in QueryComponent's mergeIds method
            • add: shardDoc.score = 0f;
          2. Dups query in sub shards
            • changes queryComponent's distributedProcess
            • changes queryComponent's handleResponses
            • changes queryComponent's createMainQuery
            • changes queryComponent's returnFields
          3. No results when start!=0
            • changes queryComponent's createRetrieveDocs
          4. more results returned in KattaMultiShardHandler
            • changes KattaMultiShardHandler's execute
          tom liu added a comment -

          No, that's four queries:

          1. on solr01, url is /select?fl=id,score&...
            • Shard=solrhome02#solrhome02
            • Shard=solrhome01#solrhome01
          2. on solr01, url is /select?ids=SOLR1000&fl=id,score,id&...
            • Shard=solrhome02#solrhome02
          3. on solr02, url is /select?ids=SOLR1000&fl=id,score&...
            • Shard=solrhome02#solrhome02
            • Shard=solrhome01#solrhome01
          4. on solr02, url is /select?ids=SOLR1000&ids=SOLR1000&...
            • Shard=solrhome01#solrhome01

If the original query includes shards=*, the master Solr sends * to the KattaClient.
The KattaClient (katta.Client) then selects a node such as solr01 and sends shards=solrhome01#solrhome01,solrhome02#solrhome02.
In the middle shard, SearchHandler and QueryComponent invoke the distributed process, including createMainQuery and createRetrieveDocs.
So, on each node, the query is split into two queries:

1. the first selects id and score
2. the second retrieves the docs

I have changed the QueryComponent class as follows:

          distributedProcess
          	// Added by tom liu
          	// do or not need distributed process
          	boolean isShard = rb.req.getParams().getBool(ShardParams.IS_SHARD, false);
          	// if in sub shards, do not need distributed process
          	if (isShard) {
          		if (rb.stage < ResponseBuilder.STAGE_PARSE_QUERY)
          			return ResponseBuilder.STAGE_PARSE_QUERY;
          		if (rb.stage == ResponseBuilder.STAGE_PARSE_QUERY) {
          			createDistributedIdf(rb);
          			return ResponseBuilder.STAGE_EXECUTE_QUERY;
          		}
          		if (rb.stage < ResponseBuilder.STAGE_EXECUTE_QUERY)
          			return ResponseBuilder.STAGE_EXECUTE_QUERY;
          		if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
          			createMainQuery(rb);
          			return ResponseBuilder.STAGE_GET_FIELDS;
          		}
          		if (rb.stage < ResponseBuilder.STAGE_GET_FIELDS)
          			return ResponseBuilder.STAGE_GET_FIELDS;
          		if (rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
          			return ResponseBuilder.STAGE_DONE;
          		}
          		return ResponseBuilder.STAGE_DONE;
          	}
          	// add end
                  ...
          
          handleResponses
            if ((sreq.purpose & ShardRequest.PURPOSE_GET_TOP_IDS) != 0) {
                mergeIds(rb, sreq);
            	  // Added by tom liu
            	  // do or not need distributed process
            	  boolean isShard = rb.req.getParams().getBool(ShardParams.IS_SHARD, false);
                if(isShard){
                	sreq.purpose = ShardRequest.PURPOSE_GET_FIELDS;
                }
             	  // add end
              }
          
              if ((sreq.purpose & ShardRequest.PURPOSE_GET_FIELDS) != 0) {
                returnFields(rb, sreq);
                return;
              }
          
          createMainQuery
              sreq.params = new ModifiableSolrParams(rb.req.getParams());
              // TODO: base on current params or original params?
          
          	// Added by tom liu
          	// do or not need distributed process
          	boolean isShard = rb.req.getParams().getBool(ShardParams.IS_SHARD, false);
              if(isShard){
                  // isShard=true, then do not change params
              }else{
              	// add end
          	    // don't pass through any shards param
          	    sreq.params.remove(ShardParams.SHARDS);
              ...
          
          returnFields
                boolean returnScores = (rb.getFieldFlags() & SolrIndexSearcher.GET_SCORES) != 0;
          
                // changed by tom liu
                // add for loop
                //assert(sreq.responses.size() == 1);
                //ShardResponse srsp = sreq.responses.get(0);
                for(ShardResponse srsp : sreq.responses){
          	      SolrDocumentList docs = (SolrDocumentList)srsp.getSolrResponse().getResponse().get("response");
          
          	      String keyFieldName = rb.req.getSchema().getUniqueKeyField().getName();
                ...
          
          Mathias Walter added a comment -

The two queries are not identical! In a distributed environment, the first query asks each shard for the n document IDs with the top scores. The master merges them into the final top-n ranking. Then a second query is sent to each shard, requesting the Solr documents by their IDs. That's why you can see "ids=SOLR1000" in the second query.
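
A minimal sketch of that merge step (illustrative only; the ScoredId type and method are made up for the example, not Solr's actual QueryComponent code):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Phase 1 result from one shard: a document id with its score.
class ScoredId {
  final String id;
  final float score;
  ScoredId(String id, float score) { this.id = id; this.score = score; }
}

public class TwoPhaseMergeExample {
  // Merge the per-shard top-n lists into the global top-n ranking.
  // Phase 2 then fetches the full documents for exactly these ids.
  static List<String> mergeTopIds(List<List<ScoredId>> perShardTopDocs, int n) {
    List<ScoredId> all = new ArrayList<ScoredId>();
    for (List<ScoredId> shard : perShardTopDocs) {
      all.addAll(shard);
    }
    Collections.sort(all, new Comparator<ScoredId>() {
      public int compare(ScoredId a, ScoredId b) {
        return Float.compare(b.score, a.score); // descending by score
      }
    });
    List<String> ids = new ArrayList<String>();
    for (ScoredId d : all.subList(0, Math.min(n, all.size()))) {
      ids.add(d.id);
    }
    return ids;
  }
}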

          tom liu added a comment - - edited

          My deployment is:

          1. one Master
          2. two Slaves:
            • solr01
            • solr02
          3. two Indexes:
            • solrhome01(.zip)
            • solrhome02(.zip)

And I use:

          # bin/katta addIndex solrhome01 hdfs://localhost:9000/solr/solrhome01.zip
          # bin/katta addIndex solrhome02 hdfs://localhost:9000/solr/solrhome02.zip
          

          so, my shard-Node is:

          1. solrhome01#solrhome01
            • --solr01
            • --solr02
          2. solrhome02#solrhome02
            • --solr01
            • --solr02

When I searched on the master, I found that on each slave the search ran twice, for example:

          SolrServer.request: solr01:20000 shards:[solrhome01#solrhome01, solrhome02#solrhome02] 
          request params:fl=id%2Cscore&start=0&q=solr&isShard=true&fsv=true&rows=10&shards=solrhome01%23solrhome01%2Csolrhome02%23solrhome02
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome02#solrhome02] webapp=null path=/select params={fl=id%2Cscore&start=0&q=solr&isShard=true&fsv=true&rows=10} hits=1 status=0 QTime=16 
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome01#solrhome01] webapp=null path=/select params={fl=id%2Cscore&start=0&q=solr&isShard=true&fsv=true&rows=10} hits=1 status=0 QTime=16 
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome02#solrhome02] webapp=null path=/select params={fl=id%2Cscore%2Cid&start=0&q=solr&isShard=true&rows=10&ids=SOLR1000} status=0 QTime=0 
SolrServer.SolrResponse:{response={numFound=1,start=0,maxScore=0.5747526,docs=[SolrDocument[{id=SOLR1000, score=0.5747526}]]},QueriedShards=[Ljava.lang.String;@175ace6}
          
          SolrServer.request: solr02:20000 shards:[solrhome01#solrhome01, solrhome02#solrhome02] 
          request params:start=0&ids=SOLR1000&q=solr&isShard=true&rows=10&shards=solrhome01%23solrhome01%2Csolrhome02%23solrhome02
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome02#solrhome02] webapp=null path=/select params={start=0&ids=SOLR1000&q=solr&isShard=true&rows=10&fsv=true&fl=id%2Cscore} status=0 QTime=16
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome01#solrhome01] webapp=null path=/select params={start=0&ids=SOLR1000&q=solr&isShard=true&rows=10&fsv=true&fl=id%2Cscore} status=0 QTime=16
          2010-10-10 16:17:04 org.apache.solr.core.SolrCore execute
INFO: [solrhome01#solrhome01] webapp=null path=/select params={start=0&ids=SOLR1000&ids=SOLR1000&q=solr&isShard=true&rows=10} status=0 QTime=0
          SolrServer.SolrResponse:{response={numFound=1,start=0,docs=[SolrDocument[{id=SOLR1000, ...]]},QueriedShards=[Ljava.lang.String;@1d590d}
          

I think that on the slaves IS_SHARD=true should prevent this from happening.

          tom liu added a comment - - edited

I use the newest Solr 4.0 trunk code with katta-0.6.2, hadoop-0.20.2, and zookeeper-3.3.1; after fixing some bugs, I got it running.

The bugs are:
1. solr's ShardDoc.java, ShardFieldSortedHitQueue, line 210:

                  final float f1 = e1.score==null?0.00f:e1.score;
                  final float f2 = e2.score==null?0.00f:e2.score;
          

2. In KattaSearchHandler.java, KattaMultiShardHandler may return multiple results, so it must merge all of them:

          			if (results.isEmpty()) {
          				ssr.setResponse(new NamedList<Object>());
          				return;
          			}
          +
          +			NamedList<Object> nl = new NamedList<Object>();
          +			NamedListCollection nlc = new NamedListCollection(nl);
          +			for(KattaResponse kr : results){
          +				nl = nlc.add(kr.getRsp().getResponse());
          +			}
          			ssr.setResponse(nl);
                          }
          +		private class NamedListCollection {
          +			private NamedList<Object> _nl;
          +			NamedListCollection(NamedList<Object> nl){
          +				_nl = nl;
          +			}
          +			NamedList<Object> add(NamedList<Object> nl){
          +				Iterator<Entry<String,Object>> it = nl.iterator();
          +				while (it.hasNext()){
          +					Entry<String,Object> entry = it.next();
          +					String key = entry.getKey();
          +					Object obj = entry.getValue();
          +					Object old = _nl.remove(key);
          +					if(old != null){
          +						add(key, obj , old );
          +					}else{
          +						_nl.add(key, obj);
          +					}
          +				}
          +				return _nl;
          +			}
          +			void add(String key,Object obj,Object old){
          +				if(key.equals("response")){
          +					SolrDocumentList doca = (SolrDocumentList)obj;
          +					SolrDocumentList docb = (SolrDocumentList)old;
          +					SolrDocumentList docs = new SolrDocumentList();
          +					docs.setNumFound(doca.getNumFound()+docb.getNumFound());
          +					//doca.setStart(doca.getStart()+docb.getStart());
          +					docs.setMaxScore(Math.max(doca.getMaxScore(), docb.getMaxScore()));
          +					docs.addAll(doca);
          +					docs.addAll(docb);
          +					_nl.add(key,docs);
          +				}else if(key.equals("QueriedShards")){
          +					Collection<String> qsa = (ArrayList<String>)obj;
          +					Collection<String> qsb = (ArrayList<String>)old;
          +					Collection<String> qs = new ArrayList<String>();
          +					qs.addAll(qsa);
          +					qs.addAll(qsb);
          +					_nl.add(key, qs);
          +				}
          +			}
          +		}
          
          jianfeng zheng added a comment -

          You are so nice, Mathias

I am using the MultiShard Distributed Search of Solr, and I also let Katta choose a node for each shard. I found there is only one proxy object in KattaClient for each Katta node; locking it will solve the problem you posted on 18/Aug, but it leads to each node working as if single-threaded.
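
A minimal sketch of the locking workaround described above; SerializedNodeProxy and RpcCall are illustrative names, not actual KattaClient internals:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: KattaClient holds a single RPC proxy per node, so
 * serializing calls through a per-node lock avoids the concurrent-use
 * problem, at the cost of one in-flight request per node.
 */
public class SerializedNodeProxy {
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    public Object call(String node, RpcCall call) throws Exception {
        Object lock = locks.computeIfAbsent(node, n -> new Object());
        synchronized (lock) { // only one thread may use this node's proxy at a time
            return call.invoke();
        }
    }

    /** Stand-in for the actual Hadoop RPC invocation. */
    public interface RpcCall {
        Object invoke() throws Exception;
    }
}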

          Mathias Walter added a comment -

I also ran into this problem, have you dealt with it?

Yeah, I got it working by using the Katta Distribution Policy instead of Solr's MultiShard Distributed Search. That does not solve the RPC problem, but it avoids it completely. Personally, I think it is better to let Katta choose how to distribute the queries to the different Katta nodes instead of letting Solr do it.

          I'll provide my changes soon.

And btw, how do you get these beautiful logs?

I got the logs by configuring log4j.properties as follows:

          katta.root.logger=TRACE,console
          log4j.logger.org.apache.zookeeper=WARN
          #log4j.logger.org.apache.hadoop=WARN
          log4j.logger.org.I0Itec.zkclient=WARN
          log4j.logger.org.mortbay.log=WARN
          katta.log.dir=./logs
          katta.log.file=katta.log
          
          # Define the root logger to the system property "katta.root.logger".
          log4j.rootLogger=${katta.root.logger}
          
          #
          # console
          # Add "console" to rootlogger above if you want to use this 
          #
          log4j.appender.console=org.apache.log4j.ConsoleAppender
          log4j.appender.console.target=System.out
          log4j.appender.console.layout=org.apache.log4j.PatternLayout
          log4j.appender.console.layout.ConversionPattern=%5p %d{ISO8601} [%t] %c - %m%n
          
          jianfeng zheng added a comment -

          Hey Mathias,

I also ran into this problem, have you dealt with it? And btw, how do you get these beautiful logs?

          Mathias Walter added a comment - - edited

I ported the patch to Solr 3.1 and Katta 0.6.2, except for the Katta test. I also fixed some bugs. The updated patch will be added soon.

In the meantime I discovered a big issue. Often a SolrKattaNode (back-end server) hosts many shards. When a Solr front-end server starts a new query, it sends as many queries in parallel to the back-end servers as they have shards. In contrast, a Katta/Lucene search sends just one query to each back-end server, which then queries all the shards it hosts.
The problem is that the Solr front-end server often does not receive all KattaResponses from the back-end servers; it therefore times out some queries and raises an exception. Sometimes it is a NullPointerException in org.apache.solr.handler.component.QueryComponent.mergeIds (usually at startup of the front-end server), and sometimes a NullPointerException in org.apache.solr.handler.component.QueryComponent.returnFields:

          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.WorkQueue - Done waiting, results = ClientResult: 0 results, 0 errors, 0/1 shards (id=6:0)
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.WorkQueue - Shutting down work queue, results = ClientResult: 0 results, 0 errors, 0/1 shards (id=6:0)
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.ClientResult - close() called.
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.ClientResult - Notifying closed listener.
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.WorkQueue - Shut down via ClientRequest.close()
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.WorkQueue - Shutdown() called (id=6)
          TRACE 2010-08-18 10:32:25,729 [pool-3-thread-4] net.sf.katta.client.WorkQueue - Returning results = ClientResult: 0 results, 0 errors, 0/1 shards (closed), took 9989 ms (id=6:0)
          DEBUG 2010-08-18 10:32:25,730 [pool-3-thread-4] net.sf.katta.client.Client - broadcast(request([null, org.apache.solr.katta.KattaRequest@180a1d7b]),
           {ibis46.gsf.de:20001=[sen-00000#sen-00000]}) took 10001 msec for ClientResult: 0 results, 0 errors, 0/1 shards (closed)
          DEBUG 2010-08-18 10:32:25,730 [pool-3-thread-4] org.apache.solr.katta.KattaSearchHandler - KattaCommComponent shard: sen-00000 results.size: 0
           WARN 2010-08-18 10:32:25,730 [pool-3-thread-4] org.apache.solr.katta.KattaSearchHandler - Received 0 responses for query [], not 1
          ERROR 2010-08-18 10:32:25,731 [pool-1-thread-1] org.apache.solr.core.SolrCore - java.lang.NullPointerException
          	at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:411)
          	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:308)
          	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:284)
          	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1322)
          	at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
          	at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1144)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:619)
          
          
          DEBUG 2010-08-18 10:37:55,295 [pool-3-thread-9] net.sf.katta.client.Client - broadcast(request([null, org.apache.solr.katta.KattaRequest@71ce5e7a]),
           {ibis46.gsf.de:20001=[sen-00003#sen-00003]}) took 10001 msec for ClientResult: 0 results, 0 errors, 0/1 shards (closed)
          DEBUG 2010-08-18 10:37:55,295 [pool-3-thread-9] org.apache.solr.katta.KattaSearchHandler - KattaCommComponent shard: sen-00003 results.size: 0
           WARN 2010-08-18 10:37:55,295 [pool-3-thread-9] org.apache.solr.katta.KattaSearchHandler - Received 0 responses for query [], not 1
          ERROR 2010-08-18 10:37:55,296 [918077175@qtp-87740549-8] org.apache.solr.core.SolrCore - java.lang.NullPointerException
          	at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:574)
          	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:312)
          	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:284)
          	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1322)
          	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
          	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
          	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
          	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
          	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
          	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
          	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:440)
          	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
          	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
          	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          	at org.mortbay.jetty.Server.handle(Server.java:326)
          	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
          	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
          	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
          	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
          	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          

Interestingly, the back-end servers process the queries immediately and send the results to the front-end server:

           INFO 2010-08-18 10:37:45,325 [pool-13-thread-9] org.apache.solr.core.SolrCore - [sen-00003#sen-00003] webapp=null path=/select params={start=0&
          ids=pubmed%3A1567687%3A1%3A0%2Cpubmed%3A17140099%3A8%3A0%2Cpubmed%3A12807258%3A6%3A0%2Cpubmed%3A11701068%3A3%3A0&
          ids=pubmed%3A1567687%3A1%3A0%2Cpubmed%3A17140099%3A8%3A0%2Cpubmed%3A12807258%3A6%3A0%2Cpubmed%3A11701068%3A3%3A0&q=Human&isShard=true&rows=10} status=0 QTime=7 
          DEBUG 2010-08-18 10:37:45,326 [IPC Server handler 17 on 20001] org.apache.solr.katta.SolrKattaServer - SolrServer.request: ibis46.gsf.de:20001 shards: [sen-00003#sen-00003]
           request params: start=0&ids=pubmed%3A1567687%3A1%3A0%2Cpubmed%3A17140099%3A8%3A0%2Cpubmed%3A12807258%3A6%3A0%2Cpubmed%3A11701068%3A3%3A0&q=Human&isShard=true&rows=10&shards=sen-00003%23sen-00003
           rsp: {response={numFound=4,start=0,docs=[SolrDocument[{id=pubmed:1567687:1:0, type=sentence, lang=en, pubdate=Fri Dec 15 11:39:20 CET 2006}],
           SolrDocument[{id=pubmed:17140099:8:0, type=sentence, lang=en, pubdate=Thu Mar 01 11:40:18 CET 2007}],
           SolrDocument[{id=pubmed:12807258:6:0, type=sentence, lang=en, pubdate=Thu Jun 11 11:37:14 CEST 2009}],
           SolrDocument[{id=pubmed:11701068:3:0, type=sentence, lang=en, pubdate=Fri Apr 28 11:36:26 CEST 2006}]]},
           QueriedShards=[Ljava.lang.String;@791ef9f6}
          DEBUG 2010-08-18 10:37:45,326 [IPC Server handler 17 on 20001] org.apache.hadoop.ipc.Server - Served: request queueTime= 8 procesingTime= 17
          DEBUG 2010-08-18 10:37:45,326 [IPC Server handler 17 on 20001] org.apache.hadoop.ipc.Server - IPC Server Responder: responding to #30 from 146.107.217.46:58679
          DEBUG 2010-08-18 10:37:45,326 [IPC Server handler 17 on 20001] org.apache.hadoop.ipc.Server - IPC Server Responder: responding to #30 from 146.107.217.46:58679 Wrote 386 bytes.
          

But when the front-end server cancels a query because of a timeout, the last KattaResponse sent is never recognized by the front-end server. I've attached a full communication log of one failed query for both the front-end (front-end.log) and the back-end (back-end.log) server.

Did anyone run into the same issue? I hope so, because the error occurs quite often. I assume this bug is related to Hadoop RPC, but I could not find a matching Hadoop JIRA. I also tried the latest release candidate of Hadoop, 0.21.0.

My idea now is to combine the parallel queries to one back-end server into a single query, similar to the Lucene queries implemented in Katta.
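
A rough sketch of that idea, grouping the requested shards by the node hosting them so each back-end receives a single combined request; all names here are illustrative, not actual patch code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: instead of one RPC per shard, group the shards by
 * hosting node and send one combined request per node, mirroring how
 * Katta's Lucene client batches shard queries.
 */
public class ShardGrouper {
    /** Inverts a shard-to-node mapping into node -> list of shards. */
    public static Map<String, List<String>> groupByNode(
            List<String> shards, Map<String, String> shardToNode) {
        Map<String, List<String>> byNode = new HashMap<>();
        for (String shard : shards) {
            String node = shardToNode.get(shard);
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(shard);
        }
        return byNode; // issue one request per entry instead of one per shard
    }
}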

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Sumit added a comment -

1. Does this integration work with the latest Katta and Solr source code? While following the steps for Katta, no test-katta-core-0.6-dev.jar was produced. Any idea why? Also, two test cases failed, i.e. ShardManagerTest.java and LoadTestMasterOperationTest.java, so I removed those two test cases and proceeded.

2. Now, when I try to run ant test-core in Solr after following all the other steps, I get compilation errors. It seems the errors are caused by old Lucene jars; for example: "cannot find symbol symbol: variable TOKENIZED location: class org.apache.lucene.document.Field.Index". Some errors also occur when compiling KattaClientFailoverTest.java, which cannot find AbstractKattaTest.

Can someone help me resolve these issues?

          Thanks,
          Sumit

          Thomas Koch added a comment -

          This patch implements searching over a set of indices specified by a regular expression (in the shards= parameter of the query). For this patch to work, you also need to patch katta: http://oss.101tec.com/jira/browse/KATTA-91
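
For example, with the patch applied a single request could fan out to every matching index (host and index names below are hypothetical):

http://localhost:8983/solr/select?q=foo&shards=logs-2010-.*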

          Thomas Koch added a comment -

I've updated the patch for Katta 0.6; however, I deleted the SolrIndexer class, since I don't need it and it relies on the indexer contribution to Katta, which seems to be deprecated.
I still need to work on this patch, because I need the functionality to search all registered indexes. I'd appreciate any help!

          Tatsuya Hayashi added a comment -

I downloaded a Solr trunk, copied the necessary jar files to solrtrunk/lib, and tried to apply SOLR-1395.patch (patch -p 0 -i SOLR-1395.patch), but I saw a failure message on the console.

          patching file src/java/org/apache/solr/handler/component/SearchHandler.java
          Hunk #1 FAILED at 17.
1 out of 4 hunks FAILED -- saving rejects to file src/java/org/apache/solr/handler/component/SearchHandler.java.rej
          patching file src/solrj/org/apache/solr/client/solrj/request/QueryRequest.java

Could anyone give me a suggestion on how to solve this?

          Jason Venner (www.prohadoop.com) added a comment -

I run this with a set of front-end Solr instances under Jetty (one or more); you can then speak HTTP to those Solr instances, which allows you to query via PHP.

What I typically do is just hack the solrconfig.xml in the solr/examples/solr/conf directory and drop my schema.xml into the same directory.
Then run java ..... -jar start.jar from the examples directory.

Tweak this for your production requirements ...

          Thomas Koch added a comment -

I'd also need Katta integration, at least for search, since my frontend is PHP (sorry...) and I can't communicate as easily from PHP to Java as from PHP to Solr.
Has anybody already done an updated patch, or could someone help me do it?

          Jason Rutherglen added a comment -

          shyjuThomas,

          It'd be good to update this patch to the latest Katta... You're welcome to do so... For my project I only need what'll be in SOLR-1724...

          shyjuThomas added a comment -

Katta 0.6 has now been released, and there are many changes in katta-core-0.6.0.jar compared to the katta-core-0.6-dev.jar attached here. The patch provided for this issue will not work with the latest Katta release.

          Jason Rutherglen added a comment -

          Pravin,

          I'll review the test case when I can. Did you download and apply the latest patch?

          pravin karne added a comment -

          Hi,
I have integrated the above patch successfully, but when I tried to run the "ant test-core -Dtestcase=KattaClientTest" test, it failed.
Is there any Katta configuration required?
Since this patch uses Katta internally, how do I deploy indexes on Solr with the above patch?

Can I run Katta and Solr on different machines? How do I configure this?

Can you please provide detailed configuration steps for the Katta/Solr integration?

          Thanks

          Jason Venner (www.prohadoop.com) added a comment -

AFAIK this was committed as well, so it is in the Katta trunk now.


          Jason Venner
          Author: Pro Hadoop A howto guide to learning and using hadoop and map/reduce
          http://www.prohadoopbook.com/ a Ning network for Hadoop using Professionals

          Stefan Groschupf
          <http://oss.101tec.com/jira/secure/ViewProfile.jspa?name=sg> added a
          comment - 13/Oct/09 09:15 PM

          Just committed that, thanks Jason.

          Jason Venner (www.prohadoop.com) added a comment -

My apologies, I think it is in the Katta tree:
KATTA-80?
          http://oss.101tec.com/jira/browse/KATTA-80


          Jason Venner
          Author: Pro Hadoop A howto guide to learning and using hadoop and map/reduce
          http://www.prohadoopbook.com/ a Ning network for Hadoop using Professionals

          pravin karne added a comment -

Hi,
For the Solr patch I used the following command:

patch -p 0 -i solr-1395-1431-4.patch // this is for the Solr trunk and it works

but for the Katta trunk I have to use KATTA-SOLR.patch.

"KATTA-SOLR.patch" is not on JIRA. Shall I use the same patch as above, i.e. solr-1395-1431-4.patch?
Can you please tell me the name of that patch file?

          Jason Venner (www.prohadoop.com) added a comment -

It should all be attached to the JIRA issue.

          pravin karne added a comment -

Where can I download KATTA-SOLR.patch?

          Jason Venner (www.prohadoop.com) added a comment -

solr-1395-1431-4.patch contains a number of repairs, and facet count aggregation now works.
The one downside is that this patch REQUIRES the shards parameter to explicitly list the shards to be queried; using a wildcard does not work.

I have this up and running nicely over 9 Katta nodes and 65 million documents.
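
For example, a query against this patch must spell the shards out; the host and shard names below are hypothetical:

http://localhost:8983/solr/select?q=foo&shards=shard-00,shard-01,shard-02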

          Jason Venner (www.prohadoop.com) added a comment -

          I was unable to separate them cleanly, so no.

          Jason Rutherglen added a comment -

Jason, can you upload a SOLR-1395-only patch? That will help in seeing the SOLR-1395-specific changes.

          I think the next step is to remove the dependency on separate property files, as I find these hard to manage (they are too numerous).

          Jason Venner (www.prohadoop.com) added a comment - - edited

/tmp/solr-1395-1431-3.patch contains an additional unit test for the query-string serialization code, and two additional classes that allow for deployment to Katta.

With this jar, a Katta client node may be started via
katta-daemon.sh start katta\ startNode org.apache.solr.katta.DeployableSolrKattaServer
The system properties that control the node startup are:

          solr.server.name - the property to look for the server name, default proxy
          solr.home - the property to look for the server root, default solrHome
          solr.config.file - the property to look for the server config file name, default solr.xml

These will be used to find a Solr configuration to run the embedded server, which will search the deployed shards.

Index shards may be deployed via the standard Katta mechanism of katta addIndex index-name shared-path-to-index.
I use the zip files produced by SOLR-1301 and deploy from HDFS.

For searching, create a Solr configuration with a handler:

<requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
    <str name="shards">*</str>
  </lst>
</requestHandler>

This will search all deployed shards; replace the shards parameter with an explicit shard list if you only wish to query a subset with this query handler.

The Solr instance used for searching will need the ZooKeeper information:

conf/katta.node.properties
conf/katta.zk.properties (replace the ZooKeeper nodes with your cluster's nodes)

I tend to run java -d64 -Xmx2g -Dkatta.request.timeout=100000 -jar start.jar
for my testing, as my cluster is on the far side of a couple of firewalls.

I also have to store my katta.zk.properties file inside start.jar, for some reason.

          Jason Venner (www.prohadoop.com) added a comment - - edited

The file /tmp/solr-1395-1431.patch is a combined patch of SOLR-1431 and SOLR-1395.
A small API change in the query-string creator required a small code change:
ClientUtils.toQueryString now prefixes the returned query string with a '?' character.

          Jason Rutherglen added a comment -

          Updated the KattaRequest class to properly serialize the SolrParams.

          Jason Rutherglen added a comment -

          Copy these libraries into lib/ before executing the test. The Katta jars are somewhat custom. I'll post a patch there shortly.

          Jason Rutherglen added a comment -

          New patch updated to Katta's latest from Git. It's slimmed down a bit, removing the various extraneous config files etc.

          Jason Rutherglen added a comment -

          Noble, great idea! I opened an issue at SOLR-1431.

          Noble Paul added a comment -

Jason, why don't you open a separate issue for the CommComponent? It is useful for Solr even without Katta.

          Jason Venner (www.prohadoop.com) added a comment -

Jason and I have a couple of small changes that make this simpler to use, and a first FAQ entry.
If you get a NullPointerException in mergeIds, a likely cause is a schema mismatch on the unique id field between an index served by a shard and the top-level Solr instance performing the search.
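
To illustrate that FAQ entry: the unique id field must be declared identically in the shard index's schema.xml and in the front-end instance's schema.xml; the field name here is just an example:

<!-- must match in both the shard schema and the front-end schema -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>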

          Jason Rutherglen added a comment -

          These are the external libraries necessary to run the test

Jason Rutherglen added a comment -

I added a wiki page at: http://wiki.apache.org/solr/KattaIntegration
          Stefan Groschupf added a comment -

Jason, please note that the latest Katta code is actually in SourceForge's Git repo, not in SVN.

          Jason Rutherglen added a comment -

          This is our first cut at integrating Katta with Solr. The
          KattaClientTest test case shows a Katta cluster being created
          locally, a couple of cores/shards being placed into the cluster,
          then a query being executed that returns the correct number of
          results. It takes about 30s - 1.5min to run (hopefully that can
          be reduced?).

          Today Solr shards map to Solr servers. Here we map shards to
          cores, where there can be multiple shards per server or in Katta
          parlance a node. We assume the shards exist in Hadoop HDFS.
          Katta copies the shards to a local Solr server to make them
          searchable (and incrementally updateable).

          Instructions for Installation

• Download the Katta trunk: "svn co
  https://katta.svn.sourceforge.net/svnroot/katta/trunk kattatrunk".
• Download the KATTA-SOLR.patch to kattatrunk; run
  "patch -p 0 -i KATTA-SOLR.patch", "ant -jar", "ant jar-test".
• Download a Solr trunk: "svn co
  http://svn.apache.org/repos/asf/lucene/solr/trunk solrtrunk".
• Copy from kattatrunk to solrtrunk/lib: lib/log4j-1.2.13.jar,
  lib/zookeeper-3.1.1.jar, lib/hadoop-core-0.19.0.jar,
  build/katta-core-0.6-dev.jar, build/test-katta-core-0.6-dev.jar.
• Download SOLR-1395.patch to solrtrunk and run
  "patch -p 0 -i SOLR-1395.patch".
• Run a test while in solrtrunk: "ant test-core
  -Dtestcase=KattaClientTest".

          General Notes

• SearchHandler's HttpCommComponent has been abstracted out.
  There's a CommComponent interface; AbstractCommComponent
  implements the generic multithreaded ShardRequest ->
  ShardResponse logic. EmbeddedSearchHandler executes requests on
  a set of local cores, HttpCommComponent implements requests over
  HTTP, and KattaCommComponent distributes requests using Katta's
  Hadoop RPC mechanism (a rough interface sketch follows these
  notes).
          • The patch enables all of Solr's distributed request types. All
            current distributed requests should work as is with no
            modifications.
          • Shards/Solr cores may be managed dynamically and remotely
            administered from a centralized location (whereas today Solr
            typically requires SSHing and manually editing files etc)
• Solr Katta has built-in failover; this is tested in
  KattaClientFailoverTest
          • When a shard is deployed to a Solr server, the schema and
            solrconfig are deployed with it. This begs the question of how
            updates to the solrconfig and schema are deployed. Redeploying
            solrconfig is fairly simple, whereas a schema change implies
            recreating the entire shard.
          • Maybe there's an easy way to interface with Hadoop index
            creation (i.e. as easy as Solr's HTTP based update command)
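
A rough sketch of the communication abstraction mentioned in the first note; only the type names come from the patch, the method shapes here are assumptions:

import org.apache.solr.handler.component.ShardRequest;
import org.apache.solr.handler.component.ShardResponse;

/**
 * Hypothetical shape of the CommComponent abstraction: SearchHandler talks
 * to this interface, and concrete implementations move ShardRequests over
 * HTTP, local cores, or Katta's Hadoop RPC.
 */
public interface CommComponent {
    /** Submit a request for one shard; implementations run it on their own transport. */
    void submit(ShardRequest sreq, String shard);

    /** Block until an outstanding request completes and return its response. */
    ShardResponse takeCompletedOrError();
}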

          The patch was created by Jason Venner and Jason Rutherglen

          Noble Paul added a comment -

Why should this be a Solr issue? What is missing in Solr that prevents you from integrating Katta into Solr as some kind of plugin?


            People

• Assignee: Unassigned
• Reporter: Jason Rutherglen
• Votes: 8
• Watchers: 27

Time Tracking

Original Estimate: 336h
Remaining Estimate: 336h
Time Spent: Not Specified
