Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels: None

      Description

      A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig for that collection are supplied automatically by the system so that no errors occur.

      APIs need to be created to manage the content of that collection.

      # create your .system collection first
      http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&replicationFactor=2
      # The config for this collection is created automatically. numShards for this collection is hardcoded to 1
      
      # create a new jar or add a new version of a jar
      
      curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent
      
      # GET on the endpoint returns a list of jars and other details
      curl http://localhost:8983/solr/.system/blob 
      # GET on the endpoint with a jar name returns details of all available versions of that jar
      curl http://localhost:8983/solr/.system/blob/mycomponent
      # GET on the endpoint with jar name, version, and wt=filestream retrieves the actual file
      curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream > mycomponent.1.jar
      
      # GET on the endpoint with jar name and wt=filestream retrieves the latest version of the file
      curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream > mycomponent.jar
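
      For reference, the listing calls above return standard Solr query output. The sketch below is illustrative only; the field names (blobName, version, size, md5) are assumptions based on the auto-created schema, with ids of the form blobName/version:

      # hypothetical shape of a listing response
      # {"response": {"numFound": 1, "docs": [
      #   {"id": "mycomponent/1", "blobName": "mycomponent",
      #    "version": 1, "size": 35000, "md5": "..."}]}}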
      

      Please note that jars are never deleted automatically; a new version is added to the system every time a new jar is posted under the same name. You must use the standard delete commands to delete old entries.
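
      For example, since ids take the form blobName/version, a hypothetical cleanup of an old version through the regular update API could look like this (the id below is illustrative):

      # delete version 1 of mycomponent via the standard update API
      curl 'http://localhost:8983/solr/.system/update?commit=true' \
        -H 'Content-Type: application/json' \
        --data-binary '{"delete": {"id": "mycomponent/1"}}'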

      1. SOLR-6787.patch
        35 kB
        Noble Paul
      2. SOLR-6787.patch
        26 kB
        Noble Paul

        Issue Links

          Activity

          Noble Paul added a comment -

          Feature complete, but no test cases.

          Shawn Heisey added a comment -

          This sounds very cool.

          I do have a couple of questions after looking at the patch, probably showing a lot of ignorance:

          • Is this SolrCloud specific? If so, could it work on non-cloud setups? There seems to be ZK-specific code, and you mentioned collection, not core.
          • I see a MAX_SZ constant set to 2MB, although it doesn't appear to be used anywhere in the patch. One of the jars that I use with Solr is the icu4j jar, which is over 9 MB. Would the max size be configurable?
          Noble Paul added a comment -

          Is this SolrCloud specific? If so, could it work on non-cloud setups?

          No, it works in non-cloud mode too. But we give emphasis to SolrCloud in all documentation.

          I see a MAX_SZ constant set to 2MB, although it doesn't appear to be used anywhere in the patch...

          It is used, and that is the limit.

          One of the jars that I use with Solr is the icu4j jar which is over 9 MB.

          I want to keep a sensible max size so that people don't screw up their system. BTW, I would make it configurable for the handler using the config API.
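
          A sketch of what that could look like via the config API; the maxSize invariant and its unit are assumptions here, not a committed interface:

          # hypothetical: raise the blob handler's size limit via the config API
          curl 'http://localhost:8983/solr/.system/config' \
            -H 'Content-Type: application/json' \
            -d '{"update-requesthandler": {"name": "/blob",
                 "class": "solr.BlobHandler", "invariants": {"maxSize": "10"}}}'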

          Shalin Shekhar Mangar added a comment -

          A special collection called .system needs to be created by the user to store/manage blobs.

          That 'dot' might be easy to miss, and then people will get weird 404 errors. Perhaps we should keep system just like _version_ or _route_?

          Noble Paul added a comment -

          The period looked like a logical name for users familiar with *nix, where hidden/system directories are named that way. Anyway, I'm open to suggestions.

          Noble Paul added a comment -

          Final patch with test cases.
          The collection name is still open for change.

          ASF subversion and git services added a comment -

          Commit 1644095 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1644095 ]

          SOLR-6787 API to manage blobs in Solr

          ASF subversion and git services added a comment -

          Commit 1644341 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1644341 ]

          SOLR-6787 adding CHANGES.TXT entry

          ASF subversion and git services added a comment -

          Commit 1644376 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1644376 ]

          SOLR-6787 API to manage blobs in Solr

          Shalin Shekhar Mangar added a comment -

          The period looked like a logical name for users familiar with *nix, where hidden/system directories are named that way. Anyway, I'm open to suggestions.

          Yeah, on further thinking, I've warmed up to .system. Looking forward to what comes next!

          ASF subversion and git services added a comment -

          Commit 1644489 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1644489 ]

          SOLR-6787 removing the dependency on zk jar for post.jar

          ASF subversion and git services added a comment -

          Commit 1644494 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1644494 ]

          SOLR-6787 removing the dependency on zk jar for post.jar

          Yonik Seeley added a comment -

          I'm not sure if there was a previous discussion that covers some of my questions here... so I'll just shoot:

          I think I understand some of the motivations around a special collection... but I don't understand why there need to be special APIs.
          We should be able to support binary field types, and if our support is lacking, we should fix that. Same with more REST-like APIs to query the collection or retrieve data, and same for document versioning.

          As far as the motivations for this issue... I see SOLR-6801 being linked. But can a request handler / component in solrconfig.xml make use of a jar in .system? Does this mean that .system will somehow need to come up before every other collection in a cloud setup (or even in a stand-alone setup)? Or is persistence being handled some other way?

          In some ways it feels like we're starting from the bottom up (which can be a fine approach) without the use-cases / high level designs / goals being documented or hashed out (unless I've missed some of these discussions... pointers welcome). Does this stuff relate at all to the goal of providing a smaller download and having an easier plugin mechanism for the stuff that's in contrib?

          Noble Paul added a comment - - edited

          but I don't understand why there need to be special APIs.

          These are not really "special" APIs. It is just another request handler, which anyone can register in any core.

          We should be able to support binary field types, and if our support is lacking

          Yes, but it also needs to do more things to make this usable. This request handler makes that very convenient. I don't see how a generic binary field API can be as useful. Suggestions are welcome.

          But can a request handler / component in solrconfig.xml make use of a jar in .system? Does this mean that .system will somehow need to come up before every other collection in a cloud setup

          All the handlers loaded from .system will automatically be startup="lazy". So, any request fired at those handlers must respond with ".system collection not yet available" until .system is loaded.
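
          Illustratively, from a client's point of view (the endpoint and exact error payload are assumptions; only the message text above is settled):

          # hypothetical: hitting a lazily loaded handler before .system is up
          curl 'http://localhost:8983/solr/collection1/mycomponent?q=*:*'
          # -> error response containing ".system collection not yet available"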

          Does this stuff relate at all to the goal of providing a smaller download and having an easier plugin mechanism for the stuff that's in contrib

          No, this is not conceived for a smaller download. I don't yet plan to make the contribs pluggable through this.

          The use case is this:

          I have a fairly large SolrCloud cluster where I deploy a custom component. The current solution is to go to all the nodes, put a jar file there, and do a rolling restart of the entire cluster. And for every new version of the component, the user has to go through the same steps. For Lucidworks, it is a fairly common use case, and this will make our product easier to manage.

          The other use case is to manage other files like synonyms/stopwords (or any other files required by any other component) in Solr, so that we don't load very large files into ZooKeeper.

          In some ways it feels like we're starting from the bottom up (which can be a fine approach) without the use-cases / high level designs / goals

          We are rethinking the way Solr is being used. The objective is to make it less painful to do what we experts can do with Solr. I'm glad that people are asking. No, you haven't missed anything. This JIRA is the first piece of documentation ever to happen on this topic, and all questions are welcome.
          Let's build it together.

          Yonik Seeley added a comment -

          These are not really "special" APIs.

          I was responding to this: "APIs need to be created to manage the content of that collection"
          And I was wondering, since "binary field" and "blob" seem synonymous, why there would be a separate/different API to get/set the value of such a field.

          All the handlers loaded from .system will automatically be startup="lazy".

          But request handlers are one of the only things that have support for "lazy".
          What's the plan to support custom SearchComponents, Update processors, QParsers, or ValueSourceParsers (all of those are very common)?

          Also, a big question is persistence. What happens when you add a request handler via API, and then the server is bounced?

          We are rethinking the way Solr is being used.

          That's great, but please do so in public forums so everyone can participate in the discussion.

          Noble Paul added a comment -

          why there would be a separate/different API to get/set the value of such a field.

          Tell me how I can do that today. I may need to use SolrJ or something. If I let users do it themselves, they will use their own schema and their own tools to achieve the same thing. Eventually we will have a system which "anyone can use but no one would want to use".

          And what about versioning? I don't want different nodes to run different versions of the jar

          We should stop building "ready to cook" stuff and start making "ready to eat" stuff if we want to survive in this space.

          So I start with the user and ask the question, "how would they use it?". Then I think of how we can achieve it in Solr (and not the other way around). If I can't offer a simple, coherent way of using something, I don't want to recommend that anyone use it.

          But request handlers are one of the only things that have support for "lazy".

          RequestHandlers are being added first; they are the first-class citizens. Others will have to be loaded lazily as well; that will be taken up in due course.

          When I said "lazy", it does not mean the current implementation of "lazy". It is a new implementation of laziness required for dynamic loading.

          Also, a big question is persistence. What happens when you add a request handler via API, and then the server is bounced?

          They are all persisted. Actually, adding any new component causes a core reload automatically; refer to SOLR-6607.
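
          For context, a minimal sketch of adding a handler through that config API (the collection and handler names here are illustrative):

          # hypothetical: add a request handler via the config API (SOLR-6607);
          # the change is persisted and triggers a core reload automatically
          curl 'http://localhost:8983/solr/mycollection/config' \
            -H 'Content-Type: application/json' \
            -d '{"create-requesthandler": {"name": "/dump",
                 "class": "solr.DumpRequestHandler"}}'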

          That's great, but please do so in public forums so everyone can participate in the discussion.

          I'm going solo, so there are not many discussions. I would like others to start discussions in JIRA, and I can participate.

          I used "we" because it is a goal for Lucidworks to make this possible. "How" is not discussed

          ASF subversion and git services added a comment -

          Commit 1646418 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1646418 ]

          SOLR-6787 refactored a method out

          ASF subversion and git services added a comment -

          Commit 1646419 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1646419 ]

          SOLR-6787 refactored a method out

          Alexandre Rafalovitch added a comment -

          Why do you need a user to create the collection first? Especially since the config/etc is hardcoded and it needs to be present on every node? Wouldn't it make sense for collection to be just auto-created on first access? Or is there a specific flexibility that only user can decide on?

          Also, if you have a heterogeneous environment, the current implementation would require a union of all blobs, right? As in, every node will need to have all possible blobs/jars, not just the jars needed by the collections on that node.

          Finally, does the .system collection show up on the Admin UI? Are there implications/restrictions of that?

          Yonik Seeley added a comment -

          > why there would be a separate/different API to get/set the value of such a field.
          Tell me how I can do that today.

          That's my entire point... if it's hard to do X, look first into making X easier rather than creating a different method.
          At the end of the day, there may still be some little things that warrant separate API calls to make it easier. But if anything is identified to make generic APIs more usable, that's a win.

          Another way to look at it: It appears you've created an API specific to ".system"... but why should it be specific? If it's easier to manage blobs this way, shouldn't users be able to use this on whatever collections / fields they want?

          Yonik Seeley added a comment -

          We should stop building "ready to cook" stuff and start making "ready to eat" stuff if we want to survive in this space.
          So I start with the user and ask the question, "how would they use it?".

          I've always had that philosophy and I think the majority of other committers do also.
          Trying to reuse and improve existing APIs before creating new ones, or making new APIs as generic as possible does not run counter to that (and produces a better system over the long haul).

          I'm going solo, so there are not many discussions. I would like others to start discussions in JIRA, and I can participate.

          Another option would have been to do it in a branch first, esp if it's exploratory.

          Noble Paul added a comment - - edited

          That's my entire point... if it's hard to do X, look first into making X easier

          We should do that too.

          But it does not solve the problem

          If it's easier to manage blobs this way

          It won't be. Why would anyone normally want to store a blob and all of its past versions?

          I'm trying to dedupe a blob here; is that required normally?

          I'm trying to ensure that there is always a name for a blob; is that required normally?

          Another option would have been to do it in a branch first, esp if it's exploratory.

          It is not exploratory. This is a feature we need.

          Trying to reuse and improve existing APIs before creating new ones, or making new APIs as generic as possible does not run counter to that

          As I said earlier, I'm fully convinced about the need for this API. At the same time, I'm open to suggestions to improve it.
          We do an exploratory branch when we have doubts about how to implement something. In this case, the question is about the interface, which requires discussion, not another branch.

          I guess you are confused by the title of the ticket. It is not an API to just manage "binary" fields. I need certain semantics for storing large objects in a special collection. There are more use cases, such as large synonyms, stopwords, etc., and other storage for the system where ZK is not suitable.

          Noble Paul added a comment -

          Hi Alexandre Rafalovitch, I posted a reply to you, but it got lost.

          Why do you need a user to create the collection first? Especially since the config/etc is hardcoded and it needs to be present on every node?

          Most users will not use this feature. When it becomes commonplace, we should create it by default.

          Also, if you have heterogeneous environment, the current implementation would require a union of all blobs, right? As in, every node will need to have all possible blobs/jars, not just the jars the collections on that node have.

          That is not true. The jars are fetched over the HTTP API and used on demand; they are never fetched from a local collection.

          Finally, does the .system collection show up on the Admin UI? Are there implications/restrictions of that?

          It is a normal collection with a predefined schema/config, and it shows up in the admin UI.

          Yonik Seeley added a comment -

          > That's my entire point... if it's hard to do X, look first into making X easier
          We should do that too.
          But it does not solve the problem

          It doesn't have to solve the whole top level use case you're thinking of... that's not what I'm talking about at all.
          I'm talking about API design in general. If X is hard, look to make it easier in a way that benefits more than just the specific "problem" you're trying to solve.

          > If it's easier to manage blobs this way
          It won't be. Why would anyone normally want to store a blob and all of its past versions?

          People have asked to store blobs (binary fields), and people have asked for versioning.

          I'm trying to dedupe a blob here; is that required normally?

          Yes, people have asked for deduping (that's why we have an update processor for it now).

          I'm trying to ensure that there is always a name for a blob; is that required normally?

          Yes, this is why required fields were added to the schema.

          This is a feature we need

          The need for the functionality of a feature says nothing about what form the API/implementation should take.
          Anyway, it feels like we're completely talking past each other at this point... it seemed pretty obvious to me that "blob" == "binary field".

          Noble Paul added a comment -

          If these were the requirements, I fail to see them captured in a ticket.

          But it is still possible to offload most of the code in this handler to a generic API without impacting this set of APIs.

          The questions I would like answered are:

          1. Is it required to change the public interfaces of this API? If yes, please elaborate.
          2. We may need a better binary field API; I totally agree. I would like to see the specific requirements. I'll be glad to make this generic.
          Alexandre Rafalovitch added a comment -

          Looking at the patch, it seems to be an improvement of a couple of universal pieces of code, plus hard-coding of a specific collection setup and URL structure. This is a very basic view, as I am not totally familiar with the codebase, but perhaps I can help find the middle ground.

          The universal changes seem to be about allowing the post tool and other code to better handle binary content. So, that part is already generic.

          The hardcoded schema/solrconfig look to me like they should be a special-case configset, perhaps stored in the distribution jar. That would be separate new, but I am sure useful, functionality. The hardcoded shard requirements look to me like a documentation issue, especially since we require the user to create the collection anyway. Or maybe it should be a variable somewhere that can be carried along with the configset (core.properties?).

          The other part is about loading and serving a binary blob. We already have BinaryField, and in fact that's what's used under the covers in the hard-coded schema. How do we populate/extract things from it now? This implementation looks useful; it could just have an assertion/requirement that whatever the query string is, the result should be a single item and a single field, which would then be binary-streamed to the user.

          There was some de-duplication code. Could it not be an UpdateRequestProcessor of some sort?

          Finally, there is the convenience API that hardcodes blob name and version as URL parameters. This may be hard to make generic, but given that this particular use case is internal (right?), is it really that big a deal? The internal code could generate query parameters just as easily as the REST style.

          Mark Miller added a comment -

          I'm going solo.

          Before going solo, I wish you would learn how the project formats code and tags commits, and focus a little more on getting your tests solid before committing. These mistakes are all too common.

          Noble Paul added a comment - - edited

          point taken Mark Miller

          And all inputs on this API design are welcome

          Yonik Seeley added a comment -

          If these were the requirements , I fail to see them captured in a ticket.

          I think you're maybe confusing open source development with commercial software development.
          We don't really work off of "requirements", and we certainly strive to do better than just meet requirements.
          We need to think about the potential requirements/desires of the thousands of users (and potential new users) of Solr.
          We need to think hard about APIs since they tend to stick around a long time.

          I was essentially asking "can we do better?". It doesn't mean the answer is "yes"... but you should have hopefully thought about it.

          Here are some stupid-simple hypothetical examples of what I'm talking about:

          Hypothetical: I need to add three numbers together in a function query... I'm going to add a "add3(x,y,z)". Can we do better? yeah... maybe we should just make the existing "add()" take any number of arguments. There was no "requirement" to do more than 3... but that doesn't matter in the slightest.

          Hypothetical: My common function add(mul(a,b),mul(c,d)) was too slow for my requirements, so I added a custom add_prods(a,b,c,d) that gets rid of the overhead and is now faster. Can we do better? Yes... let's not change the API for this special case and instead look for these patterns and automatically optimize them.

          Noble Paul added a comment -

          I was essentially asking "can we do better?". It doesn't mean the answer is "yes"... but you should have hopefully thought about it.

          I totally understand where you are coming from. I agree with you on most of it in principle.

          My point was this

          • I would like to know if the public API of this ticket can be improved
          • Can we totally do away with this and make it some kind of generic feature? Honestly, I don't see that as a good option.

          I'm very much in agreement with you on the point that we need to get some of it to a generic API.

          ASF subversion and git services added a comment -

          Commit 1648836 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1648836 ]

          SOLR-6787 changed the schema of blob store. now, the id is blobName/version and not the md5

          ASF subversion and git services added a comment -

          Commit 1648848 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1648848 ]

          SOLR-6787 changed the schema of blob store. now, the id is blobName/version and not the md5

          ASF subversion and git services added a comment -

          Commit 1649349 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1649349 ]

          SOLR-6787 more logging to trace errors

          ASF subversion and git services added a comment -

          Commit 1650030 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1650030 ]

          SOLR-6787 hardening tests

          ASF subversion and git services added a comment -

          Commit 1650032 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650032 ]

          SOLR-6787 hardening tests

          ASF subversion and git services added a comment -

          Commit 1650212 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1650212 ]

          SOLR-6787 commit right away instead of waiting

          ASF subversion and git services added a comment -

          Commit 1650214 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650214 ]

          SOLR-6787 commit right away instead of waiting

          ASF subversion and git services added a comment -

          Commit 1650251 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1650251 ]

          SOLR-6787 more logging

          ASF subversion and git services added a comment -

          Commit 1650252 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650252 ]

          SOLR-6787 more logging

          ASF subversion and git services added a comment -

          Commit 1650448 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1650448 ]

          SOLR-6787 A simple class to mask a handler defined in same path

          ASF subversion and git services added a comment -

          Commit 1650449 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650449 ]

          SOLR-6787 A simple class to mask a handler defined in same path

          ASF subversion and git services added a comment -

          Commit 1650729 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1650729 ]

          SOLR-6787 enable autocommit maxDocs=1 for .system collection

          ASF subversion and git services added a comment -

          Commit 1650731 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1650731 ]

          SOLR-6787 enable autocommit maxDocs=1 for .system collection

          Mark Miller added a comment -

          Please develop these things to a decent level outside of svn and then put up a patch. You tend to commit early half baked patches that fail a lot and then fire in a bunch of random commits after with no patches. I'm really unhappy about it and I'm going to start -1 more of these issues. It's become a common pattern. This is not how we develop on Lucene and Solr. Use a branch if you need to do this. You can't easily follow this style of dev if it's done regularly, it makes reviews and back ports difficult, and everyone has to deal with the frequently failing tests. There is no need to put your stuff in way before it's ready.

          Noble Paul added a comment - - edited

          Mark
          How do you know something is fully developed? You build a feature, add a test, do manual testing, and if everything is passing, commit it.
          This did not fail any old test; it is only failing the new tests in Jenkins. So, I'm fixing what I think is the issue and checking in. Even the original commit does not have any obvious problem that I know of. Sometimes I'm trying a different approach.
          There are dozens of tests failing every day; are we going to remove all of them? There were replication issues that caused tests to fail, and that could have been a reason for this failure too.

          ASF subversion and git services added a comment -

          Commit 1653451 from Noble Paul in branch 'dev/branches/lucene_solr_5_0'
          [ https://svn.apache.org/r1653451 ]

          SOLR-6787, SOLR-6801 use realtime get to verify that the versions do not collide

          ASF subversion and git services added a comment -

          Commit 1653452 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1653452 ]

          SOLR-6787, SOLR-6801 use realtime get to verify that the versions do not collide

          ASF subversion and git services added a comment -

          Commit 1653453 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1653453 ]

          SOLR-6787, SOLR-6801 use realtime get to verify that the versions do not collide

          ASF subversion and git services added a comment -

          Commit 1656747 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1656747 ]

          SOLR-6787: .system collection create fails if /configs dir is not present in ZK

          ASF subversion and git services added a comment -

          Commit 1656748 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1656748 ]

          SOLR-6787: .system collection create fails if /configs dir is not present in ZK

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Yonik Seeley added a comment -

          I just ran into this new method on SolrQueryRequest:

          +
          +  /** Forward the request to another handler. DO a return after this call if
          +   * no other operations need to be performed
          +   * @param handler the name of the handler
          +   * @param params The new set of parameter
          +   */
          +  public void forward(String handler, SolrParams params,  SolrQueryResponse rsp);
          

          That feels like a really odd method to have on SolrQueryRequest... is there some advantage to having it there?

          Yonik Seeley added a comment -

          The default implementation is also buggy... it creates a request object that is never closed.
          It's also not clear what "DO a return after this call if no other operations need to be performed" means.

          I wouldn't want to encourage people to use this... perhaps this refactor should just be reverted or moved to BlobManager (as it's currently the only user).

          Noble Paul added a comment -

          The default implementation is also buggy... it creates a request object that is never closed.

It is done in a try-with-resources block, so it will be closed:

try (LocalSolrQueryRequest r = new LocalSolrQueryRequest(getCore(), params)) {
  getCore().getRequestHandler(handler).handleRequest(r, rsp);
}
          

          It's also not clear what "DO a return after this call if no other operations need to be performed" means.

          It means the caller can just return if the forwarded request will do all the work

          I wouldn't want to encourage people to use this... perhaps this refactor should just be reverted or moved to BlobManager (as it's currently the only user).

The problem I see with our internal APIs is that they are mostly expert-only and easy to screw up. It is easy to forget to close the request here. That is why I created a "non-expert" method which anyone can use.

          Yonik Seeley added a comment -

          The problem I see with our internal APIs is that they are mostly expert only and easy to screw up.It is easy to forget to close the request here. That is why I created a "non-expert" method which anyone can use.

Right... but I'm not sure at all that this won't screw up in the general case, so I don't think it's ready for "non-expert" use. I don't think handleRequest was really written to be recursive (i.e., be called from handleRequest itself).

          Noble Paul added a comment -

I agree with you that a recursive screw-up is possible. Instead of removing the API itself, we should safeguard the caller and prevent recursive calls in that method itself.

IMHO, recursive loops are caught almost immediately, but resource leaks are not found until it's too late.
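
One possible shape of that safeguard (a sketch only; the ForwardGuard class and its wiring into forward() are hypothetical, not part of any patch on this issue):

import java.util.ArrayDeque;
import java.util.Deque;

public class ForwardGuard {
  // Per-thread stack of handler names currently being forwarded to.
  private static final ThreadLocal<Deque<String>> ACTIVE =
      new ThreadLocal<Deque<String>>() {
        @Override
        protected Deque<String> initialValue() {
          return new ArrayDeque<>();
        }
      };

  static void enter(String handler) {
    Deque<String> stack = ACTIVE.get();
    if (stack.contains(handler)) {
      throw new IllegalStateException("recursive forward() to handler: " + handler);
    }
    stack.push(handler);
  }

  static void exit() {
    ACTIVE.get().pop();
  }
}

forward() would call ForwardGuard.enter(handler) before dispatching and ForwardGuard.exit() in a finally block, so a handler that forwards (directly or indirectly) back to itself fails fast instead of looping.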

          Yonik Seeley added a comment -

Even with a different handler though... "forwarding" to another request handler was never a first-class supported operation before. The whole thing feels a bit squirrelly. For example: the new request object that is being created is closed automatically when the method returns, but the response object is still there and will presumably either be used/looked at by the caller, or used to ultimately write the response. That breaks a previously held invariant: don't use a response object after the request has been closed (it may contain state tied to the request object).
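
To make the hazard concrete, an illustration (not code from the patch):

// The sub-request is closed as soon as this block exits...
try (LocalSolrQueryRequest r = new LocalSolrQueryRequest(getCore(), params)) {
  getCore().getRequestHandler(handler).handleRequest(r, rsp);
}
// ...but rsp stays live and is written out later. Anything in rsp that
// holds state tied to r (say, results resolved against r's searcher)
// is then used after close.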

          Noble Paul added a comment -

don't use a response object after the request has been closed (it may contain state tied to the request object).

If the caller wants to see the output of another handler, what is the solution? Serialize the response and deserialize it?

          Yonik Seeley added a comment -

          what is the solution?

I'm pointing out the possible issues and why my gut feel was to not encourage the use of this API. The answer is to develop a solid API if we want this feature.

          Another big issue off the top of my head (that would be much harder to catch via testing):
          A different searcher may be used by the sub-request than is used by the parent request. That's going to cause all sorts of problems.
          Also less likely, the schema can change and be different from parent to sub request.
There's also the question of lost "context", and anything that may use that (I see you use that to only do useParams once, for example. Is that OK to do again?)

          Noble Paul added a comment -

          I looked into the code of LocalSolrQueryRequest

It uses the core from the parent request. So, unless the parent request is closed, this one is safe. As long as the forward() happens on the same thread, it is safe.

          The only risk I see is if the schema changes in flight

          There's also the question of lost "context", and anything that may use that

This is probably the only one I see as a problem. But don't you think we should fix this by having a special ForwardableSolrRequest which uses the same core, schema, and context as the parent request?

          (I see you use that to only do useParams once, for example. is that OK to do again?)

That is used for a moment and removed immediately, before anyone can see it.
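
For what it's worth, a sketch of that ForwardableSolrRequest idea (hypothetical; no such class was committed here): pin the sub-request to the parent's core, schema, searcher, and context.

import java.util.Map;

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;

public class ForwardableSolrRequest extends LocalSolrQueryRequest {
  private final SolrQueryRequest parent;

  public ForwardableSolrRequest(SolrQueryRequest parent, SolrParams params) {
    super(parent.getCore(), params);
    this.parent = parent;
  }

  @Override
  public Map<Object, Object> getContext() {
    return parent.getContext(); // share, don't copy, the parent's context
  }

  @Override
  public IndexSchema getSchema() {
    return parent.getSchema(); // same schema the parent request sees
  }

  @Override
  public SolrIndexSearcher getSearcher() {
    return parent.getSearcher(); // same searcher, so results stay consistent
  }
}

Close handling would still need care: the sub-request must not decrement the refcount of a searcher it never acquired itself.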

          Yonik Seeley added a comment -

          The only risk I see is if the schema changes in flight

The searcher changing is both more likely and a higher risk of very bad failures. FYI, I opened SOLR-7238.

          Yonik Seeley added a comment -

Urg... while working on SOLR-7957, I've just been bitten by this bad implementation of forward() that closes a request before the response has been written.

          ASF subversion and git services added a comment -

          Commit 1698226 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1698226 ]

          SOLR-6787: fix BlobHandler.forward to not close request until after response has been written
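
The shape of the fix, as described by the commit message (a sketch; the exact mechanics in BlobHandler may differ), is to drop try-with-resources and defer the close:

// No try-with-resources, so the sub-request outlives the handler call...
LocalSolrQueryRequest r = new LocalSolrQueryRequest(getCore(), params);
getCore().getRequestHandler(handler).handleRequest(r, rsp);
// ...and r.close() is invoked only after the response has been written,
// so rsp can keep referencing request-scoped state until then.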


People

  • Assignee: Noble Paul
  • Reporter: Noble Paul
  • Votes: 0
  • Watchers: 11
