Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: JavaScript View Server
    • Labels:
      None
    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      The view engine has been creaky and cluttered. As shown by GeoCouch, adding new indexers basically involves copying the entire view engine and hacking the parts that are different. In short, the opposite of good engineering.

      Over the last couple weeks I've refactored the view engine and reimplemented the map/reduce view engine. These changes are 100% internal and no external behavior has changed. Performance is just a tiny bit better than trunk. I did do some playing trying to improve view update times and there are some dances we could do, but for the time being I wanted to keep the same general architecture for updates so that the changes are minimal.

      1. 0001-Minor-changes-for-new-indexing-engine.patch
        9 kB
        Paul Joseph Davis
      2. 0002-Create-the-couch_index-application.patch
        36 kB
        Paul Joseph Davis
      3. 0003-Create-the-couch_mrview-application.patch
        118 kB
        Paul Joseph Davis
      4. 0004-Remove-the-old-view-engine.patch
        117 kB
        Paul Joseph Davis

        Activity

        Hide
        Filipe Manana added a comment -

        Great work here Paul, it's really nice.

        I haven't done a very detailed analysis, but so far, 2 minor details:

        1) In couch_index_compactor (patch 2), the pid in state is assigned the default value of 'nil', but everywhere else in the module, you match against/use 'undefined'. It would be better to use the same everywhere, or an is_pid(Pid) call;

        2) When a view compaction is canceled, the partially compacted file is not deleted (unlike trunk's current behaviour). Since we don't support resuming of view compaction, it's useless to leave it there and be deleted only on the next compaction request. Plus, leaving it there, it can prevent other compactions from succeeding due to not enough free disk space

        Show
        Filipe Manana added a comment - Great work here Paul, it's really nice. I haven't done a very detailed analysis, but so far, 2 minor details: 1) In couch_index_compactor (patch 2), the pid in state is assigned the default value of 'nil', but everywhere else in the module, you match against/use 'undefined'. It would be better to use the same everywhere, or an is_pid(Pid) call; 2) When a view compaction is canceled, the partially compacted file is not deleted (unlike trunk's current behaviour). Since we don't support resuming of view compaction, it's useless to leave it there and be deleted only on the next compaction request. Plus, leaving it there, it can prevent other compactions from succeeding due to not enough free disk space
        Hide
        Filipe Manana added a comment -

        Also, can you share those performance test results?

        Show
        Filipe Manana added a comment - Also, can you share those performance test results?
        Hide
        Benoit Chesneau added a comment -

        I'm still reading the code. Just some another minor details,

        • couch_index_api.erl should probably better expose a behavior, so as anything that will be needed in indexers modules. So missing functions will be handled at compilation.
        • supervision. maybe couch_index_updater:start_link(self(), Mod), & couch_index_compactor:start_link(self(), Mod), could be added as a child to a supervisor. So thet could eventually been restared using erlang supervision.

        Also just wanted to say I really like the current design.

        Show
        Benoit Chesneau added a comment - I'm still reading the code. Just some another minor details, couch_index_api.erl should probably better expose a behavior, so as anything that will be needed in indexers modules. So missing functions will be handled at compilation. supervision. maybe couch_index_updater:start_link(self(), Mod), & couch_index_compactor:start_link(self(), Mod), could be added as a child to a supervisor. So thet could eventually been restared using erlang supervision. Also just wanted to say I really like the current design.
        Hide
        Paul Joseph Davis added a comment -

        @Filipe

        Good catch on both. For point one, I just removed the nil default. In my refactoring I undid almost all of these setting record fields to nil because its fairly silly to use a nonstandard value for undefined when the standard value is literally, undefined. One place I didn't change this yet was for the qserver variable because of how couch_query_servers:stop_doc_map is written and I was wanting to minimize the scope of these changes.

        As to 2, I added the file removal in the cancellation function. One thing this doesn't address is a generalized method for handling the new compaction lifetime model. We'll have to revisit this a bit to try and get a better abstraction for compaction.

        The timing tests I was running were against the views in [1] which you've used before for these types of tests. The test is just a simple "time curl http://127.0.0.1:5984/indexer_test/_design/test/_view/view1?limit=0" or w/e the URL is for that design doc. Fairly consistently the new-views are a minute or two faster than trunk. So, nothing super improved but convincing enough to me that I haven't trashed performance.

        Also, I did spend a couple days profiling the view engine and comparing a couple dirty hacks. I'm convinced we can shorten these times even more with some more refactoring and playing with some other optimizations but they were leading me deep into the couch_query_servers and couch_os_process. I plan on backporting some refactoring for these parts next and that should set us up for some of the more advanced changes to the view engine.

        Trunk: 26m29.809s
        New-Views: 24m50.923s

        [1] http://fdmanana.couchone.com/indexer_test

        @Benoit

        I'd agree on turnign couch_index_api into a behavior but I decided to delay that until after we update the various applications and build system so that we have a proper ERL_LIBS directory. For behaviours to work they need to be importable and that's not the easiest given our current structure. Though it'll definitely be something to add in the future.

        For supervisors I think that could be a good idea, but I'm not the most familiar with them and their side effects to know if they'd be a good fit here. You mentioned on IRC that you'd take a swing at implementing that and I'd be quite interested to see what you come up witih.

        Show
        Paul Joseph Davis added a comment - @Filipe Good catch on both. For point one, I just removed the nil default. In my refactoring I undid almost all of these setting record fields to nil because its fairly silly to use a nonstandard value for undefined when the standard value is literally, undefined. One place I didn't change this yet was for the qserver variable because of how couch_query_servers:stop_doc_map is written and I was wanting to minimize the scope of these changes. As to 2, I added the file removal in the cancellation function. One thing this doesn't address is a generalized method for handling the new compaction lifetime model. We'll have to revisit this a bit to try and get a better abstraction for compaction. The timing tests I was running were against the views in [1] which you've used before for these types of tests. The test is just a simple "time curl http://127.0.0.1:5984/indexer_test/_design/test/_view/view1?limit=0 " or w/e the URL is for that design doc. Fairly consistently the new-views are a minute or two faster than trunk. So, nothing super improved but convincing enough to me that I haven't trashed performance. Also, I did spend a couple days profiling the view engine and comparing a couple dirty hacks. I'm convinced we can shorten these times even more with some more refactoring and playing with some other optimizations but they were leading me deep into the couch_query_servers and couch_os_process. I plan on backporting some refactoring for these parts next and that should set us up for some of the more advanced changes to the view engine. Trunk: 26m29.809s New-Views: 24m50.923s [1] http://fdmanana.couchone.com/indexer_test @Benoit I'd agree on turnign couch_index_api into a behavior but I decided to delay that until after we update the various applications and build system so that we have a proper ERL_LIBS directory. For behaviours to work they need to be importable and that's not the easiest given our current structure. Though it'll definitely be something to add in the future. For supervisors I think that could be a good idea, but I'm not the most familiar with them and their side effects to know if they'd be a good fit here. You mentioned on IRC that you'd take a swing at implementing that and I'd be quite interested to see what you come up witih.
        Hide
        Filipe Manana added a comment -

        Paul,

        All fine for me

        Show
        Filipe Manana added a comment - Paul, All fine for me
        Hide
        Jan Lehnardt added a comment -

        Paul, outstanding work!

        I'm still reading, but here are a few things questions that come up for me. (I expect most of these to stem from my inexperience with the current view engine).

        In swap_compacted/2 you first unlink the old and then link the new fd. I'm unsure about the specifics, but what happens in case of a crash in the middle? Would linking the new one first and unlinking the old one after that make any difference?

        Turns out "few" meant "one".

        Finally, some style nitpicks. Some of your chosen symbol names are "too short" for my taste, but then I tend to be of the very_verbose_camp. Consider all of this a bike shed.

        view_cb -> view_callback (and other _cbs)
        red_fold -> reduce_fold
        mrst -> mrstate / mr_state? (although only in the declaration, when in use it is usually clear)
        couch_mrview_http.erl -> we tend to use "httpd" in file names. I'm happy to break with that tradition if we have a good reason
        ad_skey_opts -> all_docs_start[_]key_opts

        Again, great work Paul!

        @Benoit, you wrote "Also just wanted to say I really like the current design.", do you mean you like Paul's design or the "old" view engine?

        Show
        Jan Lehnardt added a comment - Paul, outstanding work! I'm still reading, but here are a few things questions that come up for me. (I expect most of these to stem from my inexperience with the current view engine). In swap_compacted/2 you first unlink the old and then link the new fd. I'm unsure about the specifics, but what happens in case of a crash in the middle? Would linking the new one first and unlinking the old one after that make any difference? Turns out "few" meant "one". Finally, some style nitpicks. Some of your chosen symbol names are "too short" for my taste, but then I tend to be of the very_verbose_camp. Consider all of this a bike shed. view_cb -> view_callback (and other _cbs) red_fold -> reduce_fold mrst -> mrstate / mr_state? (although only in the declaration, when in use it is usually clear) couch_mrview_http.erl -> we tend to use "httpd" in file names. I'm happy to break with that tradition if we have a good reason ad_skey_opts -> all_docs_start [_] key_opts Again, great work Paul! @Benoit, you wrote "Also just wanted to say I really like the current design.", do you mean you like Paul's design or the "old" view engine?
        Hide
        Paul Joseph Davis added a comment -

        @Jan,

        Bit confusing with these in close proximity. The function calls to link/unlink are just the Erlang process links. Ie, "we stop caring about the old fd, and only want to know about the new fd. Just below that is the rename-delete/rename dance dealing with filesystem links. This is the same behavior as trunk where it is possible that we lose the compacted file with a well timed crash. Trunk still doesn't address this and I haven't spent time thinking through all of the edge cases picking up a compaction file if the main data file is gone. Suffice to say, behavior is the same for that switch but we haven't figured out the best way to make that more robust (though, I don't know of a single report where this broke and caused reindexing).

        As to variable naming, it looks like you knew what they all meant. mrst could use some documentation just because its so core to the m/r indexer. Also, most of these are internal and shouldn't be user visible which means that their context is well defined within the relevant code.

        Also, dropping the d from http was quite intentional. I mean, why is the d there to begin with?

        Show
        Paul Joseph Davis added a comment - @Jan, Bit confusing with these in close proximity. The function calls to link/unlink are just the Erlang process links. Ie, "we stop caring about the old fd, and only want to know about the new fd. Just below that is the rename-delete/rename dance dealing with filesystem links. This is the same behavior as trunk where it is possible that we lose the compacted file with a well timed crash. Trunk still doesn't address this and I haven't spent time thinking through all of the edge cases picking up a compaction file if the main data file is gone. Suffice to say, behavior is the same for that switch but we haven't figured out the best way to make that more robust (though, I don't know of a single report where this broke and caused reindexing). As to variable naming, it looks like you knew what they all meant. mrst could use some documentation just because its so core to the m/r indexer. Also, most of these are internal and shouldn't be user visible which means that their context is well defined within the relevant code. Also, dropping the d from http was quite intentional. I mean, why is the d there to begin with?
        Hide
        Benoit Chesneau added a comment -

        @jan I mean the patch. I like the new view design

        Show
        Benoit Chesneau added a comment - @jan I mean the patch. I like the new view design
        Hide
        Adam Kocoloski added a comment -

        I agree with Benoit that defining a custom behaviour seems like the right approach instead of couch_index_api. I guess that'd require some mucking with the build system to ensure that the compiler knows the behaviour spec? It's not a blocker from my perspective, but I would like to see it happen.

        I haven't been too impressed with supervisors that dynamically add children. I'd love it if we didn't have to implement our own supervision trees everywhere, but I think that might be something we tackle in a separate ticket.

        Show
        Adam Kocoloski added a comment - I agree with Benoit that defining a custom behaviour seems like the right approach instead of couch_index_api. I guess that'd require some mucking with the build system to ensure that the compiler knows the behaviour spec? It's not a blocker from my perspective, but I would like to see it happen. I haven't been too impressed with supervisors that dynamically add children. I'd love it if we didn't have to implement our own supervision trees everywhere, but I think that might be something we tackle in a separate ticket.
        Hide
        Adam Kocoloski added a comment -

        Oh, one other minor comment - I think erlang:send_after/3 is always preferable to timer:send_after/2. I really don't know why the latter function exists.

        Show
        Adam Kocoloski added a comment - Oh, one other minor comment - I think erlang:send_after/3 is always preferable to timer:send_after/2. I really don't know why the latter function exists.
        Hide
        Paul Joseph Davis added a comment -

        I would like to mention that couch_index_api.erl current does nothing at all. It was just my scratch pad for what the indexer API would look like along with the possibility that it eventually gets turned into a behavior. As Adam points out I think this is best left until our source tree and installed structure are better aligned with Erlang expectations before we force this issue.

        Will fix send_after.

        Show
        Paul Joseph Davis added a comment - I would like to mention that couch_index_api.erl current does nothing at all. It was just my scratch pad for what the indexer API would look like along with the possibility that it eventually gets turned into a behavior. As Adam points out I think this is best left until our source tree and installed structure are better aligned with Erlang expectations before we force this issue. Will fix send_after.
        Hide
        Paul Joseph Davis added a comment -

        Updated per Adam's suggestions. Also removed all trailing whitespace edits. git --config color.ui true FTW.

        Show
        Paul Joseph Davis added a comment - Updated per Adam's suggestions. Also removed all trailing whitespace edits. git --config color.ui true FTW.
        Hide
        Bob Dionne added a comment -

        +1

        I think the compact and compact_swap functions make the API somewhat specific, but a no-op implementation is easy. Implementers will certainly need to know the atoms that the `get(Field,State)` function is looking for. All the tests run with the latest version.

        Well done Paul!

        Show
        Bob Dionne added a comment - +1 I think the compact and compact_swap functions make the API somewhat specific, but a no-op implementation is easy. Implementers will certainly need to know the atoms that the `get(Field,State)` function is looking for. All the tests run with the latest version. Well done Paul!
        Hide
        Paul Joseph Davis added a comment -

        @Bob,

        Totally see your point on being overly specific. That was one of my biggest worries when starting this work. I had GeoCouch updated to work with a slightly older version of this patch (that I'll rebase shortly) and have played with a few other types of indexers for giggles. One thing I can say is that even when an indexer might not have a user controllable compaction, it seems common enough that it should be exposed for when it exists.

        Show
        Paul Joseph Davis added a comment - @Bob, Totally see your point on being overly specific. That was one of my biggest worries when starting this work. I had GeoCouch updated to work with a slightly older version of this patch (that I'll rebase shortly) and have played with a few other types of indexers for giggles. One thing I can say is that even when an indexer might not have a user controllable compaction, it seems common enough that it should be exposed for when it exists.
        Hide
        Paul Joseph Davis added a comment -

        Now that this has been on JIRA for a while I'm gonna start asking for people to raise objections to committing it. The feed back I've seen so far has been positive and the only concerns voiced are in terms of things we should look at moving forward. I'd be more than happy to wait if anyone else wants more time to review it, but I'd like to make use of this relative lull in development for this sizable of a change to land and get more widespread testing.

        Show
        Paul Joseph Davis added a comment - Now that this has been on JIRA for a while I'm gonna start asking for people to raise objections to committing it. The feed back I've seen so far has been positive and the only concerns voiced are in terms of things we should look at moving forward. I'd be more than happy to wait if anyone else wants more time to review it, but I'd like to make use of this relative lull in development for this sizable of a change to land and get more widespread testing.
        Hide
        Jan Lehnardt added a comment -

        @Paul,

        I understood link/unlink to be the Erlang versions, not the POSIX versions, but was still curious about the order. Thanks for clarifying that this is like it is today.

        For the variable naming, while I was able to figure out what it means, the cognitive extra load didn't help understanding the code. I'm an advocate of erring not he side of clearness. I don't buy the 'internal code" argument as my future self will have to jump into that code again some time in the future and I'm trying to make this easier for myself later. I understand that coming right out of the code after writing it makes it all obvious, but that's usually not a long term property and it is only in your head. That said, not a blocker, happy to "clean that up" later, but I fugue I raise this "while we are at it"

        As for the "d", that's from the old days when we used the httpd that ships with Erlang and CouchDB was an (think Apache httpd) mod_couch module and the callback module was called couch_httpd.erl. So yay, ancient history, lets nuke that d.

        Onwards!

        Show
        Jan Lehnardt added a comment - @Paul, I understood link/unlink to be the Erlang versions, not the POSIX versions, but was still curious about the order. Thanks for clarifying that this is like it is today. For the variable naming, while I was able to figure out what it means, the cognitive extra load didn't help understanding the code. I'm an advocate of erring not he side of clearness. I don't buy the 'internal code" argument as my future self will have to jump into that code again some time in the future and I'm trying to make this easier for myself later. I understand that coming right out of the code after writing it makes it all obvious, but that's usually not a long term property and it is only in your head. That said, not a blocker, happy to "clean that up" later, but I fugue I raise this "while we are at it" As for the "d", that's from the old days when we used the httpd that ships with Erlang and CouchDB was an (think Apache httpd) mod_couch module and the callback module was called couch_httpd.erl. So yay, ancient history, lets nuke that d. Onwards!
        Hide
        Volker Mische added a comment -

        Great patch. Here's my review. I haven't spend too much time on src/couch_mrview as I expect the code to basically the the same as in the current implementation.

        General:
        The style for case indentation is not always CouchDB style. The first level shouldn't be indented (I know that's a nit-pick

        Configuration change:
        The configuration variable for the indexes in the config was changed from "view_index_dir" to "index_dir" with at mrviews sub-directory. Can we break such things in 1.2?
        src/couchdb/couch_compaction_daemon.erl still contains a reference to view_index_dir.

        API addition:
        It should be documented that now you can compact views not only with db/_compact/design-doc but also with db/design_doc/_compact, which actually makes a lot more sense. The old one should be deprecated.

        Show
        Volker Mische added a comment - Great patch. Here's my review. I haven't spend too much time on src/couch_mrview as I expect the code to basically the the same as in the current implementation. General: The style for case indentation is not always CouchDB style. The first level shouldn't be indented (I know that's a nit-pick Configuration change: The configuration variable for the indexes in the config was changed from "view_index_dir" to "index_dir" with at mrviews sub-directory. Can we break such things in 1.2? src/couchdb/couch_compaction_daemon.erl still contains a reference to view_index_dir. API addition: It should be documented that now you can compact views not only with db/_compact/design-doc but also with db/design_doc/_compact, which actually makes a lot more sense. The old one should be deprecated.
        Hide
        Paul Joseph Davis added a comment -

        @Jan,

        Actually, now that I look closer at the old code compared to the new code, the link/unlink ordering is slightly different because of how the processes are managed. Though one thing that's important is that the unlink does come after the file:delete which I'll move to be more like the old code.

        @Volker

        The case indentation is the same as I've always written it. I don't know that we've ever dictated anything for it in terms of code guidelines. Though if we did, the way I do it is obviously superior.

        Good call on the index_dir vs view_index_dir. The original motivation for that change was so I could develop the new indexer along side the old indexer without colliding. Though now that its generalized to indexes having the view in the name seems odd. Perhaps I'll add a function that checks index_dir first and then if that's not set uses view_index_dir and prints a deprecation warning? And I should also add a couch_index_util:root_dir(ModuleName) that creates the subdirectory for each type of indexer automatically. That'd probably cleanup some of that code.

        For the compaction API change, I'm in favor of deprecating the old and removing it after 1.2, but I'm not sure on the details of actually doing that. Add a header? Just note it on the wiki after this is merged?

        Anyway, I'll get these changes in here in a few minutes.

        Show
        Paul Joseph Davis added a comment - @Jan, Actually, now that I look closer at the old code compared to the new code, the link/unlink ordering is slightly different because of how the processes are managed. Though one thing that's important is that the unlink does come after the file:delete which I'll move to be more like the old code. @Volker The case indentation is the same as I've always written it. I don't know that we've ever dictated anything for it in terms of code guidelines. Though if we did, the way I do it is obviously superior. Good call on the index_dir vs view_index_dir. The original motivation for that change was so I could develop the new indexer along side the old indexer without colliding. Though now that its generalized to indexes having the view in the name seems odd. Perhaps I'll add a function that checks index_dir first and then if that's not set uses view_index_dir and prints a deprecation warning? And I should also add a couch_index_util:root_dir(ModuleName) that creates the subdirectory for each type of indexer automatically. That'd probably cleanup some of that code. For the compaction API change, I'm in favor of deprecating the old and removing it after 1.2, but I'm not sure on the details of actually doing that. Add a header? Just note it on the wiki after this is merged? Anyway, I'll get these changes in here in a few minutes.
        Hide
        Paul Joseph Davis added a comment -

        New version includes the minor change to the compaction link handling and new helpers for the view index directory.

        Show
        Paul Joseph Davis added a comment - New version includes the minor change to the compaction link handling and new helpers for the view index directory.
        Hide
        Paul Joseph Davis added a comment -

        Small update re-adds the deleted subtree skipping in _all_docs.

        Show
        Paul Joseph Davis added a comment - Small update re-adds the deleted subtree skipping in _all_docs.
        Hide
        Benoit Chesneau added a comment -

        +1 for this patch. All tests are green. I think it's good to have it now in trunk and fix last bits imo.

        @davis what is the status of couch_query_serrvers, how do you expect it? I find it odd to have it in couchdb folder while most of the code of views are in couch_index in couch_mrview now.

        Show
        Benoit Chesneau added a comment - +1 for this patch. All tests are green. I think it's good to have it now in trunk and fix last bits imo. @davis what is the status of couch_query_serrvers, how do you expect it? I find it odd to have it in couchdb folder while most of the code of views are in couch_index in couch_mrview now.
        Hide
        Paul Joseph Davis added a comment -

        @Benoit

        couch_query_servers is next on the agenda. I didn't include it here to keep the scope of this patch down.

        Show
        Paul Joseph Davis added a comment - @Benoit couch_query_servers is next on the agenda. I didn't include it here to keep the scope of this patch down.
        Hide
        Filipe Manana added a comment -

        Paul just one question, looking at the 3rd patch:

        diff --git a/share/www/script/test/view_compaction.js b/share/www/script/test/view_compaction.js
        index 4c75184..ebf6fe1 100644
        — a/share/www/script/test/view_compaction.js
        +++ b/share/www/script/test/view_compaction.js
        @@ -87,7 +87,7 @@ couchTests.view_compaction = function(debug) {
        T(data_size_before_compact < disk_size_before_compact, "data size < file size");

        // compact view group

        • var xhr = CouchDB.request("POST", "/" + db.name + "/_compact" + "/foo");
          + var xhr = CouchDB.request("POST", "/" + db.name + "/_design/foo/_compact");
          T(JSON.parse(xhr.responseText).ok === true);

        resp = db.designInfo("_design/foo");
        diff --git a/src/Makefile.am b/src/Makefile.am

        Has the URI for view compaction changed or is it just an alternative?

        Show
        Filipe Manana added a comment - Paul just one question, looking at the 3rd patch: diff --git a/share/www/script/test/view_compaction.js b/share/www/script/test/view_compaction.js index 4c75184..ebf6fe1 100644 — a/share/www/script/test/view_compaction.js +++ b/share/www/script/test/view_compaction.js @@ -87,7 +87,7 @@ couchTests.view_compaction = function(debug) { T(data_size_before_compact < disk_size_before_compact, "data size < file size"); // compact view group var xhr = CouchDB.request("POST", "/" + db.name + "/_compact" + "/foo"); + var xhr = CouchDB.request("POST", "/" + db.name + "/_design/foo/_compact"); T(JSON.parse(xhr.responseText).ok === true); resp = db.designInfo("_design/foo"); diff --git a/src/Makefile.am b/src/Makefile.am Has the URI for view compaction changed or is it just an alternative?
        Hide
        Paul Joseph Davis added a comment -

        There's a backwards compatibility shim for the old style URL as well [1]. I first wrote that not even thinking it was a break.

        [1] https://github.com/davisp/couchdb/blob/new-views/src/couchdb/couch_httpd_db.erl#L128

        Show
        Paul Joseph Davis added a comment - There's a backwards compatibility shim for the old style URL as well [1] . I first wrote that not even thinking it was a break. [1] https://github.com/davisp/couchdb/blob/new-views/src/couchdb/couch_httpd_db.erl#L128
        Hide
        Paul Joseph Davis added a comment -

        Another call for comments before committing. I'd like to keep on this before it goes stale.

        Show
        Paul Joseph Davis added a comment - Another call for comments before committing. I'd like to keep on this before it goes stale.
        Hide
        Jan Lehnardt added a comment -

        Just thinking releases, I'd like to branch 1.2.x sooner than latter. Do you recon this is a good patch to go into 1.2.x or should it go into trunk after branching? I'm happy to branch 1.2.x so you can commit this to trunk any time. I think I'd feel more comfortable having this simmer in trunk for 1.3.x (given that I'd like start work on the 1.2.x release about now, not commenting on the quality of the patch, just is time-new-ly-ness and the general and conservative thinking, if this ramble makes any sense

        Show
        Jan Lehnardt added a comment - Just thinking releases, I'd like to branch 1.2.x sooner than latter. Do you recon this is a good patch to go into 1.2.x or should it go into trunk after branching? I'm happy to branch 1.2.x so you can commit this to trunk any time. I think I'd feel more comfortable having this simmer in trunk for 1.3.x (given that I'd like start work on the 1.2.x release about now, not commenting on the quality of the patch, just is time-new-ly-ness and the general and conservative thinking, if this ramble makes any sense
        Hide
        Paul Joseph Davis added a comment -

        I'd like to have it in trunk sooner rather than later so that it doesn't start drifting too much. There are already commits coming in that affect this patch.

        That said, I'd say its not a bad idea to have it wait to commit it until after 1.2.x is branched. Perhaps its time to start a thread on dev@ on any outstanding issues that need to be resolved before we branch.

        Show
        Paul Joseph Davis added a comment - I'd like to have it in trunk sooner rather than later so that it doesn't start drifting too much. There are already commits coming in that affect this patch. That said, I'd say its not a bad idea to have it wait to commit it until after 1.2.x is branched. Perhaps its time to start a thread on dev@ on any outstanding issues that need to be resolved before we branch.
        Hide
        Filipe Manana added a comment -

        Paul, I have a few questions/remarks, all are minor things that I didn't notice before and I don't consider them blockers of any kind.

        1) There was an _all_docs with ?include_docs=true optimization which avoided doing a lookups in the id btree to get the documents. This seems to be gone (COUCHDB-1061). I can help here;

        2) There's no information dumped to the log file when the updater starts, stops or checkpoints. At least when it starts and stops, I find it useful to have it in the log. Dunno if this was intentional or not;

        3) When the view group is shutdown, because the associated database was closed, there's no information logged anymore. I find the logging useful for this scenario, for example to diagnose COUCHDB-1283;

        4) For the view index file, we used to have a ref counter for each. I don't see anything equivalent now. What happens when a client is folding a view and before it finishes folding it, the view is compacted, the file switch happens and the original file is deleted?

        Once again, great work on this refactoring.

        Show
        Filipe Manana added a comment - Paul, I have a few questions/remarks, all are minor things that I didn't notice before and I don't consider them blockers of any kind. 1) There was an _all_docs with ?include_docs=true optimization which avoided doing a lookups in the id btree to get the documents. This seems to be gone ( COUCHDB-1061 ). I can help here; 2) There's no information dumped to the log file when the updater starts, stops or checkpoints. At least when it starts and stops, I find it useful to have it in the log. Dunno if this was intentional or not; 3) When the view group is shutdown, because the associated database was closed, there's no information logged anymore. I find the logging useful for this scenario, for example to diagnose COUCHDB-1283 ; 4) For the view index file, we used to have a ref counter for each. I don't see anything equivalent now. What happens when a client is folding a view and before it finishes folding it, the view is compacted, the file switch happens and the original file is deleted? Once again, great work on this refactoring.
        Hide
        Paul Joseph Davis added a comment -

        For 2 and 3, that was mostly because I was developing without access to the couch_db.hrl for a long time and was just using tests to check. Adding those messages back is probably a good idea. I'll do that when I get into the office.

        For 1 it's quite possible that I undid and optimization there. I was mostly just reading the old _all_docs and fitting it into the new call backs. I'll read up on 1061 and fix that.

        4 was a whoopsie on my part. At first I intentionally got rid of the ref counter not thinking it's necessary. But after talking with Adam about the compaction swap I realized that it is necessary for when someone is reading a view during that swap. I'll add this back as well today.

        For log messages, I'll plan on just adding start/stop/closed_because_db_closed. The checkpoint updates are just log noise to me so unless someone has strong feelings to include them I'll probably skip that.

        Show
        Paul Joseph Davis added a comment - For 2 and 3, that was mostly because I was developing without access to the couch_db.hrl for a long time and was just using tests to check. Adding those messages back is probably a good idea. I'll do that when I get into the office. For 1 it's quite possible that I undid and optimization there. I was mostly just reading the old _all_docs and fitting it into the new call backs. I'll read up on 1061 and fix that. 4 was a whoopsie on my part. At first I intentionally got rid of the ref counter not thinking it's necessary. But after talking with Adam about the compaction swap I realized that it is necessary for when someone is reading a view during that swap. I'll add this back as well today. For log messages, I'll plan on just adding start/stop/closed_because_db_closed. The checkpoint updates are just log noise to me so unless someone has strong feelings to include them I'll probably skip that.
        Hide
        Paul Joseph Davis added a comment -

        Last series of commits have addressed all outstanding issues in this ticket. Any new bugs should just be raised as new tickets.

        Show
        Paul Joseph Davis added a comment - Last series of commits have addressed all outstanding issues in this ticket. Any new bugs should just be raised as new tickets.
        Hide
        Christopher Bonhage added a comment -

        Paul: I had a minor quibble regarding internal row format consistency that I filed as https://issues.apache.org/jira/browse/COUCHDB-1291

        Show
        Christopher Bonhage added a comment - Paul: I had a minor quibble regarding internal row format consistency that I filed as https://issues.apache.org/jira/browse/COUCHDB-1291

          People

          • Assignee:
            Unassigned
            Reporter:
            Paul Joseph Davis
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development