CouchDB / COUCHDB-86

(CouchDB on Windows) compaction can not be done.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.11.1, 1.0
    • Component/s: Build System
    • Labels:
      None
    • Environment:

      Windows XP, Erlang/OTP R12B-3

      Description

      During compaction, renaming the current DB file to a .old file is not allowed on Windows.

      A possible workaround for this could be:
      1. Close the current DB file (.couch);
      2. Send db_updated so that the .compact file is used instead;
      3. After 5 seconds, delete the .couch file. This is done in a linked process, which then sends a message to update_loop;
      4. On receiving that message in update_loop, close the current DB file (which is now the .compact file), then rename it to .couch;
      5. Finally, send db_updated again to use this new .couch file.

      Maybe there would be a "pause" in service?

          Activity

          Jan Lehnardt added a comment -

          flag for 1.0. Windows support would be nice.

          Paul Joseph Davis added a comment -

          We shouldn't allow interruptions in service just because Windows doesn't allow the unlink/replace semantics. See the notes on COUCHDB-67 for a lead on getting compaction to work for windows.

          Juhani Ränkimies added a comment -

          I think this should be reopened because COUCHDB-67 seems stalled; it was bumped to the next version, again, and changed to minor. Also, it doesn't really describe this problem.

          Paul Joseph Davis added a comment -

          COUCHDB-67 is only an outline for a reasonable way to fix this. I don't think we'd need to go to all the lengths that it proposed, but I don't know much about Windows file handling other than "it's different."

          If there are any Windows devs with Erlang experience out there, help would definitely be appreciated on this one. I for one don't even have access to a test machine to reproduce this problem.

          Mark Hammond added a comment -

          > If there are any Windows devs with Erlang experience out there,
          > help would definitely be appreciated on this one.

          It sounds like you are after help in making the Windows file-system act like a *nix one, but I fear that is not possible. FILE_SHARE_DELETE isn't an option as although the DB can be deleted, it still can't be recreated - leaving us in exactly the same position.

          I believe we need someone with enough understanding of couchdb to propose and (help) implement how couchdb could perform these operations given the characteristics of the file-system - it is only then a windows dev can help. As Paul noted in the past, COUCHDB-67 is such an idea (but as Juhani mentioned, that isn't getting traction)

          Juhani Ränkimies added a comment -

          Once I get my erlang build env set up, I'd like to try moving the 'swap' operation to couch_file process that has the descriptor for the active file. There it would be possible to explicitly close the active file before renaming.

          Juhani Ränkimies added a comment -

          http://github.com/juranki/couchdb/commit/e17bc9f988b2ad2e59ce1f654e0433cabd63e677 worked for me in light manual testing.

          Compaction test in the test suite did compact the database but reported failure because of this error:

          [Fri, 05 Feb 2010 18:33:24 GMT] [error] [<0.121.0>] Uncaught error in HTTP request: {error,{badmatch,{error,badarg}}}
          [Fri, 05 Feb 2010 18:33:24 GMT] [info] [<0.121.0>] Stacktrace: [{couch_db,get_db_info,1},
              {couch_httpd_db,db_req,2},
              {couch_httpd_db,do_db_req,2},
              {couch_httpd,handle_request,5},
              {mochiweb_http,headers,5},
              {proc_lib,init_p_do_apply,3}]

          A build with the patch can be found at http://github.com/juranki/couchdb/downloads.

          Robert Newson added a comment -

          distilled from discussion on IRC;

          1) new database created as X.couch.0
          2) compaction starts, writing to X.compact
          3) compaction completes, rename X.compact to X.couch.1
          4) close fd on X.couch.0
          5) next write opens the database and looks for the X.couch.N for the highest N

          the code in all_databases will need to return only one item per X.couch.*, though.
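
A minimal sketch of steps 3 to 5 and the all_databases caveat above, written in Python rather than Erlang for brevity. The names `latest_versions`, `all_databases` and `next_versioned_filename` are hypothetical illustrations of the numbering scheme, not CouchDB functions.

```python
import re

# Hypothetical pattern for the "X.couch.N" naming discussed above.
VERSIONED = re.compile(r"^(?P<name>.+)\.couch\.(?P<n>\d+)$")


def latest_versions(filenames):
    """Map each database name to its highest-numbered X.couch.N file."""
    best = {}
    for fname in filenames:
        match = VERSIONED.match(fname)
        if match is None:
            continue  # skip X.compact and anything else that is not versioned
        name, n = match.group("name"), int(match.group("n"))
        if name not in best or n > best[name][0]:
            best[name] = (n, fname)
    return {name: fname for name, (_, fname) in best.items()}


def all_databases(filenames):
    """Return one entry per database, however many versions exist on disk."""
    return sorted(latest_versions(filenames))


def next_versioned_filename(name, filenames):
    """Name that a finished X.compact file would be renamed to."""
    current = latest_versions(filenames).get(name)
    n = int(current.rsplit(".", 1)[1]) + 1 if current else 0
    return f"{name}.couch.{n}"
```

Comparing N as an integer matters here: a plain string sort would place X.couch.9 after X.couch.10.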

          Juhani Ränkimies added a comment -

          On Windows, file:rename(Source, Dest) fails if either Source or Dest is open. I don't know if the win32 API has something that would alleviate that.

          That means that on Windows the rename operation must be of the form
          file:close
          file:rename
          file:open

          couch_file is the right place for that, I think.

          The advantage of numbering the database files is that it requires one less synchronous close (for X.couch).
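
A rough sketch of the close, rename, open sequence described above, in Python rather than Erlang. `SwappableFile` is a hypothetical stand-in for the single process that owns the file handle (the role couch_file plays in the discussion); it assumes the compacted file has already been closed by whoever wrote it.

```python
import os


class SwappableFile:
    """Hypothetical owner of the only handle to the active database file."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "r+b")

    def swap_in(self, compacted_path):
        """Replace the active file with the compacted one."""
        # 1. Close first: the rename is refused while the handle is open.
        self.handle.close()
        # 2. Rename the compacted file over the old path.
        os.replace(compacted_path, self.path)
        # 3. Reopen so later reads and writes use the new file.
        self.handle = open(self.path, "r+b")

    def close(self):
        self.handle.close()
```

Because the handle is closed and reopened inside one owner, callers never hold a stale descriptor, but any read that arrives between steps 1 and 3 has to wait or fail.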

          Juhani Ränkimies added a comment -

          http://github.com/juranki/couchdb/commit/db5a8737d42509ee943f676cb30c9a87639a2770 is a variation of the previous patch.

          It moves the platform specific handling to couch_file and supports the delete, rename pattern (if the caller has the pids of the couch_file processes).

          Mark Hammond added a comment -

          I was chatting a little on IRC about this (are you ever there? If so, what is your nick?) It seems a complication is that couch reuses a single fd for all readers and writers. If we close the fd, how do we prevent existing readers (eg, a client reading an attachment, an existing _all_docs reader, etc) from dying when the fd is closed?

          Juhani Ränkimies added a comment -

          Guess, I'll have to find an IRC client and get there

          Ok, I see. And on windows, the file can be closed/deleted only after all the readers are done.

          Juhani Ränkimies added a comment -

          Did the numbering thing, sort of.
          On windows, marks the file for deletion and deletes when couch_file process dies.

          http://github.com/juranki/couchdb/commit/dddf8808eec89dd9bd8d546b8c5347d236004bfa

          Based on light testing, compaction works for dbs and views.

          Mark Hammond added a comment -

          Thanks very much for running with this. I had a bit of a play over the weekend. First, a few comments about the patch itself:

          • The force_close_file changes in couch_file seem to be unused and should be deleted.
          • force_close_file in couch_db seems to be poorly named - it actually calls couch_file:delete and best I can tell, makes no attempt at closing the file at all.
          • couch_file:delete now seems to be mis-named - it only does a deletion on non-Windows platforms. I wonder if the logic for checking Windows shouldn't just go in couch_db, leaving couch_file with less magic and doing what people would expect it to do (ie, people expect couch_file:delete to actually delete and may be surprised at the new behaviour).
          • Some of the functions named in couch_file should probably be renamed, even if they get longer names. eg, delete_all implies from the name it will delete all files from some location, when what it really does is delete all "versioned" files given a base-name. Ditto next_filepath - that probably wants a name closer to 'next_versioned_filepath'
          • I'm very inexperienced with erlang, but wouldn't it be better to use linked processes rather than using process_flag(trap_exit,true)?

          Also, the futon tests get further, but still fail for me. The couch log shows:

          1> [info] [<0.1606.0>] 127.0.0.1 - - 'GET' /test_suite_db/_changes?filter=changes_filter/conflicted 200
          1>
          =INFO REPORT==== 8-Feb-2010::10:23:32 ===
          {failed_to_delete,["../var/lib/couchdb/test_suite_db.couch.0",{error,eacces}]}
          1> [info] [<0.1609.0>] 127.0.0.1 - - 'DELETE' /test_suite_db/ 200
          1> [info] [<0.1606.0>] 127.0.0.1 - - 'PUT' /test_suite_db/ 412
          1> [error] [<0.1622.0>] ** Generic server <0.1622.0> terminating
          ** Last message in was {'EXIT',<0.1626.0>,shutdown}
          ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.4173>,712}},
                                   "../var/lib/couchdb/test_suite_db.couch.0",true,0}
          ** Reason for termination ==
          ** {{badmatch,{error,eacces}},
              [{couch_file,handle_info,2},
               {gen_server,handle_msg,5},
               {proc_lib,init_p_do_apply,3}]}

          ...

          While futon reports # Exception raised:

          {"error":"file_exists","reason":"The database could not be created, the file already exists."}

          On the plus side, many tests now pass which failed before, and no stale 'versioned' files appear to have been left behind.

          Juhani Ränkimies added a comment -

          Sorry about the sloppy patch,

          Some of those are already addressed in the branch http://github.com/juranki/couchdb/commits/windows_compact_2

          • force_close_file is already cleaned
          • I agree that couch_file:delete is dangerous as is. I'll look into that.
          • I'll change delete_all -> delete_all_versions
          • I'll move the 'EXIT' handling to a separate process

          I haven't yet nailed what's with the changes test; changes api seems to work when playing with python client.

          There was also a bug with couch_file:delete_all error reporting; it caused the following error:

          {"error":"file_exists","reason":"The database could not be created, the file already exists."}

          Jan Lehnardt added a comment -

          The FILE_SHARE_DELETE flag will be part of the next Erlang version:

          http://github.com/erlang/otp/commit/c2448e5e1b703edf6a8d19f59ebfde7129fba612

          Mark Hammond added a comment -

          Attaching a patch which initially came from Damien and which I modified to also work with views. Once this patch is applied and couch runs on a modified erlang interpreter (so FILE_SHARE_DELETE is used to open files), couch is able to delete and compact views and databases.

          Jan Lehnardt added a comment -

          Not sure why this was closed, the patch is not in trunk and it no longer applies. Who can work on an update?

          Randall Leeds added a comment -

          Damien committed COUCHDB-780, which uses the same approach as here.
          I don't think there's anything left to be done.
          Is there anything this does that's still missing?

          Maybe using a single deleted directory (like in this patch) is cleaner than leaving .delete files all over the data tree, though...
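
To illustrate the "single deleted directory" idea only, here is a hedged Python sketch. It does not reproduce the COUCHDB-780 patch or the patch attached here; `retire_file`, `purge_retired` and the `.delete` directory name are hypothetical. The assumption is that a retired file is first renamed out of the way, so its original name is free at once, and is removed later when nothing holds it open.

```python
import os
import uuid


def retire_file(path, root_dir, trash_name=".delete"):
    """Move a file into one trash directory under root_dir; return its new path."""
    trash_dir = os.path.join(root_dir, trash_name)
    os.makedirs(trash_dir, exist_ok=True)
    # A random name avoids clashes when the same database is retired twice.
    target = os.path.join(trash_dir, uuid.uuid4().hex)
    os.rename(path, target)
    return target


def purge_retired(root_dir, trash_name=".delete"):
    """Try to remove every retired file; return how many were removed."""
    trash_dir = os.path.join(root_dir, trash_name)
    if not os.path.isdir(trash_dir):
        return 0
    removed = 0
    for entry in os.listdir(trash_dir):
        try:
            os.remove(os.path.join(trash_dir, entry))
            removed += 1
        except OSError:
            pass  # possibly still in use; leave it for the next purge
    return removed
```

Keeping all retired files in one directory means a startup sweep only has to look in one place instead of walking the whole data tree.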

          Damien Katz added a comment -

          Can anyone verify if the patch in COUCHDB-780 fixes the problem on windows with the latest Erlang?

          Jan Lehnardt added a comment -

          Hooray for SVN / JIRA integration not working. I didn't see that 780 was committed and assumed this is the original ticket. Sorry about the mistake. We'd still need a verification that this solves all windows issues.


    People

    • Assignee: Paul Joseph Davis
    • Reporter: Li Zhengji

    Time Tracking

    • Original Estimate: 5h
    • Remaining Estimate: 5h
    • Time Spent: Not Specified