Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None
    • Environment:

      debian 5.0, amd64, couchdb from git.apache.org at dd15c8ed5bf5873aec08a99a0687849f1d29f4c3

    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      I'm running repeated tests where I create multiple databases on multiple machines and then compact them all in parallel.

      This almost always works, but occasionally it fails. Specifically, on one machine each .compact file is only about 50% complete (I know this because the other machines in my set hold successfully compacted versions of the same data), and the log contains:

      [Mon, 04 Jan 2010 19:50:23 GMT] [error] [<0.17793.28>] Uncaught error in HTTP request: {exit,noproc}
      [Mon, 04 Jan 2010 19:50:26 GMT] [error] [<0.17801.28>] Uncaught error in HTTP request: {exit,noproc}
      [Mon, 04 Jan 2010 19:50:26 GMT] [error] [<0.17753.28>] Uncaught error in HTTP request: {exit,noproc}

      I'm not sure it's related to the compaction process crash, but my HTTP client also received an error while polling _active_tasks for compaction to complete:

      Waiting for compaction to complete.
      Exception in thread "main" org.apache.http.conn.HttpHostConnectException: Connection to http://machine_name:5984 refused
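
For unattended overnight runs, a polling loop that tolerates a refused connection would at least record whether the server came back on its own. What follows is a minimal Python sketch, not the original test client. It assumes `_active_tasks` returns a JSON array of task objects whose `type` field mentions compaction. The names `fetch_tasks` and `wait_for_compaction` and the retry limits are hypothetical.

```python
import json
import time
import urllib.request


def fetch_tasks(base_url):
    """GET <base_url>/_active_tasks and decode the JSON array it returns."""
    with urllib.request.urlopen(base_url + "/_active_tasks", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_compaction(base_url, fetch=fetch_tasks, interval=1.0,
                        max_failures=5, sleep=time.sleep):
    """Poll until no compaction task is listed.

    Returns the number of failed polls seen along the way. Re-raises the
    last connection error once max_failures consecutive polls have failed.
    """
    failures = 0
    consecutive = 0
    while True:
        try:
            tasks = fetch(base_url)
        except OSError:
            # OSError covers ConnectionRefusedError, timeouts and urllib's URLError.
            failures += 1
            consecutive += 1
            if consecutive >= max_failures:
                raise
            sleep(interval)
            continue
        consecutive = 0
        # Assumption: each task is a dict with a "type" string.
        if not any("compaction" in str(t.get("type", "")).lower() for t in tasks):
            return failures
        sleep(interval)
```

An empty task list only shows that no compaction is currently running. As this report shows, it does not prove that compaction finished, so a check of the files on disk is still needed.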

        Activity

        Randall Leeds added a comment -

        Is this fixed with 1.1.1?

        Paul Joseph Davis made changes -
        Field: Skill Level
        Original Value: (empty)
        New Value: Regular Contributors Level (Easy to Medium)
        Robert Newson added a comment -

        My bad, it was just one of the three boxes that was left in this state. The other two completed fine. They're identical hardware, OS, and couchdb build and configuration.

        Robert Newson added a comment -

        Hi,

        Sorry, I don't currently have any more information than this. I wanted to report it while I still had what little information I had on screen. CouchDB is responsive after this event without a restart, but since this happens during my automated overnight tests, I can't say how quickly it recovers.

        All I know is that I was left with an incomplete .compact file for every single database (a dozen or so) on each of the three servers I tested this on and no active tasks on any of them. I figured that was pretty extraordinary. I agree that it's not likely that the client issuing the _compact calls or the one calling _active_tasks is the cause of the crash.
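
One way to capture that state before it is cleaned up is to list the leftover compaction files with their sizes next to the live database files. This is a minimal Python sketch. It assumes compaction targets are named `<db>.couch.compact` and sit beside `<db>.couch` in the data directory. `leftover_compact_files` is a hypothetical helper, not a CouchDB tool.

```python
import os


def leftover_compact_files(data_dir):
    """Return (db_path, db_size, compact_size) for each leftover .compact file.

    db_size is None when the matching database file is missing.
    """
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.endswith(".compact"):
                continue
            compact_path = os.path.join(root, name)
            # Assumption: stripping ".compact" yields the live database file.
            db_path = compact_path[: -len(".compact")]
            db_size = os.path.getsize(db_path) if os.path.exists(db_path) else None
            found.append((db_path, db_size, os.path.getsize(compact_path)))
    return found
```

Running this on each server right after a failed run, and comparing the sizes against a machine where compaction succeeded, would show how far each compaction got.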

        Sorry I don't have more information at this time. I'll be on IRC after I conduct an interview today; perhaps we can chat about this then?

        Adam Kocoloski added a comment -

        Hi Bob, I think we'll need a bit more to figure out exactly what crashed here. Is it just _active_tasks that's inaccessible when this happens, or did the whole couch_httpd process shut down? Even if the couch_task_status server died (which would cause the noproc errors when you query _active_tasks), that by itself should not kill the compactor process, since the compactor only uses fire-and-forget gen_server:cast calls to update its task status.
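
The difference between the two message styles can be illustrated with a toy model. The Python sketch below is not CouchDB code and only mimics the semantics described above: a synchronous call to a missing server fails in the caller, while a cast to a missing server is silently dropped. `Registry` and `NoProc` are hypothetical names.

```python
class NoProc(Exception):
    """Stands in for Erlang's noproc exit reason."""


class Registry:
    """Toy model of named server processes."""

    def __init__(self):
        self._servers = {}

    def register(self, name, handler):
        self._servers[name] = handler

    def kill(self, name):
        self._servers.pop(name, None)

    def call(self, name, msg):
        # Synchronous request: the caller needs a reply, so a missing
        # server is an error in the caller.
        if name not in self._servers:
            raise NoProc(name)
        return self._servers[name](msg)

    def cast(self, name, msg):
        # Fire-and-forget: the message is dropped if nobody is listening,
        # and the sender carries on regardless.
        handler = self._servers.get(name)
        if handler is not None:
            handler(msg)
        return "ok"


reg = Registry()
status_updates = []
reg.register("couch_task_status", status_updates.append)
reg.cast("couch_task_status", "Copied 50%")  # compactor reports progress
reg.kill("couch_task_status")                # status server dies
reg.cast("couch_task_status", "Copied 60%")  # dropped; compactor continues
```

In this model only a `call` (the analogue of the HTTP handler querying task status) would fail after the status server dies, which matches the reasoning that the noproc errors alone do not explain the stalled compactions.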

        Robert Newson created issue -

          People

          • Assignee: Unassigned
          • Reporter: Robert Newson
          • Votes: 0
          • Watchers: 1
