CouchDB / COUCHDB-890

Option to use a persistent CommonJS module cache

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: JavaScript View Server
    • Labels:
      None
    • Environment:

      All

    • Skill Level:
      New Contributors Level (Easy)

      Description

      Since COUCHDB-1075, there is a CommonJS module cache used for resolving circular CommonJS dependencies. However, Caolan reports a 10x speed improvement by not clearing this cache between requests. An option to not clear the cache could be a great tool for performance-interested power users who know their CommonJS modules are sane. The improvement will be even greater when we turn on the JIT compiler in SpiderMonkey since cached modules will benefit from being pre-JIT'd.
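
The requested option can be pictured with a minimal sketch. It is not CouchDB's actual code: `createModuleLoader`, `persistentCache`, `endRequest`, and `evaluations` are hypothetical names, and the real view server's module resolution may work differently. The sketch only shows why skipping the per-request clear avoids re-evaluating module source.

```javascript
// Hypothetical sketch: none of these names come from CouchDB's main.js.
function createModuleLoader(sources, options) {
  var persistent = Boolean(options && options.persistentCache);
  var cache = {};     // module id -> module object
  var evalCount = 0;  // how many times module source was evaluated

  function requireModule(id) {
    if (Object.prototype.hasOwnProperty.call(cache, id)) {
      return cache[id].exports;  // cache hit: no re-evaluation
    }
    if (!Object.prototype.hasOwnProperty.call(sources, id)) {
      throw new Error('Unknown module: ' + id);
    }
    var module = { id: id, exports: {} };
    cache[id] = module;
    evalCount += 1;
    var fn = new Function('module', 'exports', 'require', sources[id]);
    fn(module, module.exports, requireModule);
    return module.exports;
  }

  return {
    require: requireModule,
    // Called by the (hypothetical) server loop after each request.
    endRequest: function () {
      if (!persistent) {
        cache = {};  // default behaviour: forget everything between requests
      }
    },
    evaluations: function () { return evalCount; }
  };
}

var sources = {
  greet: "exports.hello = function (n) { return 'hello ' + n; };"
};

// Two simulated requests with the cache cleared in between.
var perRequest = createModuleLoader(sources, { persistentCache: false });
perRequest.require('greet');
perRequest.endRequest();
perRequest.require('greet');
perRequest.endRequest();

// The same two requests with the cache kept alive.
var persistentLoader = createModuleLoader(sources, { persistentCache: true });
persistentLoader.require('greet');
persistentLoader.endRequest();
persistentLoader.require('greet');
persistentLoader.endRequest();

console.log(perRequest.evaluations(), persistentLoader.evaluations());
```

With the cache cleared, the module source is evaluated once per request. With the option on, it is evaluated once in total, which is also what would let a JIT keep its compiled code.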

        Activity

        Randall Leeds added a comment -

        Osher, the best way to get real-time support is to find developers in IRC in the #couchdb room on Freenode.
        The wiki has build documentation.
        The main.js script the server runs is in the directory share/server. Those files are all concatenated at build time (see Makefile.am in that directory) and output to the file main.js.
        To debug you can try to run the file directly with the couchjs program and pretend to be the CouchDB side of the View Server Protocol, which you can also find on the wiki.
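
As a rough illustration of "pretending to be the CouchDB side", the sketch below builds a few newline-delimited JSON commands of the kind the view server reads on standard input. The command names `reset`, `add_fun`, and `map_doc` are given from memory of the protocol and should be checked against the wiki page; the map function and sample document are made up for the example.

```javascript
// Assumption: the view server reads one JSON array per line on stdin.
// The map function and the document below are illustrative only.
var mapSource = 'function (doc) { emit(doc._id, null); }';

var commands = [
  ['reset'],                               // clear any previously added functions
  ['add_fun', mapSource],                  // register a map function
  ['map_doc', { _id: 'doc1', value: 42 }]  // run registered functions on one document
];

// Serialize each command on its own line, as the server expects line-based input.
var input = commands.map(function (command) {
  return JSON.stringify(command);
}).join('\n') + '\n';

console.log(input);
```

The resulting text could then be typed or piped into couchjs running main.js, with each response read back line by line.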

        Osher E added a comment - edited

        I do not know what the mentioned patch does, but I believe a simple solution is possible:

        By wrapping the 'native' require function with a requirer of our own, we can keep alive the execution context of a given view and reuse it whenever new documents need to be indexed into that view. This means that a design document with several views will have one such execution context per view.
        Each execution context holds the view-level instances of the evaluated modules, and is coupled with the view results it was used to generate.

        On the other side, a reverse index is held for each module, referencing the execution contexts of all the views it is involved in.
        Once the design document is updated, every module that fails a checksum check (i.e. has changed) invalidates the execution contexts of all the views it is part of, and hence triggers re-indexing of those views.

        It would still mean that an update to a low-level module might require re-indexing all the views.
        However, new documents and updates that come in re-use the same, already evaluated execution contexts.
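
A rough sketch of this idea, under stated assumptions: `createContextRegistry`, `contextFor`, `updateDesignDoc`, and the toy `checksum` are all hypothetical names, the context object is only a placeholder for real evaluated modules, and nothing here reflects how CouchDB actually tracks views.

```javascript
// Toy, non-cryptographic checksum; a real implementation would likely
// use a proper hash. Hypothetical helper, for illustration only.
function checksum(source) {
  var h = 0;
  for (var i = 0; i < source.length; i++) {
    h = (h * 31 + source.charCodeAt(i)) | 0;
  }
  return h;
}

function createContextRegistry(moduleSources) {
  var sums = {};          // module id -> checksum of its current source
  var contexts = {};      // view name -> live execution context
  var reverseIndex = {};  // module id -> { viewName: true }

  Object.keys(moduleSources).forEach(function (id) {
    sums[id] = checksum(moduleSources[id]);
  });

  // Return the view's context, creating it (and indexing its modules) once.
  function contextFor(viewName, moduleIds) {
    if (!contexts[viewName]) {
      contexts[viewName] = { view: viewName, moduleIds: moduleIds.slice(), exportsById: {} };
      moduleIds.forEach(function (id) {
        reverseIndex[id] = reverseIndex[id] || {};
        reverseIndex[id][viewName] = true;
      });
    }
    return contexts[viewName];
  }

  // On a design document update, drop the contexts of every view that
  // uses a changed module and return the names of views to re-index.
  function updateDesignDoc(newSources) {
    var invalidated = {};
    Object.keys(newSources).forEach(function (id) {
      var sum = checksum(newSources[id]);
      if (sums[id] === sum) {
        return;  // unchanged module: contexts using it stay alive
      }
      sums[id] = sum;
      Object.keys(reverseIndex[id] || {}).forEach(function (viewName) {
        if (contexts[viewName]) {
          delete contexts[viewName];
          invalidated[viewName] = true;
        }
      });
    });
    return Object.keys(invalidated).sort();
  }

  return { contextFor: contextFor, updateDesignDoc: updateDesignDoc };
}

var moduleSources = { 'lib/util': 'exports.x = 1;', 'lib/fmt': 'exports.y = 2;' };
var registry = createContextRegistry(moduleSources);

var byName = registry.contextFor('by_name', ['lib/util']);
registry.contextFor('by_date', ['lib/util', 'lib/fmt']);

// Only lib/fmt changes, so only the view that uses it is invalidated.
var toReindex = registry.updateDesignDoc({
  'lib/util': 'exports.x = 1;',
  'lib/fmt': 'exports.y = 3;'
});
console.log(toReindex);
console.log(registry.contextFor('by_name', ['lib/util']) === byName);
```

In this sketch a change to `lib/util` would invalidate both views, which matches the caveat about low-level modules.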

        Thoughts?

        P.S:
        I would love to help and offer pull requests if someone would help me get started: How do I make my own build? How is server.js generated? How do I debug this? How do I edit the sources? What conventions are followed? I'm in GMT+2.

        Paul Joseph Davis added a comment -

        While true, the awesomely scary aspect is code staying loaded in the SpiderMonkey (SM) VM when code is run by different users/dbs.

        Randall Leeds added a comment -

        Updated in light of COUCHDB-1075 work.

        Caolan McMahon added a comment -

        According to my tests this module caching patch doesn't actually work... the compiled function isn't available on the next resolveModule call and performance remains unchanged after the patch (when I've seen big speed improvements with the other implementation).

        Caolan McMahon added a comment -

        Having a module cache that can persist state between requests might not be a good idea, since it could be unreliable with multiple JS processes or break the caching rules. How would you know if a resource has changed if it also depends on the state of CommonJS modules?

        See: https://issues.apache.org/jira/browse/COUCHDB-1075 for a patch that implements a module cache which is cleared between requests. You don't get the same performance benefits, but you do get increased compatibility with modules which store some state on the module (such as template libraries), and it also fixes circular requires.
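
For readers unfamiliar with why a module cache fixes circular requires, here is a minimal sketch of the general CommonJS technique. It is not the COUCHDB-1075 patch itself; `makeRequire` and the two sample modules are hypothetical. The key step is registering a module in the cache before evaluating it, so that a module required again mid-evaluation gets the partially filled exports instead of recursing forever.

```javascript
// Hypothetical loader illustrating the general technique only.
function makeRequire(sources) {
  var cache = {};  // module id -> module object

  function requireModule(id) {
    if (cache[id]) {
      return cache[id].exports;  // may be partially populated during a cycle
    }
    if (!Object.prototype.hasOwnProperty.call(sources, id)) {
      throw new Error('Unknown module: ' + id);
    }
    var module = { exports: {} };
    cache[id] = module;  // register BEFORE evaluating to break the cycle
    var fn = new Function('module', 'exports', 'require', sources[id]);
    fn(module, module.exports, requireModule);
    return module.exports;
  }

  return requireModule;
}

// Two modules that require each other.
var circularSources = {
  a: "exports.name = 'a'; var b = require('b');" +
     "exports.partner = function () { return b.name; };",
  b: "exports.name = 'b'; var a = require('a');" +
     "exports.partner = function () { return a.name; };"
};

var req = makeRequire(circularSources);
var moduleA = req('a');
var moduleB = req('b');

console.log(moduleA.partner(), moduleB.partner());  // prints "b a"
```

Creating a fresh `makeRequire` per request gives the cleared-between-requests behaviour described above: state stored on a module is shared within one request but does not leak into the next.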


          People

          • Assignee:
            Unassigned
            Reporter:
            mikeal
          • Votes:
            5
            Watchers:
            4
