Uploaded image for project: 'CouchDB'
  1. CouchDB
  2. COUCHDB-3358

Change O(n^2) function to be more performant

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Mango
    • Labels:
      None

      Description

      This is related to this https://issues.apache.org/jira/browse/COUCHDB-2951. When a user has a document with lots of field names, or nested fields with arrays, we add these fields to a special $fieldnames field. However, as we add them , we're calling lists:member on that same Acc, making it a O(n^2) operation.

        Activity

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 21cb0eb98bbf592a24456637429e5df6ce99ffc6 in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=21cb0eb ]

        Use efficient set storage for field names

        When indexing a set of fields for text search, we also create a special
        field called $fieldnames. It contains values for all the fields that
        need to be indexed. In order to do that, we need a unique list of the
        form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an
        element to the list, and then check for membership via lists:member/2.
        This is inefficient. Some documents can contain a large number of
        fields, so we will use gb_sets to create a unique set of fields, and
        then extract out the field names.

        COUCHDB-3358

        Show
        jira-bot ASF subversion and git services added a comment - Commit 21cb0eb98bbf592a24456637429e5df6ce99ffc6 in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=21cb0eb ] Use efficient set storage for field names When indexing a set of fields for text search, we also create a special field called $fieldnames. It contains values for all the fields that need to be indexed. In order to do that, we need a unique list of the form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an element to the list, and then check for membership via lists:member/2. This is inefficient. Some documents can contain a large number of fields, so we will use gb_sets to create a unique set of fields, and then extract out the field names. COUCHDB-3358
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 13f6bc370e23863c6ab228f68ede91fba452ee0f in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=13f6bc3 ]

        ignore type and value

        duplicates could occur when just using gb_set:from_list

        COUCHDB-3358

        Show
        jira-bot ASF subversion and git services added a comment - Commit 13f6bc370e23863c6ab228f68ede91fba452ee0f in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=13f6bc3 ] ignore type and value duplicates could occur when just using gb_set:from_list COUCHDB-3358
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 4e5e84ecc20808df056163f6a147c154cdf75ee3 in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4e5e84e ]

        Use efficient set storage for field names

        When indexing a set of fields for text search, we also create a special
        field called $fieldnames. It contains values for all the fields that
        need to be indexed. In order to do that, we need a unique list of the
        form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an
        element to the list, and then check for membership via lists:member/2.
        This is inefficient. Some documents can contain a large number of
        fields, so we will use gb_sets to create a unique set of fields, and
        then extract out the field names.

        COUCHDB-3358

        Show
        jira-bot ASF subversion and git services added a comment - Commit 4e5e84ecc20808df056163f6a147c154cdf75ee3 in couchdb's branch refs/heads/3358-use-efficient-set from Tony Sun [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4e5e84e ] Use efficient set storage for field names When indexing a set of fields for text search, we also create a special field called $fieldnames. It contains values for all the fields that need to be indexed. In order to do that, we need a unique list of the form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an element to the list, and then check for membership via lists:member/2. This is inefficient. Some documents can contain a large number of fields, so we will use gb_sets to create a unique set of fields, and then extract out the field names. COUCHDB-3358
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 4e5e84ecc20808df056163f6a147c154cdf75ee3 in couchdb's branch refs/heads/master from Tony Sun
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4e5e84e ]

        Use efficient set storage for field names

        When indexing a set of fields for text search, we also create a special
        field called $fieldnames. It contains values for all the fields that
        need to be indexed. In order to do that, we need a unique list of the
        form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an
        element to the list, and then check for membership via lists:member/2.
        This is inefficient. Some documents can contain a large number of
        fields, so we will use gb_sets to create a unique set of fields, and
        then extract out the field names.

        COUCHDB-3358

        Show
        jira-bot ASF subversion and git services added a comment - Commit 4e5e84ecc20808df056163f6a147c154cdf75ee3 in couchdb's branch refs/heads/master from Tony Sun [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4e5e84e ] Use efficient set storage for field names When indexing a set of fields for text search, we also create a special field called $fieldnames. It contains values for all the fields that need to be indexed. In order to do that, we need a unique list of the form [[<<"$fieldnames">>, Name, [] | Rest]. The old code would add an element to the list, and then check for membership via lists:member/2. This is inefficient. Some documents can contain a large number of fields, so we will use gb_sets to create a unique set of fields, and then extract out the field names. COUCHDB-3358

          People

          • Assignee:
            tonysun83 Tony Sun
            Reporter:
            tonysun83 Tony Sun
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development