CouchDB
  1. CouchDB
  2. COUCHDB-1228

Key range error apparently ignores view collation

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: HTTP Interface
    • Labels:
      None
    • Environment:

      Debian

      Description

      I have created a view (no reduce function) with "options":

      {"collation":"raw"}

      and emit documents with keys "A", "C" and "b". Running a request on that view with no parameters, I get as expected all three documents in order "A", "C" and "b" as required by the raw collation (instead of "A", "b", "C" for the default ICU collation).

      However, when I run a request with start key "B" and end key "a", I expect the "C" document to be returned alone (as "B" < "C" < "a") but couchDB responds:

      { "error": "query_parse_error", "reason": "No rows can match your key range, reverse your start_key and end_key or set descending=true" }

      This error would make sense if I had been using the default ICU collation, where "B" > "a", but with the raw collation the reverse ("B" > "a") is true. It looks as if the key order warning does not take the view collation into account.

        Activity

        Hide
        Jason Smith added a comment -

        This bug also prevents a query for all user documents in the _users database.

        $ curl -i 'localhost:5984/_users/_all_docs?startkey="org.couchdb.user%3a"&endkey="org.couchdb.user%3b"'
        HTTP/1.1 400 Bad Request
        Server: CouchDB/1.1.0 (Erlang OTP/R14B02)
        Date: Sat, 06 Aug 2011 02:03:00 GMT
        Content-Type: text/plain;charset=utf-8
        Content-Length: 133
        Cache-Control: must-revalidate

        {"error":"query_parse_error","reason":"No rows can match your key range, reverse your start_key and end_key or set descending=true"}
        Show
        Jason Smith added a comment - This bug also prevents a query for all user documents in the _users database. $ curl -i 'localhost:5984/_users/_all_docs?startkey="org.couchdb.user%3a"&endkey="org.couchdb.user%3b"' HTTP/1.1 400 Bad Request Server: CouchDB/1.1.0 (Erlang OTP/R14B02) Date: Sat, 06 Aug 2011 02:03:00 GMT Content-Type: text/plain;charset=utf-8 Content-Length: 133 Cache-Control: must-revalidate {"error":"query_parse_error","reason":"No rows can match your key range, reverse your start_key and end_key or set descending=true"}
        Hide
        Bob Dionne added a comment -

        Looks like the problem is in couch_httpd_view:warn_on_empty_key_range, it makes use of couch_view:less_json as the default instead of taking the collation option into account.

        Shouldn't be too hard to fix in the view case but I'm not sure how to handle the case of _all_docs

        Show
        Bob Dionne added a comment - Looks like the problem is in couch_httpd_view:warn_on_empty_key_range, it makes use of couch_view:less_json as the default instead of taking the collation option into account. Shouldn't be too hard to fix in the view case but I'm not sure how to handle the case of _all_docs
        Hide
        Paul Joseph Davis added a comment -

        @Bob

        Quite right. Though that code will have to poke into the design doc to handle that warning case. I'm not sure what _all_docs has to do with it. It only uses raw collation.

        Show
        Paul Joseph Davis added a comment - @Bob Quite right. Though that code will have to poke into the design doc to handle that warning case. I'm not sure what _all_docs has to do with it. It only uses raw collation.
        Hide
        Jason Smith added a comment -

        Prep for real work

        Show
        Jason Smith added a comment - Prep for real work
        Hide
        Jason Smith added a comment -

        Proposed patch with unit tests

        Show
        Jason Smith added a comment - Proposed patch with unit tests
        Hide
        Jason Smith added a comment -

        Original patches are against 1.1.x. Next patch set is against trunk.

        Show
        Jason Smith added a comment - Original patches are against 1.1.x. Next patch set is against trunk.
        Hide
        Paul Joseph Davis added a comment -

        @Jason,

        Looks pretty good. Though I'd implement collation_for_view slightly differently:

        collationv_for_view(

        {reduce, _N, _Lang, View}

        ) ->
        collation_for_view(View);
        collation_for_view(View) ->
        fun(A, B) ->
        couch_btree:less(View#view.btree, A, B)
        end.

        You'll need to export couch_btree:less, but that's no biggy. I think this should simplify the rest of the collation logic further down. Only tricky thing is for _all_docs to just remember and pass fun(A, B) -> A < B end. instead of raw as the Collation thing.

        Show
        Paul Joseph Davis added a comment - @Jason, Looks pretty good. Though I'd implement collation_for_view slightly differently: collationv_for_view( {reduce, _N, _Lang, View} ) -> collation_for_view(View); collation_for_view(View) -> fun(A, B) -> couch_btree:less(View#view.btree, A, B) end. You'll need to export couch_btree:less, but that's no biggy. I think this should simplify the rest of the collation logic further down. Only tricky thing is for _all_docs to just remember and pass fun(A, B) -> A < B end. instead of raw as the Collation thing.
        Hide
        Jason Smith added a comment -

        Thanks, Paul. I will submit another patch set.

        I was not totally confident about the

        {reduce, _N, _Lang, View}

        stuff. It comes from code like this:

        https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_httpd_view.erl#L35-50

        Would you mind briefly explaining what that code is doing? The `ReduceView` is that

        {reduce, N, Lang, View}

        tuple, but that is not a record from couch_db.hrl; it is simply built on the fly

        https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_view.erl#L144-148

        Thanks!

        Show
        Jason Smith added a comment - Thanks, Paul. I will submit another patch set. I was not totally confident about the {reduce, _N, _Lang, View} stuff. It comes from code like this: https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_httpd_view.erl#L35-50 Would you mind briefly explaining what that code is doing? The `ReduceView` is that {reduce, N, Lang, View} tuple, but that is not a record from couch_db.hrl; it is simply built on the fly https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_view.erl#L144-148 Thanks!
        Hide
        Paul Joseph Davis added a comment -

        The unintuitive part here is the optimizations we make when map functions are byte identical. Ie, if you have the same map function and a number of different reduce functions, then we only write one btree to disk for that map function. Its basically a short comming in how we define map/reduce function pairs.

        But the bottom line is that even if you have say three pairs of m/r functions, but the m function is identical in all three, then all three are represented as a single #view{} record. Its basically {MapFun, [

        {ReduceName, ReduceFun}

        | RestRedFuns], OtherMetaStuff}.

        The

        {reduce, N, Lang, View}

        tuple is just wrapped around a #view{} record to indicate which reduce function is being referred to. Of course this could've been defined as -record(reduce,

        {nth, language, view}

        ). but wasn't for some reason long ago. Bottom line is that all of those reduce functions and the underlying map function have an identical sort ordering, so all you're doing is extracting the view to get at that info.

        (Side note, when I said byte-identical map function, they also have to have identical view options values as well)

        Show
        Paul Joseph Davis added a comment - The unintuitive part here is the optimizations we make when map functions are byte identical. Ie, if you have the same map function and a number of different reduce functions, then we only write one btree to disk for that map function. Its basically a short comming in how we define map/reduce function pairs. But the bottom line is that even if you have say three pairs of m/r functions, but the m function is identical in all three, then all three are represented as a single #view{} record. Its basically {MapFun, [ {ReduceName, ReduceFun} | RestRedFuns], OtherMetaStuff}. The {reduce, N, Lang, View} tuple is just wrapped around a #view{} record to indicate which reduce function is being referred to. Of course this could've been defined as -record(reduce, {nth, language, view} ). but wasn't for some reason long ago. Bottom line is that all of those reduce functions and the underlying map function have an identical sort ordering, so all you're doing is extracting the view to get at that info. (Side note, when I said byte-identical map function, they also have to have identical view options values as well)
        Hide
        Jason Smith added a comment -

        Second patch set ("B" series). Developed against 1.1.x, rebased to trunk for your convenience but untested since I didn't bother rebuilding libjs.

        Now, when checking view ranges, an is-less-than predicate (callback) is required.

        For normal views, the callback uses couch_btree:less directly from the view object, which presumably knows the correct collation algorithm.

        For temporary views, I made a minor change to first instantiate the view objects, and then check for range validity, again using the view's #btree.less function. I confirmed that this works for normal and raw collation in temporary views, however I could not find an obvious place in the test suite to add the unit tests.

        For _all_docs queries, the http handler simply hard-codes a raw collation function and passes that to the range checker.

        Show
        Jason Smith added a comment - Second patch set ("B" series). Developed against 1.1.x, rebased to trunk for your convenience but untested since I didn't bother rebuilding libjs. Now, when checking view ranges, an is-less-than predicate (callback) is required. For normal views, the callback uses couch_btree:less directly from the view object, which presumably knows the correct collation algorithm. For temporary views, I made a minor change to first instantiate the view objects, and then check for range validity, again using the view's #btree.less function. I confirmed that this works for normal and raw collation in temporary views, however I could not find an obvious place in the test suite to add the unit tests. For _all_docs queries, the http handler simply hard-codes a raw collation function and passes that to the range checker.
        Hide
        Paul Joseph Davis added a comment -

        Fixed as of r1156509

        Show
        Paul Joseph Davis added a comment - Fixed as of r1156509

          People

          • Assignee:
            Unassigned
            Reporter:
            Victor Nicollet
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development