CouchDB / COUCHDB-661

_all_dbs should list only the DBs accessible to the user

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: None
    • Component/s: HTTP Interface
    • Labels:
      None
    • Environment:

      trunk / 0.11

    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      As discussed in the auth roadmap mail that Chris sent to the dev mailing list, the _all_dbs URI should list only the DBs that the user can access.

      The following patch is a naive solution: it doesn't scale to CouchDB servers with millions of DBs. I'll soon discuss some ideas for this scaling problem on the dev mailing list.
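The naive approach filters the full DB list on every request by checking each DB's security object against the requesting user. Below is a minimal Python sketch of that idea. It is not the code of the attached patches. It assumes CouchDB-style read rules: server admins (role `_admin`) see every DB, a DB with an empty readers list is public, and otherwise the user must match a name or role in the DB's `readers` or `admins` section. `can_read` and `naive_all_dbs` are hypothetical names.

```python
from typing import Dict, List, Optional


def can_read(user: Optional[str], roles: List[str], security: dict) -> bool:
    """Hypothetical per-DB read check under the assumed rules (not CouchDB's code)."""
    if "_admin" in roles:
        return True  # assumed: server admins can read every DB
    readers = security.get("readers", {})
    if not readers.get("names") and not readers.get("roles"):
        return True  # assumed: no readers configured means the DB is public
    for section in (security.get("admins", {}), readers):
        if user is not None and user in section.get("names", []):
            return True
        if set(roles) & set(section.get("roles", [])):
            return True
    return False


def naive_all_dbs(security_by_db: Dict[str, dict],
                  user: Optional[str], roles: List[str]) -> List[str]:
    # Every request inspects every DB's security object: O(number of DBs).
    return sorted(db for db, sec in security_by_db.items()
                  if can_read(user, roles, sec))


if __name__ == "__main__":
    sec = {"admins": {"names": ["joe"], "roles": ["test_admin", "superuser"]},
           "readers": {"names": ["fdmanana"], "roles": []}}
    dbs = {"testdb1": sec, "testdb2": sec, "public": {}}
    print(naive_all_dbs(dbs, "fdmanana", []))  # ['public', 'testdb1', 'testdb2']
    print(naive_all_dbs(dbs, "bob", []))       # ['public']
```

The linear scan in `naive_all_dbs` is exactly the part that cannot scale to millions of DBs.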

      1. couchdb-_all_dbs-auth.patch
        4 kB
        Filipe Manana
      2. couchdb-_all_dbs-auth-2.patch
        6 kB
        Filipe Manana

        Activity

        Filipe Manana added a comment -

        Oops, I forgot to include the changes needed in test/etap/070-couch-db.t.

        Filipe Manana added a comment -

        So, testing this on a server with 1000 DBs, each with 100 docs and about 1 MB in size, the response time for _all_dbs is about 0.5 s on my system (Ubuntu 9.10, 7200 rpm SATA HDD).

        The DBs were populated with the tool at http://github.com/fdmanana/seatoncouch, using the following doc template:

        {
          "_id": "doc#{doc_id_counter}",
          "name": "#{random_string(100)}",
          "address": "#{random_string(200)}",
          "age": #{random_int(1, 100)},
          "children": #{random_int(0, 10)},
          "bio": "#{random_string(10000)}"
        }

        Each DB has the security doc:

        {
          "admins": { "names": ["joe"], "roles": ["test_admin", "superuser"] },
          "readers": { "names": ["fdmanana"], "roles": [] }
        }

        $ time ./seatoncouch.rb --doc-tpl fdmanana_doc.tpl --security-doc security_doc.json --dbs 1000 --docs 100
        [INFO] Created DB named testdb1
        [INFO] Created doc at /testdb1/doc1
        [INFO] Created doc at /testdb1/doc2

        # etc... (takes more than 1 hour)

        Measuring the time:

        $ time curl http://localhost:5984/_all_dbs
        [
        "testdb2",
        "testdb485",
        "testdb497",
        ...

        real 0m0.498s
        user 0m0.000s
        sys 0m0.010s

        Increasing the number of DBs to 7500:

        $ time ./seatoncouch.rb --doc-tpl fdmanana_doc.tpl --security-doc security_doc.json --dbs 6500 --docs 100 --db-start-id 1001
        ...

        $ time curl http://localhost:5984/_all_dbs 2> /dev/null | wc -l
        7502

        real 0m3.763s
        user 0m0.010s
        sys 0m0.090s

        $ time curl http://localhost:5984/_all_dbs 2> /dev/null | wc -l
        7502

        real 0m3.804s
        user 0m0.020s
        sys 0m0.060s

        $ time curl http://localhost:5984/_all_dbs 2> /dev/null | wc -l
        7502

        real 0m3.714s
        user 0m0.020s
        sys 0m0.100s
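Dividing the mean wall-clock ("real") time by the number of DBs gives roughly half a millisecond per DB in both runs, which suggests the response time grows about linearly with the number of DBs. A small Python check of that arithmetic, using only the figures reported above and treating the second run as 7500 DBs (1000 plus 6500 created with --db-start-id 1001; the 7502 lines counted by wc -l are not broken down here):

```python
from typing import Dict, List

# "real" times reported above for GET /_all_dbs, keyed by number of test DBs.
MEASUREMENTS: Dict[int, List[float]] = {
    1000: [0.498],
    7500: [3.763, 3.804, 3.714],  # 1000 + 6500 DBs
}


def per_db_ms(n_dbs: int, seconds: List[float]) -> float:
    """Mean response time divided by the number of DBs, in milliseconds."""
    return sum(seconds) / len(seconds) / n_dbs * 1000


if __name__ == "__main__":
    for n, times in MEASUREMENTS.items():
        print(f"{n} DBs: {per_db_ms(n, times):.3f} ms per DB")
    # 1000 DBs: 0.498 ms per DB
    # 7500 DBs: 0.501 ms per DB
```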

        Brian Candler added a comment -

        Looks like a serious DoS risk to me, with "only" 7500 databases.

        If _all_dbs won't scale, then I think it should be admin-only (ideally with startkey/limit, like _all_docs, for efficient pagination, but that's a different issue).
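The startkey/limit suggestion amounts to cursor-style pagination over a sorted list of DB names. A minimal Python sketch, assuming the server already keeps the names in sorted order; `all_dbs_page` is a hypothetical name, and its parameters only mirror the `startkey`/`limit` idea from `_all_docs`:

```python
import bisect
from typing import List, Optional


def all_dbs_page(sorted_names: List[str], startkey: Optional[str] = None,
                 limit: Optional[int] = None) -> List[str]:
    """Return up to `limit` DB names that are >= `startkey`.

    Assumes `sorted_names` is kept sorted (as a B-tree would keep it), so
    finding the page start is a binary search instead of a full scan.
    """
    start = 0 if startkey is None else bisect.bisect_left(sorted_names, startkey)
    end = len(sorted_names) if limit is None else start + limit
    return sorted_names[start:end]


if __name__ == "__main__":
    names = ["testdb1", "testdb2", "testdb3", "testdb4"]
    print(all_dbs_page(names, startkey="testdb2", limit=2))  # ['testdb2', 'testdb3']
```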

        Or perhaps it should be possible to replace _all_dbs with a view in a 'real' database for non-admins.

        e.g. occasionally you could copy all the _security objects into another database, and generate a view with keys like
        emit(['name',name],db)
        emit(['role',role],db)
        for efficient querying.
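The lookup described above can be emulated in Python to show why it scales: the index is built once from the copied _security objects, and a request then needs one lookup per user name and role instead of a scan over every DB. This is a sketch, not a CouchDB view: an in-memory dict stands in for the view index, `build_index` and `dbs_for_user` are hypothetical names, and counting both `readers` and `admins` entries as granting access is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Key = Tuple[str, str]


def build_index(security_docs: Iterable[Tuple[str, dict]]) -> Dict[Key, Set[str]]:
    """Emulate a view whose map emits (['name', name], db) and (['role', role], db)."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for db, sec in security_docs:
        # Assumption: both readers and admins entries grant read access.
        for section in ("readers", "admins"):
            part = sec.get(section, {})
            for name in part.get("names", []):
                index[("name", name)].add(db)
            for role in part.get("roles", []):
                index[("role", role)].add(db)
    return index


def dbs_for_user(index: Dict[Key, Set[str]], name: Optional[str],
                 roles: List[str]) -> List[str]:
    """One key lookup per name/role instead of a scan over every DB."""
    keys = [("role", r) for r in roles]
    if name is not None:
        keys.append(("name", name))
    found: Set[str] = set()
    for key in keys:
        found |= index.get(key, set())  # .get does not insert into the defaultdict
    return sorted(found)
```

DBs with no readers (public ones) emit no keys in this sketch and would need separate handling.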

        (IMHO this is another reason why _security objects should be real docs: so that you can follow a _changes feed on them)

        Filipe Manana added a comment -

        Yes, Brian, I share the same vision.

        The idea, which I intend to discuss soon on the dev mailing list, is to use a view that maps user names and roles to lists of DBs. This view would live in a design doc of a special DB named "_dbs" (or similar).

        I will send an email to the dev mailing list as soon as possible, presenting a partial solution, pointing out some technical issues, and collecting feedback on them.

        cheers

        Jan Lehnardt added a comment -

        Bump to 1.3.x


          People

          • Assignee: Unassigned
          • Reporter: Filipe Manana
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
