Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-164

Improve query performance of optiq over mongo

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      optiq-mongo's aggregation framework pipeline could be optimized.

      Queries like "select <projectItems> from < tableExpression> where < booleanExpression>" should let "where <booleanExpression>" (i.e. {$match:{...}}) go in front of "select <projectItems>" (i.e. {$project:{...}}). This could allow indexes to be used so that mongo doesn't need to scan the whole collection, which will very likely improve query speed.

      optiq command:
      ```
      select * from visits where user_id='test_user_id' limit 10;
      ```

      current mongo command:
      ```
      db.visits.runCommand("aggregate",{
      pipeline: [
      { $project:

      { date: 1.0, user_id: 1.0, 'dim_user': 1.0, 'dim_tech': 1.0 }

      },
      { $match:

      { user_id: "test_user_id" }

      },
      ],
      explain:true}
      );
      ```
      result:
      ```
      "allPlans" : [
      {
      "cursor" : "BasicCursor",
      "n" : 1989102,
      "nscannedObjects" : 1989102,
      "nscanned" : 1989102,
      "indexBounds" : {

      }
      }
      ]
      ```

      proposed mongo command:
      ```
      db.visits.runCommand("aggregate",{
      pipeline: [
      { $match:

      { user_id: "test_user_id" }

      },
      { $project:

      { date: 1.0, user_id: 1.0, 'dim_user': 1.0, 'dim_tech': 1.0 }

      }
      ],
      explain:true}
      );
      ```
      result:
      ```
      "allPlans" : [
      {
      "cursor" : "BtreeCursor user_id_1_date_1",
      "n" : 394,
      "nscannedObjects" : 394,
      "nscanned" : 394,
      "indexBounds" : {
      "user_id" : [
      [
      "test_user_id",
      "test_user_id"
      ]
      ],
      "date" : [
      [

      { "$minElement" : 1 }

      ,

      { "$maxElement" : 1 }

      ]
      ]
      }
      }
      ]
      ```

      Btw, the "limit 10" doesn't work for me. It doesn't add a {$limit:10} pipeline.

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/issues/164
      Created by: tikazyq
      Labels: performance,
      Assignee: julianhyde
      Created at: Sun Mar 02 17:10:31 CET 2014
      State: closed

        Activity

        Hide
        github-import GitHub Import added a comment -

        [Date: Sun Mar 02 17:26:04 CET 2014, Author: julianhyde]

        Great bug. I'm not a mongo expert so I don't know what an efficient mongo query looks like. Please look log more bugs if you find other things that could be improved.

        Show
        github-import GitHub Import added a comment - [Date: Sun Mar 02 17:26:04 CET 2014, Author: julianhyde ] Great bug. I'm not a mongo expert so I don't know what an efficient mongo query looks like. Please look log more bugs if you find other things that could be improved.
        Hide
        github-import GitHub Import added a comment -

        [Date: Sun Mar 02 17:42:49 CET 2014, Author: tikazyq]

        There are well written documentation about SQL-to-Mongo Mapping as well as aggregation.
        http://docs.mongodb.org/manual/reference/sql-comparison/
        http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/

        It'll be great to see them implemented in optiq.

        Thanks for the brilliant work, Julian.

        Show
        github-import GitHub Import added a comment - [Date: Sun Mar 02 17:42:49 CET 2014, Author: tikazyq ] There are well written documentation about SQL-to-Mongo Mapping as well as aggregation. http://docs.mongodb.org/manual/reference/sql-comparison/ http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/ It'll be great to see them implemented in optiq. Thanks for the brilliant work, Julian.

          People

          • Assignee:
            Unassigned
            Reporter:
            github-import GitHub Import
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development