Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7719

Query Execution plan don't use MongoDB Index.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.17.0
    • None
    • None

    Description

      The query plan was showing a collection scan, going over all the rows in the mongo collection. Hence created an index on the predicate cols, expecting Drill to choose and index based access plan. But continues to use the full table scan path.

      The actual query, generated query plan and the mongo index are given below.

      SQL:
      Select j.user as User, TO_DATE(j.created_at) as submitted_on
      from mongo.example.jobs j
      where j.user = 'john@example.ai' and j.created_at BETWEEN timestamp '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'

      Physical Plan
      00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6720.59875 rows, 23532.19875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10468
      00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10467
      00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6587.2375 rows, 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
      00-03 Filter(condition=[AND(=($0, 'john@example.ai'), >=($1, 2020-03-25 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6466.0 rows, 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
      00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 3233.0, cumulative cost = {3233.0 rows, 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464

      Index created in MongoDB
      {
      "v" : 2,
      "key" :

      { "user" : 1, "created_at" : 1, "method_map_id" : 1 }

      ,
      "name" : "user_1_created_at_1_method_map_id_1",
      "ns" : "example.jobs"
      }

       

      In a drill documentation, I see that drill supports indexes only for MapR DB. Does that mean Indexes of other data sources like mongo won't be used ?

      https://drill.apache.org/docs/querying-indexes-introduction/

      Kindly take a look at this issue.

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            orcascope Arshadh
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: