  1. Apache Drill
  2. DRILL-5177

Query Planning takes infinite time in case drill is connected to Mongo Sharded environment

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      1) The cluster had 4 drillbits (with 3 ZooKeeper nodes) and 4 mongod instances. The drillbits and the mongod instances were not co-located on the same physical servers.
      2) The shard key was evenly distributed across the 4 shards (mongod instances).

      Description

      When Drill is connected to a sharded MongoDB environment (through mongos), query execution time is far higher than when it is connected to a single mongod, even though the data volumes in the two setups are comparable. The root cause appears to be query planning time.

      On mongos
      Collection size: 200 GB, record count: 230,083,160
      A simple SELECT query with a filter on an indexed column ran for more than 50 minutes. The query state remained "STARTING" for the first 40 minutes. Further analysis showed that query planning was taking a very long time.

      The issue was localised to the following call:

      Class name: DefaultSqlHandler.java
      Method name: protected RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input, RelTraitSet targetTraits, boolean log)
      Line no. 384: output = program.run(planner, input, toTraits);

      The output of the above line comes from the VolcanoPlanner class (package org.apache.calcite.plan.volcano), which takes a very long time to plan the query. This happens only in the mongos environment.
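
      One way to confirm that a single call dominates the elapsed time is to wrap it in a wall-clock timer. The sketch below is only an illustration: PlanTimer, Timed and timed are hypothetical names that do not exist in Drill or Calcite, and the lambda in main is a stand-in for the real program.run(planner, input, toTraits) call.

```java
import java.util.function.Supplier;

/** Hypothetical helper for timing one step; not part of Drill or Calcite. */
final class PlanTimer {

    /** Holds the value a step produced and the wall-clock time it took. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the step once and measures its duration with the monotonic clock. */
    static <T> Timed<T> timed(Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for: output = program.run(planner, input, toTraits);
        Timed<String> result = timed(() -> "planned");
        System.out.println(result.value + " after " + result.elapsedMillis + " ms");
    }
}
```

      Logging the elapsed time for each planner phase in this way would show whether the delay sits in one phase or is spread across several.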

      When the same SELECT query was executed against the mongod environment
      Collection size: 306 GB, record count: 49,924,351
      the query completed within 2 minutes, and the above line returned its output within seconds.

      Although the data volume on mongod (about 300 GB) was higher than on mongos (200 GB), query planning was much faster on mongod. There seems to be an issue with query planning for the mongos environment.

            People

            • Assignee:
              Unassigned
              Reporter:
              mridulchopra Mridul Chopra
            • Votes:
              0
              Watchers:
              3
