Apache Drill / DRILL-4852

COUNT(*) query against a large JSON table slower by 2x

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.8.0
    • Component/s: Execution - Flow
    • Labels:
      None
    • Environment:

      4-node CentOS cluster

      Description

      We have a manual test that runs a COUNT over 26 million JSON keys. The results indicate a regression: the query is about 2x slower on current 1.8.0 master (1.8.0-SNAPSHOT, git commit ID: 57dc9f43).
      The query consistently takes over 30 seconds to execute across several runs. Because the input is a single large JSON file, just one fragment does all the work.

      0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 26212355  |
      +-----------+
      1 row selected (29.001 seconds)
      

      On Drill 1.2.0 the above query took 13.949 seconds.
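
To put the two timings side by side, here is a small sketch that derives the slowdown factor and the single-fragment throughput from the figures reported above. The function and constant names are hypothetical and chosen only for illustration; the inputs are the row count and the two elapsed times quoted in this report, and nothing here is measured independently.

```python
# Figures taken from this report; names are illustrative, not part of Drill.
ROWS = 26_212_355             # result of the COUNT(*) query
DRILL_1_8_0_SECONDS = 29.001  # elapsed time on 1.8.0-SNAPSHOT (57dc9f43)
DRILL_1_2_0_SECONDS = 13.949  # elapsed time on Drill 1.2.0


def slowdown(current_s: float, baseline_s: float) -> float:
    """Return how many times slower the current run is than the baseline."""
    return current_s / baseline_s


def rows_per_second(rows: int, seconds: float) -> float:
    """Return the average number of rows counted per second."""
    return rows / seconds


if __name__ == "__main__":
    factor = slowdown(DRILL_1_8_0_SECONDS, DRILL_1_2_0_SECONDS)
    print(f"slowdown factor: {factor:.2f}x")
    print(f"1.8.0 throughput: {rows_per_second(ROWS, DRILL_1_8_0_SECONDS):,.0f} rows/s")
    print(f"1.2.0 throughput: {rows_per_second(ROWS, DRILL_1_2_0_SECONDS):,.0f} rows/s")
```

Since one fragment does all the work, these per-second figures approximate the throughput of a single JSON reader rather than of the whole cluster.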

              People

              • Assignee: Arina Ielchiieva (arina)
              • Reporter: Khurram Faraaz (khfaraaz)
              • Votes: 0
              • Watchers: 4
