  1. Apache Drill
  2. DRILL-4852

COUNT(*) query against a large JSON table slower by 2x

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.8.0
    • Component/s: Execution - Flow
    • Labels: None
    • Environment: 4 node cluster CentOS

    Description

      We have a manual test that runs a COUNT(*) over 26 million JSON keys. The results show a regression: the query is about 2x slower on current 1.8.0 master (1.8.0-SNAPSHOT, git commit ID: 57dc9f43).
      The query consistently takes around 30 seconds to execute over several runs. Because the input is a single large JSON file, only one fragment does all the work.

      0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 26212355  |
      +-----------+
      1 row selected (29.001 seconds)
      

      On Drill 1.2.0, the same query took 13.949 seconds.
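
As a quick check of the "slower by 2x" claim, the two timings reported above can be compared directly. This is only a minimal sketch: the helper name `slowdown` and the variable names are hypothetical, and the inputs are the single wall-clock figures quoted in this report, not an average over runs.

```python
# Timings and row count copied from the report above.
ROWS = 26_212_355          # result of COUNT(*)
SECONDS_1_8_0 = 29.001     # Drill 1.8.0-SNAPSHOT (commit 57dc9f43)
SECONDS_1_2_0 = 13.949     # Drill 1.2.0


def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Return how many times slower the new run is than the old one."""
    return new_seconds / old_seconds


ratio = slowdown(SECONDS_1_8_0, SECONDS_1_2_0)
print(f"slowdown: {ratio:.2f}x")  # prints "slowdown: 2.08x"

# Rough single-fragment throughput for each version, in rows per second.
print(f"1.8.0 throughput: {ROWS / SECONDS_1_8_0:,.0f} rows/s")
print(f"1.2.0 throughput: {ROWS / SECONDS_1_2_0:,.0f} rows/s")
```

The ratio comes out slightly above 2, which matches the title of this issue.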

            People

              Assignee: Arina Ielchiieva (arina)
              Reporter: Khurram Faraaz (khfaraaz)