[DRILL-4852] COUNT(*) query against a large JSON table slower by 2x - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.8.0
Fix Version/s: 1.8.0
Component/s: Execution - Flow
Labels: None
Environment: 4-node cluster, CentOS

Description

A manual test runs COUNT(*) over a single JSON file of roughly 26 million records. The results show a regression: the query is about 2x slower on current 1.8.0 master (1.8.0-SNAPSHOT, git commit ID 57dc9f43).
The query consistently takes about 30 seconds across several runs. Because the input is a single large JSON file, only one fragment does all the work.

0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
+-----------+
|  EXPR$0   |
+-----------+
| 26212355  |
+-----------+
1 row selected (29.001 seconds)

On Drill 1.2.0, the same query took 13.949 seconds.
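
For reference, the "2x" figure can be checked from the two timings reported above. This is only a quick arithmetic sketch; the function and variable names are illustrative and are not part of Drill or of the original test.

```python
# Timings and row count copied from this report.
ROWS = 26_212_355          # result of COUNT(*) on twoKeyJsn.json
SECONDS_1_8_0 = 29.001     # 1.8.0-SNAPSHOT (commit 57dc9f43)
SECONDS_1_2_0 = 13.949     # Drill 1.2.0


def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Return how many times slower the new run is than the old one."""
    return new_seconds / old_seconds


def rows_per_second(rows: int, seconds: float) -> float:
    """Return the scan throughput for a single run."""
    return rows / seconds


ratio = slowdown(SECONDS_1_8_0, SECONDS_1_2_0)
print(f"slowdown: {ratio:.2f}x")
print(f"1.8.0-SNAPSHOT: {rows_per_second(ROWS, SECONDS_1_8_0):,.0f} rows/s")
print(f"1.2.0:          {rows_per_second(ROWS, SECONDS_1_2_0):,.0f} rows/s")
```

The ratio comes out slightly above 2, which matches the "slower by 2x" in the summary. Both numbers are single-run wall-clock times, so the figure is approximate.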

Issue Links

Links to: GitHub Pull Request #576

People

Assignee: Arina Ielchiieva

Reporter: Khurram Faraaz

Votes: 0

Watchers: 4

Dates

Created: 18/Aug/16 13:22

Updated: 25/Aug/16 09:25

Resolved: 25/Aug/16 09:08