Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Not A Bug
- Component: Kudu_Impala
Description
I am running comparison tests between Impala on HDFS and Impala on Kudu.
We have a set of queries that access a number of fact tables and dimension tables.
One of the queries processes two fact tables with roughly 78 million and 668 million records.
With the data in Impala on HDFS, I was able to get query results in under 50 seconds.
With the data in Impala on Kudu, even after trying a number of distribution/partition schemes, I have not been able to get the execution time below 125 seconds.
So I have some concerns here:
1. In Kudu, what is the guideline for the number of cores/nodes in a cluster relative to the number of records to process?
2. In Kudu, is there anything like a distributed cache for Impala on Kudu to improve my execution time?
3. Is there any other way to improve performance with such a large data volume?
I have attached the query for reference.
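For reference, the distribution/partition experiments mentioned above would typically be expressed through Impala's Kudu DDL. The sketch below is a hypothetical example only (table name, columns, and partition count are made up, not taken from the attached query): it hash-partitions a large fact table on its primary key so scans and joins spread across tablet servers.

```sql
-- Hypothetical sketch: hash-partition a Kudu fact table from Impala.
-- All identifiers here are illustrative; tune PARTITIONS to a small
-- multiple of the number of tablet servers in the cluster.
CREATE TABLE sales_fact (
  sale_id BIGINT,
  sale_date STRING,
  customer_id BIGINT,
  amount DOUBLE,
  PRIMARY KEY (sale_id)
)
PARTITION BY HASH (sale_id) PARTITIONS 16
STORED AS KUDU;
```

Whether hash, range, or a combination of both works best depends on the join keys and predicates in the actual query.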