  1. IMPALA
  2. IMPALA-3869

Performance is lower on Kudu than on HDFS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: Kudu_Impala
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      I am running test scenarios comparing Impala on HDFS with Impala on Kudu.

      We have a set of queries that access a number of fact tables and dimension tables.

      One of the queries processes two fact tables holding around 78 million and 668 million records.

      With the data in Impala on HDFS, I was able to get query results in less than 50 seconds.

      But with the data in Impala on Kudu, even after trying a number of distribution/partitioning schemes, I have not been able to reduce the query execution time below 125 seconds.
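
To put the two timings side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above. The row counts are approximate, the timings are bounds rather than exact measurements, and the throughput figures assume the query scans every row of both fact tables, which may not hold if predicates are pushed down. The function names are illustrative only.

```python
# Figures quoted in the description; the timings are bounds, not exact measurements.
FACT_ROWS = (78_000_000, 668_000_000)  # approximate row counts of the two fact tables
HDFS_SECONDS = 50    # "less than 50 seconds" on HDFS, so an upper bound
KUDU_SECONDS = 125   # never below 125 seconds on Kudu, so a lower bound


def slowdown(kudu_seconds: float, hdfs_seconds: float) -> float:
    """Return how many times longer the Kudu run takes than the HDFS run."""
    return kudu_seconds / hdfs_seconds


def rows_per_second(total_rows: int, seconds: float) -> float:
    """Return scan throughput, assuming every row is read exactly once."""
    return total_rows / seconds


total_rows = sum(FACT_ROWS)
print(f"total fact rows: {total_rows:,}")
print(f"slowdown on Kudu: at least {slowdown(KUDU_SECONDS, HDFS_SECONDS):.1f}x")
print(f"HDFS throughput: at least {rows_per_second(total_rows, HDFS_SECONDS):,.0f} rows/s")
print(f"Kudu throughput: at most {rows_per_second(total_rows, KUDU_SECONDS):,.0f} rows/s")
```

So the gap to explain is a factor of at least 2.5 in wall-clock time on roughly 746 million fact rows.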

      So I have some concerns here:
      1. In Kudu, what are the criteria for choosing the number of cores/nodes in the cluster for a given number of records to process?
      2. Is there an option such as a distributed cache for Impala on Kudu to improve my execution time?
      3. Is there any other way to improve performance with such a large data volume?

      I have attached the query for reference.

      Attachments

        1. query3.txt
          26 kB
          Ravi sharma

          People

            Assignee: Unassigned
            Reporter: Ravi sharma (ravikcse08)
            Votes: 0
            Watchers: 3
