IMPALA / IMPALA-6500

Impala crashes under certain hypervisors that return out-of-range CPU IDs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: Impala 3.0, Impala 2.12.0
    • Component/s: Backend
    • Labels:
    • Environment:
      RHEL 6.5 (Santiago), Kernel version: 2.6.32-431.el6.x86_64, CDH 5.13.1 single node, Impala 2.10, VMware ESXi, v5.0.0

      Description

      I have a Parquet table created by Hive, and I am running several different queries against it, such as:

      SELECT product_category,
      SUM(cast(profit AS DECIMAL(15,2))) as total_profit,
      SUM(cast(sales AS DECIMAL(15,2))) as total_sales
      FROM copy_orders
      GROUP BY product_category;

      and:

      SELECT customer_name,
      SUM(cast(profit AS DECIMAL(15,2))) as total_profit,
      SUM(cast(sales AS DECIMAL(15,2))) as total_sales
      FROM copy_orders
      GROUP BY customer_name
      ORDER BY total_profit DESC
      LIMIT 10;

      These two queries only rarely run successfully. Most of the time, running them in HUE's Impala query editor returns:

      Could not connect to hostname:21050 (code THRIFTTRANSPORT): TTransportException('Could not connect to hostname:21050',)

      At the same time, Cloudera Manager reports that the Impala daemon has crashed; it starts working again approximately 1 minute later. Meanwhile, other simple queries run successfully.

      I have attached a log file for a sample run of one of the queries, since all of them produce the relevant logs. I have tried using SET disable_codegen=1, but the problem persisted.

      I have added both the impalad.ERROR and impalad.INFO files after running the second query twice. The first time I used "SET disable_codegen=1", which sometimes works, and the query ran successfully (logged in impalad.INFO). The second time the query failed; that run is logged in impalad.INFO2, while impalad.ERROR does not seem to have changed at all. It seems that all old logs are removed from the main log files (impalad.INFO and impalad.ERROR), since running the query keeps restarting the Impala daemon.
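
      The issue title attributes the crash to CPU IDs that fall outside the expected range. As a rough illustration only, and not necessarily how the Impala fix works, code that indexes per-core data by a reported CPU ID could guard against such values as sketched below. The function name `safe_core_index`, its parameters, and the choices to wrap with a modulo and to map negative values to core 0 are all assumptions made for this example.

```python
def safe_core_index(raw_cpu_id: int, max_num_cores: int) -> int:
    """Map a CPU ID reported by the OS to a valid index in [0, max_num_cores).

    Hypothetical helper: the wrap-around policy is an assumption, not
    taken from the Impala source.
    """
    if max_num_cores <= 0:
        raise ValueError("max_num_cores must be positive")
    if raw_cpu_id < 0:
        # A negative value usually signals an error; fall back to core 0.
        return 0
    # Wrap IDs at or beyond the expected core count back into range.
    return raw_cpu_id % max_num_cores


# Per-core counters sized for 4 cores; IDs 7 and -1 are out of range.
per_core_counters = [0] * 4
for reported_id in (0, 3, 7, -1):
    per_core_counters[safe_core_index(reported_id, 4)] += 1
```

      Without a guard of this kind, an out-of-range ID used directly as an array index would read or write past the end of the per-core structure.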

        Attachments

        1. 000000_0
          703 kB
          Osama Suleiman
        2. 7b977512-5aa4-4f40-9ae47184-35853042.dmp
          2.57 MB
          Osama Suleiman
        3. 7b977512-5aa4-4f40-9ae47184-35853042.txt
          885 kB
          Lars Volker
        4. hs_err_pid9910.log
          101 kB
          Osama Suleiman
        5. impalad.ERROR
          0.2 kB
          Osama Suleiman
        6. impalad.INFO
          63 kB
          Osama Suleiman
        7. impalad.INFO2
          38 kB
          Osama Suleiman
        8. query_result.txt
          0.9 kB
          Osama Suleiman
        9. shared_libs.txt
          1 kB
          Lars Volker

            People

            • Assignee: Tim Armstrong (tarmstrong)
            • Reporter: Osama Suleiman (osuleiman)