IMPALA-4629

Impalad threads stuck spinning when catalog is repeatedly restarted


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels:
      None

      Description

      ScottChris reported a failure mode on a user forum: a query gets stuck with a thread spinning at 100% CPU while their catalog server was running out of heap memory and being restarted automatically. It seems we don't gracefully fail the query in this case.

      Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)
       
      Parcel: CDH 5 | Version: 5.8.2-1.cdh5.8.2.p0.3 | Status: Distributed, Activated
       
      $ uname -a
      Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
       
      We initially thought our insert-select statement, which moves external CSV data into an internal Parquet table, was exceeding impala-shell resources. However, a simple 'compute incremental stats tablename' has now become stuck as well.
       
      This is causing us grief in our production environment. We have to constantly check port 25000 and manually restart whichever Impala daemon is spinning the CPU. Luckily our insert scripts are fault tolerant and simply retry on failure (but once all CPUs are consumed by spinning threads, we are dead in the water).
       
      We are not sure, but this seems to have started after we upgraded from 5.7.1 to 5.8.2.
       
      In the logs, this error always appears immediately after the 'stuck' query:
       
      I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename
      I1204 03:30:03.959247 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
      I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.
      I1204 03:32:03.970940 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
      I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
      at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
      at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
      at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
      I1204 03:32:37.983515 7142 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
      @ 0x80f2c9 (unknown)
      @ 0xb37c30 (unknown)
      @ 0xa4e5cf (unknown)
      @ 0xa68ea9 (unknown)
      @ 0xb00a02 (unknown)
      @ 0xb068f3 (unknown)
      @ 0xd2bed8 (unknown)
      @ 0xd2b114 (unknown)
      @ 0x7dc26c (unknown)
      @ 0x1b208bf (unknown)
      @ 0x9b0a39 (unknown)
      @ 0x9b1492 (unknown)
      @ 0xb89327 (unknown)
      @ 0xb89c64 (unknown)
      @ 0xdee99a (unknown)
      @ 0x3f37a07aa1 (unknown)
      @ 0x3f376e893d (unknown)
      E1204 03:32:37.983541 7142 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
      

      http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/impala-shell-operations-getting-stuck-spinning-cpus-100-queries/m-p/48386#M2306?eid=31&aid=1
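      The log shows the frontend requesting a prioritized load, timing out after 120000ms, and requesting again, apparently without any bound on the number of retries; the "Detected catalog service ID change" error is logged, but the waiting query does not appear to be failed. The Java sketch below illustrates one way such a wait could fail gracefully: it caps the number of load attempts and aborts as soon as the catalog service ID changes. It is not Impala's code. `BoundedTableLoadWait`, `CatalogView`, `awaitTables`, `QueryFailedException`, and the `maxAttempts` parameter are all hypothetical names chosen for illustration.

```java
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not Impala's implementation: wait for missing tables
// with a bounded number of load attempts, and fail the query if the catalog
// service ID changes (i.e. catalogd restarted) instead of retrying forever.
final class BoundedTableLoadWait {

  // Hypothetical view of catalog state as seen by one impalad.
  interface CatalogView {
    UUID catalogServiceId();                          // ID of the current catalogd instance
    void requestPrioritizedLoad(Set<String> tables);  // ask catalogd to load these tables first
    // Returns true if all tables arrived within timeoutMs, false on timeout.
    boolean waitForTables(Set<String> tables, long timeoutMs) throws InterruptedException;
  }

  // Hypothetical error surfaced to the client instead of a stuck query.
  static final class QueryFailedException extends Exception {
    QueryFailedException(String msg) { super(msg); }
  }

  static void awaitTables(CatalogView catalog, Set<String> tables,
      long timeoutMs, int maxAttempts)
      throws QueryFailedException, InterruptedException {
    UUID startingId = catalog.catalogServiceId();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      catalog.requestPrioritizedLoad(tables);
      if (catalog.waitForTables(tables, timeoutMs)) {
        return; // all tables loaded; analysis can continue
      }
      // A new service ID means the catalogd we asked has gone away. This
      // sketch treats that as fatal and fails fast rather than re-requesting.
      if (!Objects.equals(startingId, catalog.catalogServiceId())) {
        throw new QueryFailedException(
            "Catalog service restarted while loading " + tables);
      }
    }
    // Out of attempts: report an error instead of spinning indefinitely.
    throw new QueryFailedException(
        "Tables " + tables + " not loaded after " + maxAttempts + " attempts");
  }
}
```

      For example, awaitTables(catalog, Set.of("default.tablename"), 120000, 5) would reuse the 120000ms interval from the log and give up after five attempts (the attempt count is an arbitrary assumption), so the client receives an error it can retry instead of a query that never completes.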


            People

            • Assignee:
              Unassigned
              Reporter:
              tarmstrong Tim Armstrong
            • Votes:
              0
              Watchers:
              5
