[IMPALA-4629] Impalad threads stuck spinning when catalog is repeatedly restarted - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: Impala 2.6.0
Fix Version/s: None
Component/s: Catalog
Labels: None
Target Version: Product Backlog

Description

ScottChris reported a failure mode on a user forum: a query got stuck with a spinning thread while their catalog server was hitting a heap memory problem and being auto-restarted. It seems we don't fail the query gracefully in this case.

Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)
 
Parcel Name Version Status Actions
CDH 5 5.8.2-1.cdh5.8.2.p0.3 Distributed, Activated
 
$ uname -a
Linux hostname_redacted 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
 
We initially thought our insert-select statement, which moves external CSV data into an internal Parquet table, was exceeding impala-shell resources. However, a simple 'compute incremental stats tablename' has now become stuck as well.
 
This is causing us grief in our production environment: we have to constantly check port 25000 and manually restart whichever Impala daemon is spinning the CPU. Luckily our insert scripts are fault tolerant and simply retry on failure, but once all CPUs are consumed spinning we are dead in the water.
 
We are not sure, but this seems to have started after we upgraded from 5.7.1 to 5.8.2.
 
In the logs, the 'stuck' query is always immediately followed by this error:
 
I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename
I1204 03:30:03.959247 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.
I1204 03:32:03.970940 7150 Frontend.java:819] Requesting prioritized load of table(s): default.tablename
I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()
at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)
at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:227)
at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:180)
I1204 03:32:37.983515 7142 status.cc:111] CatalogException: Detected catalog service ID change. Aborting updateCatalog()
@ 0x80f2c9 (unknown)
@ 0xb37c30 (unknown)
@ 0xa4e5cf (unknown)
@ 0xa68ea9 (unknown)
@ 0xb00a02 (unknown)
@ 0xb068f3 (unknown)
@ 0xd2bed8 (unknown)
@ 0xd2b114 (unknown)
@ 0x7dc26c (unknown)
@ 0x1b208bf (unknown)
@ 0x9b0a39 (unknown)
@ 0x9b1492 (unknown)
@ 0xb89327 (unknown)
@ 0xb89c64 (unknown)
@ 0xdee99a (unknown)
@ 0x3f37a07aa1 (unknown)
@ 0x3f376e893d (unknown)
E1204 03:32:37.983541 7142 impala-server.cc:1339] There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
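
Since the reporter currently watches for this condition by hand, a log scan may be a useful stopgap. The following is a minimal sketch, assuming the impalad log uses the glog line prefix seen in the excerpt above (severity letter, MMDD, time, thread id, source location). The names `find_symptoms`, `Hit`, and `SYMPTOMS` are hypothetical, and the two message fragments are copied from the excerpt; this is not part of Impala.

```python
import re
from typing import Iterable, List, NamedTuple

# glog prefix as seen in the excerpt: "I1204 03:30:03.958894 7150 Frontend.java:875] msg"
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Message fragments taken verbatim from the log excerpt in this report.
SYMPTOMS = (
    "Missing tables were not received in",
    "Detected catalog service ID change",
)


class Hit(NamedTuple):
    severity: str
    time: str
    thread_id: int
    source: str
    symptom: str


def find_symptoms(lines: Iterable[str]) -> List[Hit]:
    """Return one Hit per glog line whose message contains a known symptom."""
    hits = []
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if m is None:
            continue  # stack frames and continuation lines carry no glog prefix
        for symptom in SYMPTOMS:
            if symptom in m.group("msg"):
                hits.append(Hit(m.group("sev"), m.group("time"),
                                int(m.group("tid")), m.group("src"), symptom))
                break
    return hits


# Lines copied from the excerpt in this report.
SAMPLE = [
    "I1204 03:30:03.958894 7150 Frontend.java:875] analyze query compute incremental stats tablename",
    "I1204 03:32:03.970648 7150 Frontend.java:894] Missing tables were not received in 120000ms. Load request will be retried.",
    "I1204 03:32:37.981461 7142 jni-util.cc:166] com.cloudera.impala.catalog.CatalogException: Detected catalog service ID change. Aborting updateCatalog()",
    "at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:120)",
]

if __name__ == "__main__":
    for hit in find_symptoms(SAMPLE):
        print(hit.time, hit.thread_id, hit.source, hit.symptom)
```

`find_symptoms` accepts any iterable of lines, so an open log file can be passed directly. A hit only indicates that the catalog was slow or restarted around that time; it does not prove a thread is spinning.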

http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/impala-shell-operations-getting-stuck-spinning-cpus-100-queries/m-p/48386#M2306?eid=31&aid=1

People

Assignee: Unassigned

Reporter: Tim Armstrong

Votes: 0

Watchers: 6

Dates

Created: 08/Dec/16 16:33

Updated: 13/Jun/17 04:24

Resolved: 13/Jun/17 04:24