[IMPALA-11234] impalad keeps reporting ShortCircuitCache slot release failures in heavy workload - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 4.1.0
Fix Version/s: Impala 4.2.0
Component/s: Backend
Labels: None

Description

I keep seeing this error during a local perf test on my desktop machine:

E0410 07:04:10.691095   430 ShortCircuitCache.java:232] ShortCircuitCache(0x6e76c6a7): failed to release short-circuit shared memory slot Slot(slotIdx=0, shm=DfsClientShm(1effcf56a590fbc371938a368987f4e9)) by sending ReleaseShortCircuitAccessRequestProto to /var/lib/hadoop-hdfs/socket.31001.  Closing shared memory segment.
Java exception follows:
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId 1effcf56a590fbc371938a368987f4e9
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:214)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I can also find it in our Jenkins jobs, but only in the data-loading phase, so I suspect it only happens under heavy workloads.
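One way to check the heavy-workload suspicion is to count the error lines per minute and compare the busy minutes against the data-loading window. The following is a minimal sketch, not part of the original investigation: the function name is hypothetical, the log path is supplied by the caller, and the line format is assumed from the single sample line quoted above.

```python
import re
import sys
from collections import Counter

# Assumed glog-style prefix, taken from the sample line in this report:
# level 'E', MMDD, HH:MM:SS.micros, thread id, then "ShortCircuitCache.java:<line>]".
LINE_RE = re.compile(
    r"^E(?P<mmdd>\d{4}) (?P<hhmm>\d{2}:\d{2}):\d{2}\.\d+\s+\d+ "
    r"ShortCircuitCache\.java:\d+\]"
    r".*failed to release short-circuit shared memory slot"
)


def count_release_failures_per_minute(lines):
    """Map (MMDD, HH:MM) to the number of slot release failures logged in that minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("mmdd"), match.group("hhmm"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_failures.py <path to an impalad ERROR log>
    with open(sys.argv[1], errors="replace") as log_file:
        per_minute = count_release_failures_per_minute(log_file)
    for (mmdd, hhmm), n in sorted(per_minute.items()):
        print(mmdd, hhmm, n)
```

If the failures cluster in the minutes when data loading runs and are absent otherwise, that supports the workload explanation over a DataNode restart.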

HDFS-14701 mentions that this happens when the DataNode is stopped or restarted. However, I did not restart my HDFS cluster and I still see this error in the logs.

It is worth investigating whether we are doing something wrong in the short-circuit read path.

Attachments

- short-circuit-analysis.txt.gz (3 kB, 11/Apr/22 02:26, Quanlong Huang)
- hdfs-impalad-logs.tar.gz (16.85 MB, 11/Apr/22 02:31, Quanlong Huang)
- hadoop-hdfs-client-3.1.1.7.2.15.0-88-HDFS-16535.jar (4.98 MB, 06/May/22 01:12, Quanlong Huang)

Issue Links

- is caused by: HDFS-13639 SlotReleaser is not fast enough (Resolved)
- requires: HDFS-16535 SlotReleaser should reuse the domain socket based on socket paths (Resolved)

People

Assignee: Quanlong Huang
Reporter: Quanlong Huang
Votes: 0
Watchers: 4

Dates

Created: 11/Apr/22 00:03
Updated: 09/May/22 00:12
Resolved: 06/May/22 01:13