Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11234

impalad keeps reporting ShortCircuitCache slot release failures in heavy workload

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 4.1.0
    • Impala 4.2.0
    • Backend
    • None
    • ghx-label-14

    Description

      I keep seeing this error during a local perf test on my desktop machine:

      E0410 07:04:10.691095   430 ShortCircuitCache.java:232] ShortCircuitCache(0x6e76c6a7): failed to release short-circuit shared memory slot Slot(slotIdx=0, shm=DfsClientShm(1effcf56a590fbc371938a368987f4e9)) by sending ReleaseShortCircuitAccessRequestProto to /var/lib/hadoop-hdfs/socket.31001.  Closing shared memory segment.
      Java exception follows:
      java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId 1effcf56a590fbc371938a368987f4e9
              at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:214)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
       

      I can also find it in our Jenkins jobs, but it only happens in the data-loading phase. So I suspend it only happens in heavy workloads.

      HDFS-14701 mentioned that this happens when the DataNode is stopped/restarted. But I didn't restart my HDFS cluster and I'm still able to see this error log.

      It worth investigating if we are doing something wrong in short-circuit related stuffs.

      Attachments

        1. short-circuit-analysis.txt.gz
          3 kB
          Quanlong Huang
        2. hdfs-impalad-logs.tar.gz
          16.85 MB
          Quanlong Huang
        3. hadoop-hdfs-client-3.1.1.7.2.15.0-88-HDFS-16535.jar
          4.98 MB
          Quanlong Huang

        Issue Links

          Activity

            People

              stigahuang Quanlong Huang
              stigahuang Quanlong Huang
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: