Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
None
Hadoop Flags: Reviewed
Description
I was looking at an HBase user's cluster with danilocop. The user had two otherwise identical clusters, one of which regularly had sockets stuck in CLOSE_WAIT going from its RegionServers to a distributed storage appliance.
After a lot of analysis, we eventually figured out that these sockets in CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to close inside of the RegionServer. The subtlety was that only one of these HBase clusters was set up to do replication (to the other cluster). The HBase cluster experiencing this problem was shipping edits to a peer and had previously been using Phoenix. At some point, Phoenix was removed from that cluster.
What we found was that replication still had WALs to ship which contained edits for Phoenix tables. Phoenix, in this version, still used a custom WALCellCodec (IndexedWALEditCodec); however, this codec class was missing from the RegionServer classpath after the owner of the cluster removed Phoenix.
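To make the classpath failure concrete, here is a minimal Java sketch of loading a codec by class name when that class is not on the classpath. The helper `instantiateWithCustomCtor` below is hypothetical: it is only modeled on the `Unable to find <class>` message in the stack trace, not copied from HBase's `ReflectionUtils`. It assumes, as on the affected RegionServers, that `IndexedWALEditCodec` cannot be found.

```java
import java.lang.reflect.Constructor;

public class CodecLoader {
  // Hypothetical helper modeled on the "Unable to find <class>" message in the
  // stack trace; this is an illustration, not HBase's actual ReflectionUtils.
  public static <T> T instantiateWithCustomCtor(
      String className, Class<?>[] ctorArgTypes, Object[] ctorArgs) {
    try {
      @SuppressWarnings("unchecked")
      Class<? extends T> type = (Class<? extends T>) Class.forName(className);
      Constructor<? extends T> ctor = type.getDeclaredConstructor(ctorArgTypes);
      return ctor.newInstance(ctorArgs);
    } catch (ClassNotFoundException e) {
      // A missing class is rethrown as an unchecked exception, not an IOException.
      throw new UnsupportedOperationException("Unable to find " + className, e);
    } catch (ReflectiveOperationException e) {
      // Hypothetical message for the remaining reflective failures.
      throw new UnsupportedOperationException("Unable to instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    try {
      instantiateWithCustomCtor(
          "org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec",
          new Class<?>[0], new Object[0]);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
      System.out.println(e.getCause().getClass().getName());
      System.out.println(e instanceof RuntimeException);
    }
  }
}
```

Running `main` should print the "Unable to find ..." message, `java.lang.ClassNotFoundException` as the cause, and `true`: the failure surfaces as an unchecked exception, so a caller that only catches `IOException` will not see it.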
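When we try to instantiate the Codec implementation via ReflectionUtils, we end up throwing an UnsupportedOperationException which wraps a ClassNotFoundException. However, in WALFactory, we only close the FSDataInputStream when we catch an IOException. Because UnsupportedOperationException is an unchecked RuntimeException rather than an IOException, it escapes that catch block and the stream is never closed.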
Thus, replication sits in a "fast" loop trying to ship these edits, leaking a new socket on each attempt because the InputStream is never closed. There is an obvious workaround for this specific issue, but HBase itself should not leak the stream.
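The per-retry leak can be sketched in a few lines of Java. Everything here is hypothetical: `TrackedStream` stands in for an `FSDataInputStream`, and `initReader` stands in for reader initialization that fails the way a missing codec class does. The leaky variant closes the stream only on `IOException`, mirroring the behavior described for `WALFactory`; the fixed variant, one possible approach and not necessarily the committed patch, closes it on any failure before rethrowing.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderLeakSketch {
  // Hypothetical counter of streams that were opened but never closed.
  static final AtomicInteger OPEN = new AtomicInteger();

  // Hypothetical stand-in for an FSDataInputStream (one socket per instance).
  static final class TrackedStream implements Closeable {
    private boolean closed;
    TrackedStream() { OPEN.incrementAndGet(); }
    @Override public void close() {
      if (!closed) { closed = true; OPEN.decrementAndGet(); }
    }
  }

  // Hypothetical reader init that fails like a missing codec class:
  // with an unchecked exception instead of an IOException.
  static void initReader(TrackedStream in) throws IOException {
    throw new UnsupportedOperationException("Unable to find codec",
        new ClassNotFoundException("codec"));
  }

  // Leaky pattern: the stream is closed only when an IOException is caught.
  static TrackedStream createReaderLeaky() throws IOException {
    TrackedStream in = new TrackedStream();
    try {
      initReader(in);
      return in;
    } catch (IOException e) {
      in.close();
      throw e;
    }
  }

  // Fixed pattern: close on any failure, then rethrow the original exception.
  static TrackedStream createReaderFixed() throws IOException {
    TrackedStream in = new TrackedStream();
    try {
      initReader(in);
      return in;
    } catch (IOException | RuntimeException | Error e) {
      in.close();
      throw e;
    }
  }

  // Simulates replication retrying the same WAL; returns streams left open.
  public static int retry(boolean fixed, int attempts) {
    OPEN.set(0);
    for (int i = 0; i < attempts; i++) {
      try {
        TrackedStream s = fixed ? createReaderFixed() : createReaderLeaky();
        s.close();
      } catch (IOException | RuntimeException e) {
        // The caller logs and retries; it never gets a handle to the stream.
      }
    }
    return OPEN.get();
  }

  public static void main(String[] args) {
    System.out.println("leaky: " + retry(false, 5) + " streams left open");
    System.out.println("fixed: " + retry(true, 5) + " streams left open");
  }
}
```

With five retries, `main` should report five streams left open for the leaky variant and none for the fixed one, matching one leaked socket per attempt. Catching `RuntimeException` and `Error` alongside `IOException` releases the stream without changing the method's declared `throws IOException`.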
An approximate 2.1.x stack trace that led us to this is below:
2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
java.io.IOException: Cannot get log reader
    at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
    at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
    at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
    at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
    at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
    at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
    at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
    at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:43)
    ... 16 more
Issue Links
- links to: Port failure to close InputStream to 1.x | Resolved | Josh Elser