Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.1.0, 2.0.0, 2.2.0, 2.3.0
-
None
-
Reviewed
-
Description
By HBASE-14004, we introduce WALFileLengthProvider interface to keep the current writing wal file length by ourselves, WALEntryStream used by ReplicationSourceWALReader could only read WAL file byte size <= WALFileLengthProvider.getLogFileSizeIfBeingWritten if the WAL file is current been writing on the same RegionServer .
AsyncFSWAL implements WALFileLengthProvider by AbstractFSWAL.getLogFileSizeIfBeingWritten, just as folllows :
public OptionalLong getLogFileSizeIfBeingWritten(Path path) { rollWriterLock.lock(); try { Path currentPath = getOldPath(); if (path.equals(currentPath)) { W writer = this.writer; return writer != null ? OptionalLong.of(writer.getLength()) : OptionalLong.empty(); } else { return OptionalLong.empty(); } } finally { rollWriterLock.unlock(); } }
For AsyncFSWAL, above AsyncFSWAL.writer is AsyncProtobufLogWriter ,and AsyncProtobufLogWriter.getLength is as follows:
public long getLength() { return length.get(); }
But for AsyncProtobufLogWriter, any append method may increase the above AsyncProtobufLogWriter.length, especially for following AsyncFSWAL.append
method just appending the WALEntry to FanOutOneBlockAsyncDFSOutput.buf:
public void append(Entry entry) { int buffered = output.buffered(); try { entry.getKey(). getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build() .writeDelimitedTo(asyncOutputWrapper); } catch (IOException e) { throw new AssertionError("should not happen", e); } try { for (Cell cell : entry.getEdit().getCells()) { cellEncoder.write(cell); } } catch (IOException e) { throw new AssertionError("should not happen", e); } length.addAndGet(output.buffered() - buffered); }
That is to say, AsyncFSWAL.getLogFileSizeIfBeingWritten could not reflect the file length which successfully synced to underlying HDFS, which is not as expected.
Attachments
Issue Links
- is related to
-
HBASE-24691 Fix flaky TestWALEntryStream
- Resolved
- relates to
-
HBASE-14004 [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
- Closed
- links to