Affects Version/s: 2.2.0
Fix Version/s: None
The change to use JNI for HDFS writes improved the interface by returning more useful information to the caller. In TRAFODION-2946, we described the need for LOB locking because multiple threads writing to the same LOB column could interleave and cause problems. With the new JNI interface, an HDFS write now returns the offset where the data was written, so we can store this returned offset in the descriptor tables. Previously, with the libhdfs API, we did not get the "written offset" back.
The order of operations before this change was:
- Get the EOD for the LOB data file in HDFS
- Store this offset into the LOB descriptor tables so we know where to retrieve the data from during a read.
- Call hdfsWrite to write to the LOB data file, and hope that the offset where hdfsWrite writes is the same as the EOD obtained in step 1.
Since HDFS is an "append only" file system, this is usually how it works. But if another process does an insert into the LOB column between steps 2 and 3, the offset stored in the descriptor tables is incorrect. Hence we added a LOB lock to make steps 1, 2 and 3 atomic as part of TRAFODION-2946.
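The interleaving described above can be reproduced with a small deterministic simulation. This is only an illustrative sketch: `AppendOnlyFile`, `descriptors`, and the writer names are hypothetical stand-ins, not the libhdfs API or the Trafodion descriptor tables.

```python
class AppendOnlyFile:
    """Toy stand-in for an append-only LOB data file (hypothetical)."""

    def __init__(self):
        self._data = bytearray()

    def eod(self):
        # Current end-of-data position.
        return len(self._data)

    def append(self, payload):
        # Appends always land at the current end of the file.
        offset = len(self._data)
        self._data += payload
        return offset

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])


lob_file = AppendOnlyFile()
descriptors = {}  # handle -> (offset, length); stands in for the descriptor tables

# Writer A performs steps 1 and 2: read the EOD and record it.
a_payload = b"AAAA"
a_offset = lob_file.eod()
descriptors["A"] = (a_offset, len(a_payload))

# Writer B runs all three steps before A reaches step 3.
b_payload = b"BB"
b_offset = lob_file.eod()
descriptors["B"] = (b_offset, len(b_payload))
lob_file.append(b_payload)

# Writer A performs step 3; the data lands after B's bytes.
actual_a_offset = lob_file.append(a_payload)

# Reading A through its descriptor now returns the wrong bytes.
stale_read = lob_file.read(*descriptors["A"])
```

In this run the descriptor for A points at the start of the file while its data actually starts after B's bytes, which is the mismatch the LOB lock was added to prevent.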
The order of operations with this change is as follows:
- Call the JNI HDFS write API to write the LOB data to HDFS.
- Use the data offset returned by the JNI HDFS write API in step 1 as the offset to store in the LOB descriptor tables.
- If there are multiple chunks to write, do it in a loop and append to the first chunk. This way each chunk can be anywhere in HDFS and the chunks are not necessarily contiguous. But we are guaranteed that whatever we wrote will be stored in our internal LOB descriptor files.
- If any failure or TM error occurs while writing to the LOB descriptor tables, the transaction gets rolled back and the chunk of HDFS data already written becomes "dead data". It doesn't harm the next operation.
- The GC check is now done before an update or insert. Earlier it was done as part of the ::allocateDesc operation to get the EOD of the file.
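The write-first ordering above can be sketched as follows. This is a minimal simulation under stated assumptions: `AppendOnlyFile`, `write_chunk`, `read_lob`, and the `descriptors` dictionary are hypothetical names, not the JNI interface or the real descriptor tables, and a rolled-back transaction is modeled simply by never committing the staged chunk list.

```python
class AppendOnlyFile:
    """Toy stand-in for an append-only LOB data file (hypothetical)."""

    def __init__(self):
        self._data = bytearray()

    def eod(self):
        return len(self._data)

    def append(self, payload):
        # Like the new write call, report where the data actually landed.
        offset = len(self._data)
        self._data += payload
        return offset

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])


def write_chunk(lob_file, staged_chunks, payload):
    # Write first, then record the offset the write returned.
    offset = lob_file.append(payload)
    staged_chunks.append((offset, len(payload)))


def read_lob(lob_file, descriptors, handle):
    # Reassemble a LOB from its (offset, length) chunk list.
    return b"".join(lob_file.read(o, n) for o, n in descriptors[handle])


lob_file = AppendOnlyFile()
descriptors = {}  # handle -> list of (offset, length)

# Writers A and B interleave; A's two chunks end up non-contiguous.
a_chunks, b_chunks = [], []
write_chunk(lob_file, a_chunks, b"AAAA")
write_chunk(lob_file, b_chunks, b"BB")
write_chunk(lob_file, a_chunks, b"aa")
descriptors["A"] = a_chunks  # commit
descriptors["B"] = b_chunks  # commit

# Writer C's descriptor update fails: nothing is committed, so the
# bytes it already appended become dead data.
c_chunks = []
write_chunk(lob_file, c_chunks, b"CCC")

# The next insert is unaffected by the dead bytes.
d_chunks = []
write_chunk(lob_file, d_chunks, b"D")
descriptors["D"] = d_chunks
```

Because every stored offset comes from the write itself, no writer depends on an EOD value read earlier. The dead bytes left by C simply occupy space that no descriptor references.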