Apache Trafodion / TRAFODION-3263

Disable LOB locking and refactor order of LOB iud expression evaluation

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: sql-general
    • Labels:
      None

      Description

       The change to use JNI for HDFS writes improved the interface by returning more useful information to the caller. In TRAFODION-2946, we described the need for LOB locking because multiple threads writing to the same LOB column could interleave and cause problems. With the new JNI interface, an HDFS write now returns the offset where the data was written, so we can store this returned offset in the descriptor tables. Previously, with the libhdfs API, we did not get back the "written offset".
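
To make the interface difference concrete, here is a minimal Python sketch under stated assumptions. `AppendOnlyFile` and its methods are hypothetical and do not model libhdfs or Trafodion's JNI layer exactly: `write_count_only` mimics libhdfs `hdfsWrite`, which reports only how many bytes were written, while `write_returning_offset` mimics the offset-returning JNI write described above.

```python
import threading


class AppendOnlyFile:
    """Hypothetical in-memory stand-in for an append-only HDFS LOB data file."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()  # each append is atomic, as in an append-only file

    def eod(self):
        # End of data: where the next append lands only if no one else appends first.
        return len(self._data)

    def write_count_only(self, chunk):
        # libhdfs-style result: the number of bytes written, but not where they went.
        with self._lock:
            self._data += chunk
            return len(chunk)

    def write_returning_offset(self, chunk):
        # JNI-style result as described in the issue: the offset where this chunk landed.
        with self._lock:
            offset = len(self._data)
            self._data += chunk
            return offset

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])


f = AppendOnlyFile()
f.write_count_only(b"other")                  # some other writer appends 5 bytes
offset = f.write_returning_offset(b"lob-data")
print(offset, f.read(offset, 8))              # prints: 5 b'lob-data'
```

With only a byte count, the caller must guess the offset from an earlier EOD; with the returned offset, there is nothing to guess.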

       

      Before this change, the order of operations was:

      1. Get the EOD (end of data) offset for the LOB data file in HDFS.
      2. Store this offset in the LOB descriptor tables so we know where to retrieve the data from during a read.
      3. Call hdfsWrite to write to the LOB data file, and hope that hdfsWrite writes at the same offset as the EOD calculated in step 1. Because HDFS is an "append only" file system, this is usually what happens. But if another process inserts into the LOB column between steps 2 and 3, an incorrect offset is stored in the descriptor tables. Hence, as part of TRAFODION-2946, we added a LOB lock to make steps 1, 2, and 3 atomic.
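
The race in the old order can be reproduced deterministically with a small Python model. This is a sketch, not Trafodion code: `AppendOnlyFile`, `old_insert`, `locked_insert`, and the `interleave` callback (which stands in for another process running between steps 2 and 3) are hypothetical names, and a `threading.Lock` stands in for the LOB lock.

```python
import threading


class AppendOnlyFile:
    """Hypothetical stand-in for an HDFS LOB data file; append returns only a byte count."""

    def __init__(self):
        self.data = bytearray()

    def eod(self):
        return len(self.data)

    def append(self, chunk):
        self.data += chunk
        return len(chunk)  # like libhdfs hdfsWrite: bytes written, not the offset


def old_insert(f, descriptors, key, chunk, interleave=None):
    offset = f.eod()                          # step 1: get the EOD
    descriptors[key] = (offset, len(chunk))   # step 2: store it in the descriptor table
    if interleave is not None:
        interleave()                          # another process inserts between 2 and 3
    f.append(chunk)                           # step 3: write, hoping it lands at `offset`


def read_lob(f, descriptors, key):
    offset, length = descriptors[key]
    return bytes(f.data[offset:offset + length])


lob_lock = threading.Lock()  # hypothetical stand-in for the TRAFODION-2946 LOB lock


def locked_insert(f, descriptors, key, chunk):
    with lob_lock:                            # steps 1-3 become atomic
        old_insert(f, descriptors, key, chunk)


# Unlocked interleaving: both inserts see EOD 0, so row1's stored offset is wrong.
f, desc = AppendOnlyFile(), {}
old_insert(f, desc, "row1", b"AAAA",
           interleave=lambda: old_insert(f, desc, "row2", b"BB"))
print(read_lob(f, desc, "row1"))              # prints: b'BBAA' (corrupted)
print(read_lob(f, desc, "row2"))              # prints: b'BB'

# With the lock held across steps 1-3, concurrent inserts all read back correctly.
f2, desc2 = AppendOnlyFile(), {}
workers = [threading.Thread(target=locked_insert,
                            args=(f2, desc2, f"r{i}", bytes([65 + i]) * (i + 1)))
           for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(all(read_lob(f2, desc2, f"r{i}") == bytes([65 + i]) * (i + 1) for i in range(8)))  # prints: True
```

The lock restores correctness by serializing every insert into the LOB column through steps 1 to 3, which is the serialization this issue proposes to remove.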

      With this change, the order of operations is:

      1. Call the JNI HDFS write API to write the LOB data to HDFS.
      2. Use the data offset returned by the JNI HDFS write API in step 1 as the offset to store in the LOB descriptor tables.
      3. If there are multiple chunks to write, write them in a loop, appending each to the first chunk. Each chunk can therefore be anywhere in HDFS and is not necessarily contiguous, but we are guaranteed that whatever we wrote is recorded in our internal LOB descriptor tables.
      4. If any failure or TM error occurs while writing to the LOB descriptor tables, the transaction is rolled back and the chunk of HDFS data already written becomes "dead data". It does not harm subsequent operations.
      5. The GC check is now done before an update or insert. Previously it was done as part of the ::allocateDesc operation that obtained the EOD of the file.
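
The new order can be sketched the same way. Again, this is a hypothetical Python model, not Trafodion's implementation: `AppendOnlyFile.write` assumes the offset-returning JNI behavior described above, `DescriptorTable` models the descriptor tables with a simple commit, a transaction rollback is modeled by discarding locally staged rows, and `gc_check` is a placeholder hook because the GC policy itself is not described here.

```python
import threading


class AppendOnlyFile:
    """Hypothetical stand-in for an HDFS LOB data file."""

    def __init__(self):
        self.data = bytearray()
        self._lock = threading.Lock()

    def write(self, chunk):
        # Models the JNI write described in the issue: it returns the offset where
        # this chunk actually landed, however other writers interleave.
        with self._lock:
            offset = len(self.data)
            self.data += chunk
            return offset


class DescriptorTable:
    """Hypothetical LOB descriptor table: key -> list of (offset, length) chunks."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def commit(self, key, chunks):
        with self._lock:
            self.rows[key] = list(chunks)


def new_insert(f, table, key, chunks, gc_check=None, fail_after_write=False):
    if gc_check is not None:
        gc_check()                             # step 5: GC check before the insert/update
    pending = []                               # descriptor rows staged in the transaction
    for chunk in chunks:                       # step 3: one write per chunk
        offset = f.write(chunk)                # step 1: write the data first
        pending.append((offset, len(chunk)))   # step 2: record the returned offset
        if fail_after_write:
            # step 4: a simulated TM error rolls the transaction back; `pending` is
            # discarded and the bytes already written remain as unreferenced dead data.
            raise RuntimeError("simulated TM error")
    table.commit(key, pending)


def read_lob(f, table, key):
    # Chunks need not be contiguous; the descriptor rows say where each one is.
    return b"".join(bytes(f.data[o:o + n]) for o, n in table.rows[key])


def dead_bytes(f, table):
    live = sum(n for chunks in table.rows.values() for _, n in chunks)
    return len(f.data) - live


f, table = AppendOnlyFile(), DescriptorTable()
new_insert(f, table, "row1", [b"hello ", b"world"])
try:
    new_insert(f, table, "row2", [b"lost"], fail_after_write=True)
except RuntimeError:
    pass
print(read_lob(f, table, "row1"), dead_bytes(f, table))  # prints: b'hello world' 4
```

No LOB lock is needed in this model: concurrent writers can change where a chunk lands, but never whether its recorded offset is correct. The cost is unreferenced dead data left behind by rolled-back writes, which `dead_bytes` measures.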

       


              People

              • Assignee: Sandhya Sundaresan
              • Reporter: Sandhya Sundaresan
              • Votes: 0
              • Watchers: 1

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 4h 10m