  1. Jackrabbit Oak
  2. OAK-7339

Fix all sidegrades breaking with UnsupportedOperationException on MissingBlobStore by introducing LoopbackBlobStore


    Details

    • Flags:
      Patch, Important

      Description

      Problem

      In some edge cases, when the binary under the same path (e.g. /content/asset1) is modified in two independent checkpoints, A and B, a sidegrade run without a DataStore may fail with the following error:

      Exception thrown by the oak-upgrade tool
      Caused by: java.lang.UnsupportedOperationException: null
          at org.apache.jackrabbit.oak.upgrade.cli.blob.MissingBlobStore.getInputStream(MissingBlobStore.java:62)
          at org.apache.jackrabbit.oak.plugins.blob.BlobStoreBlob.getNewStream(BlobStoreBlob.java:47)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.getNewStream(SegmentBlob.java:276)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.getNewStream(SegmentBlob.java:86)
          at org.apache.jackrabbit.oak.plugins.memory.AbstractBlob$1.openStream(AbstractBlob.java:44)
          at com.google.common.io.ByteSource.contentEquals(ByteSource.java:344)
          at org.apache.jackrabbit.oak.plugins.memory.AbstractBlob.equal(AbstractBlob.java:67)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentBlob.equals(SegmentBlob.java:227)
          at com.google.common.base.Objects.equal(Objects.java:60)
          at org.apache.jackrabbit.oak.plugins.memory.AbstractPropertyState.equal(AbstractPropertyState.java:59)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentPropertyState.equals(SegmentPropertyState.java:242)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareProperties(SegmentNodeState.java:617)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:511)
      (the same nested methods)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:604)
          at org.apache.jackrabbit.oak.upgrade.PersistingDiff.diff(PersistingDiff.java:139)
          at org.apache.jackrabbit.oak.upgrade.PersistingDiff.childNodeChanged(PersistingDiff.java:191)
          at org.apache.jackrabbit.oak.plugins.segment.MapRecord$3.childNodeChanged(MapRecord.java:440)
          at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:483)
          at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:432)
          at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:604)
          at org.apache.jackrabbit.oak.upgrade.PersistingDiff.diff(PersistingDiff.java:139)
          at org.apache.jackrabbit.oak.upgrade.PersistingDiff.applyDiffOnNodeState(PersistingDiff.java:106)
          at org.apache.jackrabbit.oak.upgrade.RepositorySidegrade.copyDiffToTarget(RepositorySidegrade.java:403)
          at org.apache.jackrabbit.oak.upgrade.RepositorySidegrade.migrateWithCheckpoints(RepositorySidegrade.java:347)
      

       

      Abstract of proposed solution

      The idea for migration is simple: instead of failing on:

      public InputStream getInputStream(String blobId) throws IOException;
      

      or

      public int readBlob(String blobId, long pos, byte[] buff, int off, int length) throws IOException;
      

      let's introduce a BlobStore implementation that behaves like a loopback (localhost) network interface: whatever is sent to it is sent straight back.

      How it works

      It works the same way as a localhost interface: when a blobId is requested, the blobId itself is served as the binary content instead of throwing UnsupportedOperationException.

      This allows migrations that need to compare binaries to proceed quickly, satisfying the requirement that checkpoints be rewritten (copied from scratch).
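
The loopback behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the attached patch: the `BlobReader` interface and the `LoopbackBlobStoreSketch` class are hypothetical names that only mirror the method signatures quoted in this issue, and both the UTF-8 encoding of the ID and the -1 return value at end of content are assumptions of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, reduced stand-in for the BlobStore methods quoted in this issue.
interface BlobReader {
    InputStream getInputStream(String blobId) throws IOException;

    int readBlob(String blobId, long pos, byte[] buff, int off, int length) throws IOException;

    long getBlobLength(String blobId) throws IOException;
}

// Sketch only: serves the blob ID itself as the blob content.
final class LoopbackBlobStoreSketch implements BlobReader {

    // Assumption: the ID is encoded as UTF-8 to obtain the "content" bytes.
    private static byte[] contentOf(String blobId) {
        return blobId.getBytes(StandardCharsets.UTF_8);
    }

    // No real I/O happens, so the override drops the IOException clause.
    @Override
    public ByteArrayInputStream getInputStream(String blobId) {
        return new ByteArrayInputStream(contentOf(blobId));
    }

    // Copies at most `length` bytes of the encoded ID, starting at `pos`,
    // into buff[off..]; returns the number of bytes copied.
    // Assumption: -1 signals that `pos` is at or past the end of the content.
    @Override
    public int readBlob(String blobId, long pos, byte[] buff, int off, int length) {
        if (pos < 0) {
            throw new IllegalArgumentException("pos must not be negative: " + pos);
        }
        byte[] content = contentOf(blobId);
        if (pos >= content.length) {
            return -1;
        }
        int start = (int) pos;
        int n = Math.min(length, content.length - start);
        System.arraycopy(content, start, buff, off, n);
        return n;
    }

    @Override
    public long getBlobLength(String blobId) {
        return contentOf(blobId).length;
    }
}
```

With a store like this, two blobs compare byte-for-byte equal exactly when their IDs are equal, which is all the checkpoint diff needs when no real DataStore is attached.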

      Pros

      • simplifies simple sidegrade migration use cases: you no longer need to include your DataStore on the command line (which slows the migration down unnecessarily) when the migration fails
      • speeds up the migration, as it doesn't require referencing a BlobStore implementation in cases where binary references are only copied together with NodeStore inlined binaries
      • checkpoints are always copied (the --skip-checkpoints option is no longer needed), which means no full re-index happens on the migrated repository after the migration

      Cons and risks

      • Low risk: the user gets no visible indication that a DataStore is needed for a specific migration (i.e. that they are running into this specific edge case)
      • Medium risk: NodeStore storage overhead for checkpoints if the binaries compared across checkpoints have different blob IDs (for example, different algorithms: SHA256 vs SHA512). The comparison will then evaluate to not equal and the node will be rewritten for the checkpoint.
      • Low risk: currently, readBlob accepts a request and copies a smaller binary even when the caller expects a greater length (this might be the case if getBlobLength is, for some reason, not called prior to readBlob and the original length of the binary, which really resides in the real DataStore, is kept somewhere in a cache). The API reports how many bytes were read anyway, and it recommends that the caller check that value.
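
The last risk can be illustrated from the caller's side. The sketch below is an assumption-laden example, not code from the patch: `ShortReadSketch`, its static `readBlob` stand-in and the `readFully` helper are hypothetical names, and the -1 end-of-content return value is an assumption. It shows a caller that expects a larger binary (for instance a cached length from the real DataStore) but trusts only the byte counts actually returned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical caller-side sketch for handling short reads.
final class ShortReadSketch {

    // Stand-in for a loopback-style readBlob: the content is the UTF-8
    // encoded blob ID. Assumption: returns -1 once pos reaches the end.
    static int readBlob(String blobId, long pos, byte[] buff, int off, int length) {
        byte[] content = blobId.getBytes(StandardCharsets.UTF_8);
        if (pos >= content.length) {
            return -1;
        }
        int start = (int) pos;
        int n = Math.min(length, content.length - start);
        System.arraycopy(content, start, buff, off, n);
        return n;
    }

    // Reads up to expectedLength bytes, but relies only on the counts that
    // readBlob reports; the result may be shorter than expectedLength.
    static byte[] readFully(String blobId, int expectedLength) {
        byte[] buff = new byte[expectedLength];
        int total = 0;
        while (total < expectedLength) {
            int n = readBlob(blobId, total, buff, total, expectedLength - total);
            if (n <= 0) {
                break; // no more content, even though more was expected
            }
            total += n;
        }
        return Arrays.copyOf(buff, total);
    }
}
```

A caller written this way tolerates the loopback store returning fewer bytes than a cached length suggests, instead of treating the unread tail of the buffer as content.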

       

        Attachments

        1. OAK-7339-jenkins-xml-encoding-issue.patch
          5 kB
          Arek Kita
        2. OAK-7339.patch
          25 kB
          Arek Kita


            People

            • Assignee:
              tomek.rekawek Tomek Rękawek
              Reporter:
              arkadius Arek Kita
            • Votes:
              0
              Watchers:
              3
