  1. Apache Hudi
  2. HUDI-8489

Fix encoding of secondary index key

Details

    • Type: Task
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • None
    • 1.0.0
    • None
    • None

    Description

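      The secondary index key is a combination of secondaryKey and recordKey, joined by a delimiter ($). Since either key may itself contain the delimiter, both parts must be encoded before they are joined. There are two ways to do this: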

      1. Run Base64 encoding: `Base64.encode(secondaryKey) + DELIMITER + Base64.encode(recordKey)`. The Base64 alphabet does not contain $, so the delimiter stays unambiguous. This gives us a neat, standard way to encode. It might not be very efficient for long strings, since Base64 grows the output by roughly a third.
      2. Escape special characters: `escapeSpecialChars(secondaryKey) + DELIMITER + escapeSpecialChars(recordKey)`. The keys remain readable and their order is preserved. However, this is a custom scheme that is not used in other systems.
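
      The two options above can be sketched roughly as follows. This is a minimal illustration, not Hudi's actual implementation: the class name, the method names, the decode helpers, and the choice of backslash as the escape character are all assumptions made for the example. Only `java.util.Base64` from the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; names and the escape character are assumptions.
final class SecondaryIndexKeySketch {
    static final char DELIMITER = '$';
    static final char ESCAPE = '\\'; // assumed escape character

    // Option 1: Base64-encode each part, then join with the delimiter.
    public static String encodeBase64(String secondaryKey, String recordKey) {
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(secondaryKey.getBytes(StandardCharsets.UTF_8))
            + DELIMITER
            + enc.encodeToString(recordKey.getBytes(StandardCharsets.UTF_8));
    }

    // The Base64 alphabet has no '$', so the first '$' is always the delimiter.
    public static String[] decodeBase64(String key) {
        int i = key.indexOf(DELIMITER);
        Base64.Decoder dec = Base64.getDecoder();
        return new String[] {
            new String(dec.decode(key.substring(0, i)), StandardCharsets.UTF_8),
            new String(dec.decode(key.substring(i + 1)), StandardCharsets.UTF_8)
        };
    }

    // Option 2: prefix every delimiter and escape character with the escape character.
    public static String escapeSpecialChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESCAPE || c == DELIMITER) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static String encodeEscaped(String secondaryKey, String recordKey) {
        return escapeSpecialChars(secondaryKey) + DELIMITER + escapeSpecialChars(recordKey);
    }

    // Split on the first unescaped delimiter, unescaping as we scan.
    public static String[] decodeEscaped(String key) {
        StringBuilder cur = new StringBuilder();
        String first = null;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c == ESCAPE && i + 1 < key.length()) {
                cur.append(key.charAt(++i)); // take the escaped character literally
            } else if (c == DELIMITER && first == null) {
                first = cur.toString();      // end of the secondaryKey part
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        return new String[] { first, cur.toString() };
    }

    public static void main(String[] args) {
        String secondaryKey = "a$b";
        String recordKey = "id\\1";
        System.out.println(encodeBase64(secondaryKey, recordKey));
        System.out.println(encodeEscaped(secondaryKey, recordKey));
    }
}
```

      Both variants round-trip keys that contain the delimiter. The escaped form keeps the original characters visible in the stored key, while the Base64 form relies only on a standard library codec.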

      A benchmark comparing the encoding/decoding time of the two approaches did not show much difference: https://gist.github.com/codope/b1c73abed748d77c0b4db974d835f9da

          People

            codope Sagar Sumit