Apache Hudi / HUDI-6701

Explore use of UUID-6/7 as a replacement for current auto generated keys


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: None
    • Fix Version/s: 1.0.0-beta1
    • Component/s: None
    • Labels: None

    Description

      Today, we auto-generate string keys using HoodieRecord#generateSequenceId (shown below). Keys of this form are highly compressible, especially compared to UUIDv1, when stored as a string column in a Parquet file.

        public static String generateSequenceId(String instantTime, int partitionId, long recordIndex) {
          return instantTime + "_" + partitionId + "_" + recordIndex;
        }
      

      As part of this task, we'd like to understand:

      • Can UUIDv6 or UUIDv7 provide a similarly small compressed storage footprint when written as a column in a Parquet file?
      • Can the current format be represented in storage as a 160-bit number, i.e. 2 longs and 1 int? Would that further reduce storage costs?
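
      For reference when evaluating the first question, here is a minimal sketch of the UUIDv7 bit layout from RFC 9562 (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The class and method names (UuidV7Sketch, uuidV7) are hypothetical and not part of Hudi. Keys generated close together share a timestamp prefix, but the remaining 74 bits are random, so how well such keys compress in Parquet still has to be measured.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical helper, not part of Hudi: builds a version-7 UUID by hand.
final class UuidV7Sketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID uuidV7(long epochMillis) {
        // Top 48 bits: Unix timestamp in milliseconds.
        long msb = (epochMillis & 0xFFFFFFFFFFFFL) << 16;
        // Next 4 bits: version 7.
        msb |= 0x7000L;
        // Low 12 bits of the most significant half: random (rand_a).
        msb |= RANDOM.nextInt(1 << 12);
        // Top 2 bits of the least significant half: variant "10"; the other 62 bits are random (rand_b).
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID key = uuidV7(System.currentTimeMillis());
        System.out.println(key + " version=" + key.version() + " variant=" + key.variant());
    }
}
```

      Because the timestamp occupies the leading bits, the string forms of keys created at different milliseconds sort in creation order, unlike UUIDv1.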

      (An orthogonal consideration is the memory needed to hold the key string, which can be higher than 160 bits. We can discuss this later, once we understand the storage footprint.)
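
      The 160-bit representation raised in the second question can be sketched as below. This is only an illustration under stated assumptions: PackedKeySketch, pack and unpack are hypothetical names, and the sketch assumes the instant time is an all-digit timestamp string without leading zeros that fits in a long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hudi: packs the three key components into 20 bytes.
final class PackedKeySketch {
    // 8 bytes (instant time) + 4 bytes (partition id) + 8 bytes (record index) = 160 bits.
    static final int PACKED_SIZE = Long.BYTES + Integer.BYTES + Long.BYTES;

    static byte[] pack(String instantTime, int partitionId, long recordIndex) {
        // Assumption: instantTime is purely numeric; parseLong throws otherwise.
        return ByteBuffer.allocate(PACKED_SIZE)
                .putLong(Long.parseLong(instantTime))
                .putInt(partitionId)
                .putLong(recordIndex)
                .array();
    }

    // Rebuilds the string form used by generateSequenceId from the packed bytes.
    static String unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        long instantTime = buf.getLong();
        int partitionId = buf.getInt();
        long recordIndex = buf.getLong();
        return instantTime + "_" + partitionId + "_" + recordIndex;
    }

    public static void main(String[] args) {
        String key = "20230815123456789" + "_" + 3 + "_" + 42L;
        byte[] packed = pack("20230815123456789", 3, 42L);
        System.out.println("string bytes: " + key.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("round trip ok: " + key.equals(unpack(packed)));
    }
}
```

      The packed form has a fixed width of 20 bytes, while the string form grows with the number of digits in each component. Whether the fixed-width form is actually smaller after Parquet encoding and compression is the open question of this task.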

      Resources:

          People

            Assignee: Lin Liu (linliu)
            Reporter: Vinoth Chandar (vinoth)
            Votes: 0
            Watchers: 2
